
Abstract

With the growth of the internet and the arrival of large volumes of data, the analysis of transactional data has become important in the field of data mining, and clustering algorithms for transactional trade datasets are an active research topic. Among them, the clustering with slope (CLOPE) algorithm is widely used because of its strong performance, lower memory use, and better clustering quality than other algorithms. However, the quality of CLOPE's output depends on the order in which the data is input: different input sequences of the same dataset produce different clusterings, and some orderings produce poor ones. To address this problem, this paper analyzes the CLOPE algorithm in depth and proves, in theory, that placing records with more items earlier in the input sequence greatly improves the quality of the result. A procedure that preprocesses the dataset according to item similarity is proposed. Experimental results show that the algorithm produces clearly better clusterings when the proposed method is used, and that it runs 10% faster than the traditional procedure. The resulting algorithm produces high-quality results for transactional datasets.
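The input-order dependence described above can be illustrated with a minimal sketch. It assumes the standard CLOPE profit criterion from the original CLOPE formulation (cluster size S, width W, transaction count N, repulsion parameter r) and implements only a single allocation pass, without the iterative refinement phase. The helper `longest_first` is a hypothetical stand-in that simply sorts records by item count; it is not the item-similarity preprocessing procedure proposed in the paper, whose details are not given here. All function and class names are illustrative.

```python
from collections import Counter


class Cluster:
    """Running statistics CLOPE keeps per cluster."""

    def __init__(self):
        self.n = 0            # N: number of transactions in the cluster
        self.size = 0         # S: total number of item occurrences
        self.occ = Counter()  # item -> occurrence count; W = len(occ)

    @property
    def width(self):
        return len(self.occ)

    def delta_add(self, txn, r):
        """Change in S * N / W**r if txn (a non-empty set) joined this cluster."""
        new_size = self.size + len(txn)
        new_width = self.width + sum(1 for item in txn if item not in self.occ)
        new_val = new_size * (self.n + 1) / new_width ** r
        old_val = self.size * self.n / self.width ** r if self.n else 0.0
        return new_val - old_val

    def add(self, txn):
        self.n += 1
        self.size += len(txn)
        self.occ.update(txn)


def clope_single_pass(transactions, r=2.0):
    """Assign each transaction, in input order, to the cluster (existing or
    new) with the largest profit gain. Simplified: no refinement passes.
    Assumes every transaction is non-empty."""
    clusters, labels = [], []
    for txn in transactions:
        txn = set(txn)
        candidates = clusters + [Cluster()]  # last candidate is a fresh cluster
        best = max(candidates, key=lambda c: c.delta_add(txn, r))
        if best is candidates[-1]:
            clusters.append(best)
        best.add(txn)
        labels.append(clusters.index(best))
    return labels, clusters


def profit(clusters, r=2.0):
    """Global CLOPE profit: sum(S_i / W_i**r * N_i) / sum(N_i)."""
    total = sum(c.n for c in clusters)
    return sum(c.size / c.width ** r * c.n for c in clusters) / total


def longest_first(transactions):
    """Hypothetical preprocessing: put records with more items first."""
    return sorted(transactions, key=len, reverse=True)
```

Running `clope_single_pass` on the same records in their original order and in `longest_first` order, then comparing `profit` for the two results, gives a quick way to observe how much the input sequence affects the outcome on a given dataset.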

Keywords

Clustering with slope; data mining; data preprocessing; cluster algorithm; item similarity

