Sage Journals: Discover world-class research

Abstract

This study discusses and compares grid proximity measures that use representative data points in grid cells and the average distance between the data points in grid cells. Basic theorems for the grid distance measure are formulated and proved. The grid distance measure is then applied to the grid-based clustering problem, in which the number of clusters is determined dynamically by using a threshold value and by maximizing similarity within each cluster and dissimilarity between clusters.
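
As a rough illustration of the two ingredients named above, the following Python sketch computes a representative-point distance and an average point-to-point distance between two grid cells. It is only one possible reading of the abstract, not the paper's definitions: the choice of the cell centroid as the representative point, the Euclidean metric, the uniform cell size, and all function names are assumptions made for this example.

```python
from collections import defaultdict
from itertools import product
from math import dist, floor


def assign_to_cells(points, cell_size):
    """Group points by the integer index of the uniform grid cell containing them."""
    cells = defaultdict(list)
    for p in points:
        key = tuple(floor(x / cell_size) for x in p)
        cells[key].append(p)
    return dict(cells)


def centroid(cell_points):
    """Coordinate-wise mean of the points in one cell."""
    n = len(cell_points)
    return tuple(sum(coords) / n for coords in zip(*cell_points))


def representative_distance(cell_a, cell_b):
    """Distance between one representative per cell (assumed here: the centroid)."""
    return dist(centroid(cell_a), centroid(cell_b))


def average_distance(cell_a, cell_b):
    """Mean Euclidean distance over all pairs (p, q) with p in cell_a and q in cell_b."""
    total = sum(dist(p, q) for p, q in product(cell_a, cell_b))
    return total / (len(cell_a) * len(cell_b))


# Four 2-D points falling into two neighbouring cells of side 2.0.
points = [(0.0, 0.0), (0.0, 1.0), (3.0, 0.0), (3.0, 1.0)]
cells = assign_to_cells(points, cell_size=2.0)
cell_a, cell_b = cells[(0, 0)], cells[(1, 0)]

rep = representative_distance(cell_a, cell_b)
avg = average_distance(cell_a, cell_b)
print(f"representative: {rep:.4f}, average: {avg:.4f}")
```

In this toy case the two values differ slightly, because the average over all cross-cell pairs also reflects how the points are spread inside each cell, whereas the centroid-based value does not. A threshold on either value could then decide whether two cells are candidates for the same cluster, as the abstract describes in general terms.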

The grid-based clustering problem is illustrated and formulated as a 0-1 integer program. We perform numerical experiments on randomly generated problems and on a clustering problem involving microarray data of human fibroblasts in varying serum concentrations, with the latter data taken from a prior study. The theorems are applicable to the grid-based clustering of any data set.

Keywords

Grid proximity measure, grid-based clustering, dynamic clustering, cluster validity index

