Abstract
This study discusses and compares grid proximity measures based on representative data points in grid cells and on the average distance between the data points in grid cells. Basic theorems for the grid distance measure are formulated and proved. The grid distance measure is then applied to the grid-based clustering problem, in which the number of clusters is determined dynamically by a threshold value and by maximizing similarity within each cluster and dissimilarity between clusters.
The grid-based clustering problem is illustrated and formulated as a 0-1 integer program. Numerical experiments are performed on randomly generated problems and on a clustering problem involving microarray data of human fibroblasts under varying serum concentrations; the microarray data are taken from a prior study. The theorems are applicable to the grid-based clustering of any data set.
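To make the two kinds of proximity measure concrete, the following is a minimal sketch and not the study's own definitions. It assumes Euclidean distance, a uniform square grid, and the cell centroid as the representative point; the function names (`assign_to_cells`, `representative_distance`, `average_distance`) are hypothetical and chosen only for illustration.

```python
import math
from collections import defaultdict
from itertools import product


def assign_to_cells(points, cell_size):
    """Group points by the index of the uniform grid cell that contains them."""
    cells = defaultdict(list)
    for p in points:
        key = tuple(math.floor(c / cell_size) for c in p)
        cells[key].append(p)
    return dict(cells)


def centroid(pts):
    """Coordinate-wise mean, used here as the assumed representative point."""
    n = len(pts)
    return tuple(sum(c) / n for c in zip(*pts))


def representative_distance(cell_a, cell_b):
    """Distance between the representative points (centroids) of two cells."""
    return math.dist(centroid(cell_a), centroid(cell_b))


def average_distance(cell_a, cell_b):
    """Mean distance over all pairs with one point taken from each cell."""
    total = sum(math.dist(p, q) for p, q in product(cell_a, cell_b))
    return total / (len(cell_a) * len(cell_b))


points = [(0.0, 0.0), (0.0, 1.0), (3.0, 0.0), (3.0, 1.0)]
cells = assign_to_cells(points, cell_size=2.0)
a, b = cells[(0, 0)], cells[(1, 0)]
rep = representative_distance(a, b)
avg = average_distance(a, b)
```

In this small example the average pairwise distance is slightly larger than the centroid distance, because it also reflects the spread of points inside each cell. Either value could then be compared against a threshold when deciding whether two cells belong to the same cluster.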
