A major challenge in stream clustering is the evolution of the statistical properties of the underlying data. Because clustering is inherently unsupervised, selecting suitable parameter values is often difficult. Clustering algorithms with sensitive parameters are often not robust to such changes, leading to poor clustering outputs. Algorithms using K-NN graphs face this problem, as they have a sensitive K-connectivity parameter that prevents them from adapting to stream concept evolution. We address this by introducing the novel concept of skewness excess and controlling the skewness excess of the edge length distributions in the underlying K-NN graph. We demonstrate the asymptotic linear dependency of skewness excess on the graph connectivity and propose the novel RobustRepStream algorithm, which extends the RepStream algorithm and provides improved robustness against stream evolution. By automatically controlling the skewness excess, RobustRepStream removes the need for the user to specify the K-connectivity parameter, and it can adjust the graph connectivity locally to achieve performance close to that obtained when the optimal K value is known. We demonstrate that RobustRepStream’s skewness threshold parameter is insensitive and universal across all data sets. We comprehensively evaluate RobustRepStream on real-world benchmark data sets against previous stream clustering algorithms, and demonstrate that it provides better clustering performance.