Abstract

This paper presents a comparative performance evaluation of a random subsample classifier ensemble against leading machine learning classifiers on high dimensional datasets. The classification performance of the hybrid random subsample ensemble is compared with that of a comprehensive set of machine learning classification algorithms, using both in-house simulations and results published by others in the literature. The comparison is based on prediction accuracy on six datasets from the UCI Machine Learning repository, namely Dexter, Madelon, Isolet, Multiple Features, Internet Ads, and Citeseer, with feature counts of up to 105,000. Simulation results show that the hybrid random subsample ensemble is competitive on high dimensional datasets. Specifically, the findings indicate that hybrid random subsample ensembles with a subsample rate of 15% and a base classifier count of 25 or more can achieve very competitive performance on high dimensional datasets when compared with leading machine learning classification algorithms.
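The configuration named above (a 15% subsample rate and 25 base classifiers) can be illustrated with a minimal sketch. It rests on several assumptions not stated in the abstract: the subsample is read as a random subset of features, in the random subspace sense; every base learner is a simple nearest-centroid classifier, whereas a hybrid ensemble would mix classifier types; and predictions are combined by plurality vote. All function names are hypothetical, and this is not the authors' implementation.

```python
import random
from collections import Counter


def train_centroid(X, y, features):
    """Fit per-class centroids using only the given feature indices (assumed base learner)."""
    sums, counts = {}, Counter(y)
    for row, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(features))
        for j, f in enumerate(features):
            acc[j] += row[f]
    return {label: [s / counts[label] for s in acc] for label, acc in sums.items()}


def predict_centroid(centroids, features, row):
    """Return the label whose centroid is closest in squared Euclidean distance."""
    def dist(c):
        return sum((row[f] - c[j]) ** 2 for j, f in enumerate(features))
    return min(sorted(centroids), key=lambda label: dist(centroids[label]))


def fit_random_subsample_ensemble(X, y, n_estimators=25, rate=0.15, seed=0):
    """Train n_estimators base learners, each on a random subset of the features.

    Assumption: 'subsample rate' means the fraction of features each learner sees.
    """
    rng = random.Random(seed)
    n_features = len(X[0])
    k = max(1, round(rate * n_features))
    ensemble = []
    for _ in range(n_estimators):
        features = sorted(rng.sample(range(n_features), k))
        ensemble.append((features, train_centroid(X, y, features)))
    return ensemble


def predict_ensemble(ensemble, row):
    """Combine base predictions by plurality vote; ties go to the smallest label."""
    votes = Counter(predict_centroid(c, f, row) for f, c in ensemble)
    return max(sorted(votes), key=lambda label: votes[label])


# Toy data: 40 features, two well-separated classes.
X = [[0.0] * 40, [0.1] * 40, [1.0] * 40, [0.9] * 40]
y = [0, 0, 1, 1]
ensemble = fit_random_subsample_ensemble(X, y)
print(predict_ensemble(ensemble, [0.2] * 40), predict_ensemble(ensemble, [0.8] * 40))
```

Because each base learner sees only a fraction of the features, its training cost scales with the subsample size rather than with the full dimensionality, which is the usual motivation for this kind of ensemble on high dimensional data.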

Keywords

Random subspace; random subsample; curse of dimensionality; classifier ensemble; ensemble selection; hybrid classifier ensembles; classification performance; computational complexity
