Abstract

One of the major tasks in text categorization systems is dimensionality reduction, which strongly affects classification performance and scalability. Among dimensionality reduction methods, feature ranking-based feature selection, also known as best individual features, is scalable, simple, and inexpensive. However, the proper feature ranking method for a given data set is not obvious without running experiments on that data set, because performance varies with the data characteristics and the choice of classifier. This paper introduces a framework, called feature meta-ranking, that identifies the best feature ranking measure among a set of candidates for a particular text classification problem. The feature meta-ranking technique is implemented using the differential filter level performance method, which uses a simple classifier, such as Rocchio, to estimate the behavior of each feature ranking measure on a particular data set. Because it places a classifier in the feature selection loop, though only minimally, the proposed method can be considered a hybrid feature selection technique. The proposed method is evaluated on six data sets with seven feature ranking measures. The method is shown to be stable, in that it is insensitive to the resolution of the filter level. The proposed method is also examined with more sophisticated classifiers, such as support vector machines, and the results confirm the performance obtained with simple classifiers.
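
The general idea of comparing ranking measures with a cheap classifier can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not spell out the differential filter level performance criterion, so the sketch assumes a simpler stand-in, the mean validation accuracy of a nearest-centroid (Rocchio-style) classifier averaged over several filter levels. The function names, the two candidate measures (document frequency and chi-square), and the toy documents are all hypothetical choices made for illustration.

```python
import math
from collections import Counter

def doc_frequency(docs, labels):
    """Score each term by the number of documents that contain it."""
    scores = Counter()
    for doc in docs:
        scores.update(set(doc))
    return dict(scores)

def chi_square(docs, labels):
    """Score each term by its maximum chi-square statistic over classes."""
    n = len(docs)
    class_sizes = Counter(labels)
    df, df_in_class = Counter(), Counter()
    for doc, label in zip(docs, labels):
        for term in set(doc):
            df[term] += 1
            df_in_class[term, label] += 1
    scores = {}
    for term, term_df in df.items():
        best = 0.0
        for label, size in class_sizes.items():
            a = df_in_class[term, label]  # in class, contains term
            b = term_df - a               # other classes, contains term
            c = size - a                  # in class, lacks term
            d = n - size - b              # other classes, lacks term
            denom = (a + c) * (b + d) * (a + b) * (c + d)
            if denom:
                best = max(best, n * (a * d - b * c) ** 2 / denom)
        scores[term] = best
    return scores

def top_k(scores, k):
    """Return the k highest-scoring terms, ties broken alphabetically."""
    return sorted(scores, key=lambda t: (-scores[t], t))[:k]

def vectorize(doc, features):
    """Term-frequency vector of a tokenized document over the kept features."""
    counts = Counter(doc)
    return [counts[f] for f in features]

def cosine(u, v):
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return sum(x * y for x, y in zip(u, v)) / norm if norm else 0.0

def centroid_accuracy(train, train_y, valid, valid_y, features):
    """Validation accuracy of a nearest-centroid (Rocchio-style) classifier."""
    centroids = {}
    for label in sorted(set(train_y)):
        rows = [vectorize(d, features) for d, y in zip(train, train_y) if y == label]
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]
    hits = 0
    for doc, y in zip(valid, valid_y):
        vec = vectorize(doc, features)
        pred = max(centroids, key=lambda c: cosine(vec, centroids[c]))
        hits += pred == y
    return hits / len(valid)

def meta_rank(measures, train, train_y, valid, valid_y, levels):
    """Assumed criterion: mean accuracy of each measure over the filter levels."""
    results = {}
    for name, measure in measures.items():
        scores = measure(train, train_y)
        accs = [centroid_accuracy(train, train_y, valid, valid_y, top_k(scores, k))
                for k in levels]
        results[name] = sum(accs) / len(accs)
    return results

# Invented toy corpus: "the" is frequent but carries no class information.
train_docs = [["the", "ball", "goal"], ["the", "ball", "team"], ["the", "goal", "team"],
              ["the", "code", "chip"], ["the", "code", "data"], ["the", "chip", "data"]]
train_labels = ["sport", "sport", "sport", "tech", "tech", "tech"]
valid_docs = [["the", "ball"], ["the", "code"]]
valid_labels = ["sport", "tech"]

measures = {"doc_frequency": doc_frequency, "chi_square": chi_square}
results = meta_rank(measures, train_docs, train_labels,
                    valid_docs, valid_labels, levels=[1, 3, 5])
best = max(results, key=results.get)
```

On this toy corpus the chi-square measure comes out ahead, because document frequency spends its first feature slot on the uninformative term "the". A real comparison would use held-out splits of the actual data set and whichever filter-level criterion the full paper defines.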

Keywords

Feature selection, supervised learning, text classification
