Abstract
In supervised classification, a class imbalance problem arises when one class has fewer objects than the other. One of the most common solutions to class imbalance is oversampling, and SMOTE is the best-known and most-referenced oversampling method. However, SMOTE creates synthetic objects randomly, so it produces a different result each time it is applied; in practice, the user must run SMOTE several times and choose the best of the generated balanced datasets. For this reason, in this paper we present SMOTE-D, a deterministic version of SMOTE, and propose new deterministic SMOTE-D-based versions of some of the most recent and successful SMOTE-based methods. Our experiments show that all the proposed deterministic methods produce results as good as those of the random methods while needing to be applied only once. This is important from a practical point of view: our proposals save time by avoiding the repeated applications SMOTE requires, and they yield a single, reproducible result.
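To illustrate the non-determinism the abstract refers to, the following is a minimal toy sketch of SMOTE-style interpolation: each synthetic object is placed at a random fraction along the segment between a minority object and a neighbor. This is an illustrative simplification, not the paper's SMOTE-D; the function name and its parameters are assumptions for the example.

```python
import random

def smote_like_interpolate(minority, n_new=3, seed=None):
    """Toy SMOTE-style oversampling (illustrative only, not SMOTE-D).

    Each synthetic point lies on the segment between a randomly chosen
    minority sample and its nearest neighbor, at a random fraction.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        a = rng.choice(minority)
        # nearest neighbor of a by squared Euclidean distance (excluding a)
        b = min((p for p in minority if p is not a),
                key=lambda p: sum((x - y) ** 2 for x, y in zip(a, p)))
        gap = rng.random()  # random interpolation fraction -> non-determinism
        synthetic.append(tuple(x + gap * (y - x) for x, y in zip(a, b)))
    return synthetic

minority = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
run1 = smote_like_interpolate(minority)   # unseeded runs generally differ,
run2 = smote_like_interpolate(minority)   # so users rerun and pick the best
```

Because the interpolation fractions are drawn at random, unseeded runs generally yield different balanced datasets; a deterministic variant, by construction, returns the same result on every application, which is the practical advantage the paper claims for SMOTE-D.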
