Abstract
In supervised classification, a training set is given to a classifier to learn a decision rule for classifying unseen cases. When large training sets are processed, the training stage becomes slow, especially for instance-based learning. However, not all information in a training set is useful for classification, because it may contain redundant or noisy prototypes. Therefore, a process for discarding useless prototypes is required; this process is known as prototype selection. In this work, we present methods for selecting prototypes based on prototype relevance, which are accurate and fast for large datasets; in addition, our methods can be applied to datasets described by nominal features. We report experimental results showing the effectiveness of our methods, as well as a comparison against other successful prototype selection methods.
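To make the idea of prototype selection concrete, the following is a minimal sketch of one classical approach, a greedy condensation pass in the spirit of Hart's Condensed Nearest Neighbor. It is an illustration only, not the relevance-based method proposed in the paper; the function names, the toy 1-D dataset, and the 1-NN rule are all assumptions for the example.

```python
# Illustrative sketch of prototype selection (not the authors' method):
# keep a prototype only if the subset selected so far would misclassify it,
# in the spirit of Hart's Condensed Nearest Neighbor.

def nn_label(prototypes, x):
    """Label of the prototype nearest to x (1-NN rule on 1-D features)."""
    return min(prototypes, key=lambda p: abs(p[0] - x))[1]

def condense(training_set):
    """Greedy selection: add a prototype only when the current subset
    misclassifies it; redundant interior points are discarded."""
    selected = [training_set[0]]
    for x, y in training_set[1:]:
        if nn_label(selected, x) != y:
            selected.append((x, y))
    return selected

# Toy 1-D dataset: two well-separated classes with redundant points.
data = [(0.0, "a"), (0.1, "a"), (0.2, "a"),
        (5.0, "b"), (5.1, "b"), (5.2, "b")]
subset = condense(data)
# subset is much smaller than data, yet 1-NN on subset still
# classifies every original training point correctly.
```

This captures why discarding prototypes can preserve accuracy: redundant points deep inside a class region contribute nothing to the 1-NN decision boundary, so removing them speeds up instance-based classification without changing its predictions on this data.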
