Overproduce-and-Select: The Grim Reality
2013 (English) Conference paper, Published paper (Refereed)
Abstract [en]
Overproduce-and-select (OPAS) is a frequently used
paradigm for building ensembles. In static OPAS, a large number
of base classifiers are trained, before a subset of the available
models is selected to be combined into the final ensemble. In
general, the selected classifiers are supposed to be accurate
and diverse for the OPAS strategy to result in highly accurate
ensembles, but exactly how this is enforced in the selection process
is not obvious. Most often, either individual models or ensembles
are evaluated, using some performance metric, on available and
labeled data. Naturally, the underlying assumption is that an
observed advantage for the models (or the resulting ensemble)
will carry over to test data. In the experimental study, a typical
static OPAS scenario, using a pool of artificial neural networks
and a number of very natural and frequently used performance
measures, is evaluated on 22 publicly available data sets. The
discouraging result is that although a fairly large proportion
of the ensembles obtained higher test set accuracies, compared
to using the entire pool as the ensemble, none of the selection
criteria could be used to identify these highly accurate ensembles.
Despite only investigating a specific scenario, we argue that the
settings used are typical for static OPAS, thus making the results
general enough to question the entire paradigm.
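The static OPAS procedure described in the abstract can be illustrated with a minimal sketch. It is not the experimental setup of the paper: the pool members below are simulated predictions rather than trained neural networks, the only selection criterion shown is individual accuracy on labeled selection data, and the names `simulate_preds`, `select_top_k`, the pool size of 50 and the subset size of 11 are all hypothetical choices made for illustration.

```python
import random
from collections import Counter


def accuracy(preds, labels):
    """Fraction of predictions that match the labels."""
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)


def majority_vote(pred_rows):
    """Combine per-model prediction lists by plurality vote per instance.

    Ties are broken in favor of the label seen first among the models.
    """
    return [Counter(col).most_common(1)[0][0] for col in zip(*pred_rows)]


def select_top_k(pool_preds, labels, k):
    """Static selection: indices of the k models with the highest
    accuracy on the labeled selection data."""
    ranked = sorted(range(len(pool_preds)),
                    key=lambda i: accuracy(pool_preds[i], labels),
                    reverse=True)
    return ranked[:k]


def simulate_preds(labels, p_correct, rng):
    """Hypothetical stand-in for a trained base classifier:
    each binary prediction is correct with probability p_correct."""
    return [y if rng.random() < p_correct else 1 - y for y in labels]


rng = random.Random(0)
y_sel = [rng.randint(0, 1) for _ in range(200)]   # labeled selection data
y_test = [rng.randint(0, 1) for _ in range(200)]  # held-out test data

# Overproduce: build a pool of 50 simulated base classifiers.
pool_sel, pool_test = [], []
for _ in range(50):
    p = rng.uniform(0.55, 0.75)
    pool_sel.append(simulate_preds(y_sel, p, rng))
    pool_test.append(simulate_preds(y_test, p, rng))

# Select: keep the 11 models that look best on the selection data.
chosen = select_top_k(pool_sel, y_sel, k=11)

# Compare the selected subset with the entire pool on test data.
full_acc = accuracy(majority_vote(pool_test), y_test)
sub_acc = accuracy(majority_vote([pool_test[i] for i in chosen]), y_test)
print(f"full pool: {full_acc:.3f}  selected subset: {sub_acc:.3f}")
```

The comparison in the last lines mirrors the question the abstract raises: whether an advantage observed on the selection data carries over to test data, relative to simply using the entire pool as the ensemble.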
Place, publisher, year, edition, pages IEEE, 2013.
Keywords [en]
Ensembles, Neural networks, Overproduce-and-select, Data mining, Machine Learning
National Category
Computer Sciences; Computer and Information Sciences
Identifiers URN: urn:nbn:se:hb:diva-7054 ISI: 000335317800008 Local ID: 2320/12920 DOI: 10.1109/CIEL.2013.6613140 OAI: oai:DiVA.org:hb-7054 DiVA, id: diva2:887761
Conference IEEE Symposium on Computational Intelligence and Ensemble Learning (CIEL), 16-19 April 2013, Singapore
Note Sponsorship: Swedish Foundation for Strategic Research through the project High-Performance Data Mining for Drug Effect Detection (ref. no. IIS11-0053)
2015-12-22 2015-12-22 2020-01-29