One Tree to Explain Them All
2011 (English) Conference paper, Published paper (Refereed)
Abstract [en]
Random forest is a frequently used ensemble technique,
renowned for its high predictive performance. Random forest
models are, however, inherently opaque due to their sheer
complexity, making human interpretation and analysis impossible.
This paper presents a method of approximating the random forest
with just one decision tree. The approach uses oracle coaching,
a recently suggested technique where a weaker but transparent
model is generated using combinations of regular training data
and test data initially labeled by a strong classifier, called the
oracle. In this study, the random forest plays the part of the
oracle, while the transparent models are decision trees generated
by either the standard tree inducer J48, or by evolving genetic
programs. Evaluation on 30 data sets from the UCI repository
shows that oracle coaching significantly improves both accuracy
and area under ROC curve, compared to using training data
only. In fact, the resulting single-tree models are as
accurate as the random forest on the specific test instances.
Most importantly, this is not achieved by inducing or evolving
huge trees having perfect fidelity; a large majority of all trees
are instead rather compact and clearly comprehensible. The
experiments also show that the evolved trees outperformed J48
with regard to accuracy, but that this came at the expense of
slightly larger trees.
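The oracle coaching idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the random forest oracle is replaced by a hypothetical threshold function, the J48 and genetic programming inducers are replaced by a one-split decision stump, and all names (fit_stump, oracle_coach, fidelity) are illustrative assumptions. The sketch also assumes one particular combination of data, namely the training set plus the oracle-labeled test instances.

```python
from typing import Callable, List, Sequence

Row = Sequence[float]
Predictor = Callable[[Row], int]


def fit_stump(X: List[Row], y: List[int]) -> Predictor:
    """Fit a one-split binary stump (assumed stand-in for a tree inducer)."""
    best = None  # (errors, feature, threshold, left_label, right_label)
    for f in range(len(X[0])):
        for t in sorted({row[f] for row in X}):
            for left, right in ((0, 1), (1, 0)):
                errors = sum(
                    (left if row[f] <= t else right) != label
                    for row, label in zip(X, y)
                )
                if best is None or errors < best[0]:
                    best = (errors, f, t, left, right)
    _, f, t, left, right = best
    return lambda row: left if row[f] <= t else right


def oracle_coach(
    oracle: Predictor,
    fit_transparent: Callable[[List[Row], List[int]], Predictor],
    X_train: List[Row],
    y_train: List[int],
    X_test: List[Row],
) -> Predictor:
    """Train a transparent model on training data plus oracle-labeled test data."""
    # The true test labels are never used; the oracle supplies them instead.
    oracle_labels = [oracle(row) for row in X_test]
    X_aug = list(X_train) + list(X_test)
    y_aug = list(y_train) + oracle_labels
    return fit_transparent(X_aug, y_aug)


def fidelity(model: Predictor, oracle: Predictor, X: List[Row]) -> float:
    """Fraction of instances on which the model agrees with the oracle."""
    return sum(model(row) == oracle(row) for row in X) / len(X)


# Hypothetical oracle: stands in for a trained random forest.
oracle: Predictor = lambda row: int(row[0] > 5.0)

# Toy data, invented for illustration only.
X_train = [[1.0], [2.0], [8.0], [9.0]]
y_train = [0, 0, 1, 1]
X_test = [[3.0], [4.0], [6.0], [7.0]]

baseline = fit_stump(X_train, y_train)
coached = oracle_coach(oracle, fit_stump, X_train, y_train, X_test)

print("fidelity, training data only:", fidelity(baseline, oracle, X_test))
print("fidelity, oracle coached:", fidelity(coached, oracle, X_test))
```

On this toy data the coached stump moves its split toward the region covered by the test instances, so it agrees with the oracle on more of them than the stump built from training data alone. In a realistic setting the oracle would be a trained random forest and the transparent learner a full decision tree inducer.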
Place, publisher, year, edition, pages
IEEE, 2011.
Keywords [en]
genetic programming, random forest, oracle coaching, decision trees, machine learning
Keywords [sv]
Data mining
National Category
Computer Sciences
Computer and Information Sciences
Research subject
Business and IT
Identifiers
URN: urn:nbn:se:hb:diva-6680
Local ID: 2320/9855
ISBN: 978-1-4244-7834-7 (print)
OAI: oai:DiVA.org:hb-6680
DiVA, id: diva2:887380
Conference
IEEE Congress on Evolutionary Computation (CEC)
Note
Sponsorship:
This work was supported by the INFUSIS project (www.his.se/infusis)
at the University of Skövde, Sweden, in partnership
with the Swedish Knowledge Foundation under grant
2008/0502.
2015-12-22 2015-12-22 2020-01-29