Obtaining accurate and comprehensible classifiers using oracle coaching
University of Borås, School of Business and IT. (CSL@BS)
University of Borås, School of Business and IT. (CSL@BS)
University of Borås, School of Business and IT. (CSL@BS) ORCID iD: 0000-0003-0274-9026
2012 (English). In: Intelligent Data Analysis, ISSN 1088-467X, E-ISSN 1571-4128, Vol. 16, no. 2, p. 247-263. Article in journal (Refereed), Published
Abstract [en]

While ensemble classifiers often reach high levels of predictive performance, the resulting models are opaque and hence do not allow direct interpretation. When employing methods that do generate transparent models, predictive performance typically has to be sacrificed. This paper presents a method of improving predictive performance of transparent models in the very common situation where instances to be classified, i.e., the production data, are known at the time of model building. This approach, named oracle coaching, employs a strong classifier, called an oracle, to guide the generation of a weaker, but transparent model. This is accomplished by using the oracle to predict class labels for the production data, and then applying the weaker method on this data, possibly in conjunction with the original training set. Evaluation on 30 data sets from the UCI repository shows that oracle coaching significantly improves predictive performance, measured by both accuracy and area under ROC curve, compared to using training data only. This result is shown to be robust for a variety of methods for generating the oracles and transparent models. More specifically, random forests and bagged radial basis function networks are used as oracles, while J48 and JRip are used for generating transparent models. The evaluation further shows that significantly better results are obtained when using the oracle-classified production data together with the original training data, instead of using only oracle data. An analysis of the fidelity of the transparent models to the oracles shows that performance gains can be expected from increasing oracle performance rather than from increasing fidelity. Finally, it is shown that further performance gains can be achieved by adjusting the relative weights of training data and oracle data.
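
The procedure described in the abstract can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: a k-nearest-neighbour voter stands in for the oracle (the paper uses random forests and bagged radial basis function networks), a one-split decision stump stands in for the transparent model (the paper uses J48 and JRip), and all names (`knn_oracle`, `fit_stump`, `stump_predict`, `oracle_coach`, `train_weight`, `oracle_weight`) as well as the toy data are hypothetical.

```python
from collections import Counter


def knn_oracle(train_X, train_y, k=3):
    """Stand-in 'oracle' (assumption): an opaque k-nearest-neighbour voter."""
    def predict(x):
        # Indices of the k training instances closest to x (squared Euclidean distance).
        nearest = sorted(
            range(len(train_X)),
            key=lambda i: sum((a - b) ** 2 for a, b in zip(train_X[i], x)),
        )[:k]
        return Counter(train_y[i] for i in nearest).most_common(1)[0][0]
    return predict


def fit_stump(X, y, weights):
    """Stand-in transparent model (assumption): the one-split rule
    'if x[f] <= t then left else right' with the lowest weighted error."""
    labels = sorted(set(y))
    best = None
    for f in range(len(X[0])):
        for t in sorted({row[f] for row in X}):
            for left in labels:
                for right in labels:
                    err = sum(
                        w for row, lab, w in zip(X, y, weights)
                        if (left if row[f] <= t else right) != lab
                    )
                    if best is None or err < best[0]:
                        best = (err, f, t, left, right)
    return best[1:]  # (feature index, threshold, left label, right label)


def stump_predict(stump, x):
    f, t, left, right = stump
    return left if x[f] <= t else right


def oracle_coach(train_X, train_y, prod_X, oracle,
                 train_weight=1.0, oracle_weight=1.0):
    """Label the production data with the oracle, then fit the transparent
    model on training data plus oracle-labelled production data."""
    oracle_labels = [oracle(x) for x in prod_X]
    X = list(train_X) + list(prod_X)
    y = list(train_y) + oracle_labels
    w = [train_weight] * len(train_X) + [oracle_weight] * len(prod_X)
    return fit_stump(X, y, w)


# Toy data (hypothetical): one numeric feature, two classes.
train_X = [[1.0], [2.0], [8.0], [9.0]]
train_y = [0, 0, 1, 1]
prod_X = [[3.0], [7.0]]  # production instances, known at model-building time

oracle = knn_oracle(train_X, train_y, k=3)
stump = oracle_coach(train_X, train_y, prod_X, oracle)
prod_predictions = [stump_predict(stump, x) for x in prod_X]
```

The two weight parameters correspond to the relative weighting of training data and oracle data mentioned at the end of the abstract; passing `train_weight=0` would fit the transparent model on oracle-labelled production data only.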

Place, publisher, year, edition, pages
IOS Press, 2012. Vol. 16, no. 2, p. 247-263
Keywords [en]
Classification, Comprehensibility, Decision trees, Decision lists, Oracle coaching, Machine learning, Data mining
National Category
Computer Sciences; Computer and Information Sciences
Research subject
Business and IT
Identifiers
URN: urn:nbn:se:hb:diva-1346; DOI: 10.3233/IDA-2012-0522; ISI: 000301366100007; Local ID: 2320/11567; OAI: oai:DiVA.org:hb-1346; DiVA, id: diva2:869370
Available from: 2015-11-13. Created: 2015-11-13. Last updated: 2020-01-29. Bibliographically approved

Authority records

Johansson, Ulf; Sönströd, Cecilia; Löfström, Tuwe
