Obtaining Accurate and Comprehensible Data Mining Models: An Evolutionary Approach
University of Borås, School of Business and IT.
2007 (English). Doctoral thesis, monograph (Other academic)
Abstract [en]

When performing predictive data mining, the use of ensembles is claimed to virtually guarantee increased accuracy compared to the use of single models. Unfortunately, the problem of how to maximize ensemble accuracy is far from solved. In particular, the relationship between ensemble diversity and accuracy is not completely understood, making it hard to efficiently utilize diversity for ensemble creation. Furthermore, most high-accuracy predictive models are opaque, i.e. it is not possible for a human to follow and understand the logic behind a prediction. For some domains, this is unacceptable, since models need to be comprehensible. To obtain comprehensibility, accuracy is often sacrificed by using simpler but transparent models; a trade-off termed the accuracy vs. comprehensibility trade-off. With this trade-off in mind, several researchers have suggested rule extraction algorithms, where opaque models are transformed into comprehensible models, keeping an acceptable accuracy. In this thesis, two novel algorithms based on Genetic Programming are suggested. The first algorithm (GEMS) is used for ensemble creation, and the second (G-REX) is used for rule extraction from opaque models. The main property of GEMS is the ability to combine smaller ensembles and individual models in an almost arbitrary way. Moreover, GEMS can use base models of any kind and the optimization function is very flexible, easily permitting inclusion of, for instance, diversity measures. In the experimentation, GEMS obtained accuracies higher than both straightforward design choices and published results for Random Forests and AdaBoost. The key quality of G-REX is the inherent ability to explicitly control the accuracy vs. comprehensibility trade-off. Compared to the standard tree inducers C5.0 and CART, and some well-known rule extraction algorithms, rules extracted by G-REX are significantly more accurate and compact. 
Most importantly, G-REX is thoroughly evaluated and found to meet all relevant evaluation criteria for rule extraction algorithms, thus establishing G-REX as the algorithm to benchmark against.
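To make the idea of explicitly controlling the accuracy vs. comprehensibility trade-off concrete, the following is a minimal sketch of a size-penalized fitness function of the kind commonly used in Genetic Programming. It is not taken from the thesis: the function name `penalized_fitness`, the `penalty` parameter, the node counts, and the toy data are all assumptions made for illustration.

```python
from typing import Callable, Sequence


def penalized_fitness(
    predict: Callable[[int], int],
    inputs: Sequence[int],
    targets: Sequence[int],
    size: int,
    penalty: float,
) -> float:
    """Accuracy of a candidate rule minus a penalty proportional to its size.

    Hypothetical illustration: `size` stands for the number of nodes in the
    rule, and `penalty` is the knob that trades accuracy for compactness.
    """
    correct = sum(1 for x, t in zip(inputs, targets) if predict(x) == t)
    return correct / len(targets) - penalty * size


# Toy data set (assumed): the class is 1 for 4 <= x <= 7, otherwise 0.
INPUTS = [1, 2, 3, 4, 5, 6, 7, 8]
TARGETS = [0, 0, 0, 1, 1, 1, 1, 0]


def simple_rule(x: int) -> int:
    # Compact rule (assumed size: 3 nodes); misclassifies x = 8.
    return 1 if x >= 4 else 0


def complex_rule(x: int) -> int:
    # Larger rule (assumed size: 7 nodes); classifies every instance correctly.
    return 1 if (x >= 4 and x <= 7) else 0


def preferred_rule(penalty: float) -> str:
    """Name of the candidate with the higher penalized fitness."""
    simple = penalized_fitness(simple_rule, INPUTS, TARGETS, 3, penalty)
    complex_ = penalized_fitness(complex_rule, INPUTS, TARGETS, 7, penalty)
    return "simple" if simple > complex_ else "complex"
```

With a small penalty the larger, fully accurate rule scores higher; raising the penalty makes the shorter rule preferable despite its lower accuracy. This is one simple way such a trade-off can be exposed as a single parameter.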

Place, publisher, year, edition, pages
Linköping University, Department of Computer and Information Science, 2007.
Series
Linköping Studies in Science and Technology. Dissertations, ISSN 0345-7524 ; 1086
Keywords [en]
rule extraction, ensembles, data mining, genetic programming, artificial neural networks
National Category
Computer and Information Sciences
Identifiers
URN: urn:nbn:se:hb:diva-3415
Local ID: 2320/2136
ISBN: 978-91-85715-34-3 (print)
OAI: oai:DiVA.org:hb-3415
DiVA, id: diva2:876804
Note

Thesis presented 2007-06-01 at Högskolan i Skövde.

Opponent: Rögnvaldsson, Thorsteinn, Professor, Sektionen för informationsvetenskap, Data- och Elektroteknik, Högskolan i Halmstad.

Available from: 2015-12-04 Created: 2015-12-04 Last updated: 2018-01-10

Open Access in DiVA

fulltext (2975 kB), 1549 downloads
File information
File name: FULLTEXT01.pdf
File size: 2975 kB
Checksum (SHA-512): e7f1c0b0aa058f9fe2849d65342bb1b8eee836b03a34d47d57835078021a2b6e063e592a9eec2fdab64aaa2e328d06c2266c35fcb8cca3682a0894b0590b1259
Type: fulltext
Mimetype: application/pdf
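
A downloaded copy of the full text can be checked against the SHA-512 checksum listed in the file information. The sketch below uses only Python's standard `hashlib` module; the helper names `sha512_of_file` and `matches_checksum` and the local file path are assumptions for illustration, not part of the DiVA record.

```python
import hashlib


def sha512_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hexadecimal SHA-512 digest of the file at `path`.

    The file is read in chunks so that large PDFs need not fit in memory.
    """
    digest = hashlib.sha512()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_checksum(path: str, expected_hex: str) -> bool:
    """Compare a file's SHA-512 digest with an expected hexadecimal string."""
    return sha512_of_file(path) == expected_hex.strip().lower()
```

For example, `matches_checksum("FULLTEXT01.pdf", expected)` would be called with `expected` set to the checksum string from the record, assuming the file was saved under that name in the working directory.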

Authority records

Johansson, Ulf

Total: 1550 downloads
The number of downloads is the sum of all downloads of full texts. It may include, e.g., previous versions that are no longer available.

Total: 416 hits