Generating Comprehensible QSAR Models
2009 (English) Conference paper, Published paper (Refereed)
Abstract [en]
This paper presents work in progress from the
INFUSIS project and contains initial experimentation, using
publicly available medicinal chemistry datasets, on obtaining
comprehensible QSAR models. Three techniques are evaluated
on both predictive performance, measured as accuracy, and
comprehensibility, measured as model size. The chosen
techniques are J48 decision trees and JRip and Chipper decision
lists. The results show that J48 obtains superior accuracy and
that Chipper performs best of the two decision list algorithms on
accuracy. Furthermore, it is seen that, regarding accuracy, all
techniques benefit from feature reduction, which almost always
results in increased accuracy. Regarding comprehensibility, JRip
obtains the smallest models, followed by Chipper, with J48
producing the largest models. For model size, feature reduction is
not seen to be universally beneficial; only J48 produces smaller
models for the reduced datasets, while both decision list
algorithms actually produce larger models on average. The
overall conclusion is that, for these datasets, there exists a definite
tradeoff between accuracy and comprehensibility that needs to be
investigated further.
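The two evaluation criteria named in the abstract, accuracy and model size, can be illustrated with a minimal sketch of a decision list. Everything in it is an assumption made for illustration: the descriptor names (`logp`, `mw`), the toy compounds, the rule thresholds, and the choice to count model size as the number of rules including the default rule. The abstract does not state how the paper counts model size, and this is not the implementation of JRip or Chipper.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

# A compound is a mapping from (hypothetical) descriptor name to value.
Example = Dict[str, float]


@dataclass
class Rule:
    # Each condition is (descriptor, operator, threshold); operator is "<=" or ">".
    conditions: List[Tuple[str, str, float]]
    label: str

    def matches(self, x: Example) -> bool:
        # A rule fires only if all of its conditions hold.
        return all(
            (x[f] <= t) if op == "<=" else (x[f] > t)
            for f, op, t in self.conditions
        )


@dataclass
class DecisionList:
    rules: List[Rule]
    default: str

    def predict(self, x: Example) -> str:
        # Rules are tried in order; the first matching rule decides.
        for rule in self.rules:
            if rule.matches(x):
                return rule.label
        return self.default

    def size(self) -> int:
        # Assumed size measure: number of rules, counting the default rule.
        return len(self.rules) + 1


def accuracy(model: DecisionList, data: List[Tuple[Example, str]]) -> float:
    # Fraction of examples whose predicted class equals the true class.
    correct = sum(model.predict(x) == y for x, y in data)
    return correct / len(data)


# Toy, made-up data: not taken from the paper's datasets.
data = [
    ({"logp": 1.0, "mw": 200.0}, "active"),
    ({"logp": 2.5, "mw": 320.0}, "active"),
    ({"logp": 4.2, "mw": 310.0}, "inactive"),
    ({"logp": 5.1, "mw": 450.0}, "inactive"),
    ({"logp": 3.8, "mw": 520.0}, "inactive"),
    ({"logp": 4.5, "mw": 180.0}, "active"),
]

small = DecisionList(
    rules=[Rule([("logp", "<=", 3.0)], "active")],
    default="inactive",
)
large = DecisionList(
    rules=[
        Rule([("logp", "<=", 3.0)], "active"),
        Rule([("logp", ">", 3.0), ("mw", "<=", 250.0)], "active"),
    ],
    default="inactive",
)

for name, model in [("small", small), ("large", large)]:
    print(name, "size:", model.size(), "accuracy:", round(accuracy(model, data), 3))
```

On this toy data the larger list classifies one more compound correctly than the smaller one, which is the kind of accuracy versus comprehensibility tradeoff the abstract refers to. The numbers themselves carry no meaning beyond the illustration.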
Place, publisher, year, edition, pages
University of Skövde, 2009.
Series
Skövde studies in Informatics, ISSN 1653-2325 ; 2009:3
Keywords [en]
concept description, QSAR, classification, Machine Learning
Keywords [sv]
data mining
National Category
Computer and Information Sciences
Identifiers
URN: urn:nbn:se:hb:diva-6309
Local ID: 2320/5911
OAI: oai:DiVA.org:hb-6309
DiVA, id: diva2:886996
Conference
3rd Skövde Workshop on Information Fusion Topics 2009, Skövde, Sweden
2015-12-22 2015-12-22 2018-01-10