In this study, the task of obtaining accurate and comprehensible concept descriptions of a specific set of production instances has been investigated. The suggested method, inspired by rule extraction and transductive learning, uses a highly accurate opaque model, called an oracle, to coach construction of transparent decision list models. The decision list algorithms evaluated are JRip and four different variants of Chipper, a technique specifically developed for concept description. Using 40 real-world data sets from the drug discovery domain, the results show that employing an oracle coach to label the production data resulted in significantly more accurate and smaller models for almost all techniques. Furthermore, augmenting normal training data with production data labeled by the oracle also led to significant increases in predictive performance, but with a slight increase in model size. Of the techniques evaluated, normal Chipper optimizing FOIL’s information gain and allowing conjunctive rules was clearly the best. The overall conclusion is that oracle coaching works very well for concept description.
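The oracle-coaching idea summarized above can be illustrated with a minimal sketch. It is not the study's implementation: the k-nearest-neighbour oracle, the single-rule "stump" standing in for a decision list learner such as JRip or Chipper, the synthetic data, and all function names are assumptions made purely for illustration.

```python
import random

def knn_predict(train, query, k=5):
    """Opaque stand-in oracle: majority vote among the k nearest training points."""
    nearest = sorted(
        train,
        key=lambda row: sum((a - b) ** 2 for a, b in zip(row[0], query)),
    )[:k]
    votes = sum(label for _, label in nearest)
    return 1 if 2 * votes > k else 0

def fit_stump(data):
    """Transparent stand-in: best single rule 'if x[f] > t then a else 1 - a'."""
    best = None
    for f in range(len(data[0][0])):
        for t in sorted({x[f] for x, _ in data}):
            for a in (0, 1):
                hits = sum((a if x[f] > t else 1 - a) == y for x, y in data)
                if best is None or hits > best[0]:
                    best = (hits, f, t, a)
    _, f, t, a = best
    return f, t, a

def stump_predict(rule, x):
    f, t, a = rule
    return a if x[f] > t else 1 - a

def oracle_coach(train, production, augment=False):
    """Label the production instances with the oracle, then fit the transparent
    model on those labels (optionally together with the original training data)."""
    oracle_labels = [knn_predict(train, x) for x in production]
    coached = list(zip(production, oracle_labels))
    if augment:
        coached = train + coached
    return fit_stump(coached), oracle_labels

# Synthetic two-feature data (hypothetical, for illustration only).
rng = random.Random(0)

def make_point():
    x = (rng.uniform(-1, 1), rng.uniform(-1, 1))
    return x, int(x[0] + 0.2 * x[1] > 0)

train = [make_point() for _ in range(200)]
production = [make_point()[0] for _ in range(50)]  # unlabeled production inputs

rule, oracle_labels = oracle_coach(train, production)
# Fidelity: how often the transparent rule agrees with the oracle on production data.
fidelity = sum(
    stump_predict(rule, x) == y for x, y in zip(production, oracle_labels)
) / len(production)
```

Passing `augment=True` corresponds to the second setting described above, in which the oracle-labeled production data is added to the normal training data rather than used alone.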
Sponsorship:
This work was supported by the INFUSIS project (www.his.se/infusis) at the University of Skövde, Sweden, in partnership with the Swedish Knowledge Foundation under grant 2008/0502.