2014 (English) Conference paper, Published paper (Refereed)
Abstract [en]
Online predictive modeling of streaming data is a
key task for big data analytics. This paper proposes a novel approach
to efficient online learning of regression trees that continuously
updates, rather than retrains, the tree as more labeled data become
available. A conformal predictor outputs prediction sets instead of
point predictions, which for regression translates into prediction
intervals. The key property of a conformal predictor is that it is
always valid, i.e., its error rate on novel data is bounded by a
preset significance level. Here, we suggest applying Mondrian
conformal prediction on top of the resulting models in order to
obtain regression trees where not only the tree as a whole but also
each individual rule, corresponding to a path from the root node to
a leaf, is valid. Using Mondrian conformal prediction, it becomes
possible to analyze and explore the different rules separately,
knowing that their error rates will, in the long run, not exceed the
preset significance level.
An empirical investigation, using 17 publicly available data sets,
confirms that the resulting rules are independently valid, and also
shows that the prediction intervals are, on average, smaller than
when only the global model is required to be valid. All in all, the
suggested method provides a data miner or a decision maker with
highly informative predictive models of streaming data.
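The abstract does not spell out the conformal procedure itself. A minimal sketch, assuming inductive (split) conformal regression with absolute-residual nonconformity scores and using the leaf an example reaches as its Mondrian category, could look as follows. The names `conformal_threshold` and `MondrianConformalRegressor`, the toy two-leaf "tree", and the data generator are hypothetical illustrations, not the authors' implementation, and the online tree-updating step is not shown.

```python
import math
import random
from collections import defaultdict


def conformal_threshold(residuals, epsilon):
    """(1 - epsilon) conformal quantile of absolute calibration residuals.

    Takes the k-th smallest residual with k = ceil((n + 1) * (1 - epsilon)),
    the usual inductive (split) conformal choice. Returns math.inf when the
    calibration set is too small for the requested significance level.
    """
    n = len(residuals)
    k = math.ceil((n + 1) * (1 - epsilon))
    if k > n:
        return math.inf
    return sorted(residuals)[k - 1]


class MondrianConformalRegressor:
    """Hypothetical wrapper: one calibration set per Mondrian category (leaf)."""

    def __init__(self, predict, category, epsilon):
        self.predict = predict    # point predictor: x -> y_hat
        self.category = category  # Mondrian taxonomy: x -> leaf id
        self.epsilon = epsilon    # preset significance level
        self.thresholds = {}

    def calibrate(self, xs, ys):
        # Group absolute residuals by the leaf each calibration example reaches.
        residuals = defaultdict(list)
        for x, y in zip(xs, ys):
            residuals[self.category(x)].append(abs(y - self.predict(x)))
        self.thresholds = {c: conformal_threshold(r, self.epsilon)
                           for c, r in residuals.items()}

    def interval(self, x):
        y_hat = self.predict(x)
        # A leaf without calibration examples gets an uninformative interval.
        t = self.thresholds.get(self.category(x), math.inf)
        return y_hat - t, y_hat + t


# Toy two-leaf "tree" (hypothetical): leaf 0 is low-noise, leaf 1 high-noise.
def toy_leaf(x):
    return 0 if x < 0.5 else 1


def toy_predict(x):
    return 1.0 if toy_leaf(x) == 0 else 5.0


def sample(rng, n):
    xs = [rng.random() for _ in range(n)]
    ys = [toy_predict(x) + rng.gauss(0.0, 0.1 if toy_leaf(x) == 0 else 1.0)
          for x in xs]
    return xs, ys


rng = random.Random(0)
mcp = MondrianConformalRegressor(toy_predict, toy_leaf, epsilon=0.1)
mcp.calibrate(*sample(rng, 2000))

# Empirical error rate per leaf (i.e. per rule) on fresh test data.
misses, totals = defaultdict(int), defaultdict(int)
test_x, test_y = sample(rng, 20000)
for x, y in zip(test_x, test_y):
    lo, hi = mcp.interval(x)
    leaf = toy_leaf(x)
    totals[leaf] += 1
    if not (lo <= y <= hi):
        misses[leaf] += 1
error_rates = {leaf: misses[leaf] / totals[leaf] for leaf in sorted(totals)}
print(error_rates, mcp.thresholds)
```

Because each leaf is calibrated only on the examples that reach it, the low-noise leaf receives a narrow interval and the high-noise leaf a wide one, while each leaf's error rate should stay close to the significance level. A single global calibration would instead apply one threshold to every leaf, so validity would only be guaranteed for the tree as a whole.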
Place, publisher, year, edition, pages
IEEE, 2014
Keywords
Conformal Prediction, Streaming data, Regression trees, Interpretable models, Machine learning, Data mining
National Category
Computer Sciences; Computer and Information Sciences
Identifiers
urn:nbn:se:hb:diva-7324 (URN)
10.1109/BigData.2014.7004263 (DOI)
2320/14627 (Local ID)
978-1-4799-5666-1/14 (ISBN)
2320/14627 (Archive number)
2320/14627 (OAI)
Conference
IEEE International Conference on Big Data, 27-30 October, 2014, Washington, DC, USA
Note
Sponsorship:
This work was supported by the Swedish Foundation for Strategic
Research through the project High-Performance Data Mining for Drug Effect
Detection (IIS11-0053), the Swedish Retail and Wholesale Development
Council through the project Innovative Business Intelligence Tools (2013:5)
and the Knowledge Foundation through the project Big Data Analytics by
Online Ensemble Learning (20120192).
2015-12-22 2015-12-22 2018-01-10