Sönströd, Cecilia
Publications (10 of 20)
Löfström, T., Linusson, H., Sönströd, C. & Johansson, U. (2015). System Health Monitoring using Conformal Anomaly Detection. Högskolan i Borås
2015 (English) Report (Other (popular science, discussion, etc.))
Place, publisher, year, edition, pages
Högskolan i Borås, 2015. p. 20
Keywords
System health monitoring, conformal anomaly detection
National Category
Computer Sciences
Research subject
Business and IT
Identifiers
urn:nbn:se:hb:diva-9951 (URN)
Projects
Promoting and Enhancing Reuse of Information throughout the Content Lifecycle taking account of Evolving Semantics (PERICLES)
Funder
EU, FP7, Seventh Framework Programme
Note

Technical report

Available from: 2016-05-25 Created: 2016-05-25 Last updated: 2020-01-29. Bibliographically approved
Johansson, U., Sönströd, C. & König, R. (2014). Accurate and Interpretable Regression Trees using Oracle Coaching. Paper presented at the 5th IEEE Symposium on Computational Intelligence and Data Mining, 9-12 December, Orlando, FL, USA. IEEE
2014 (English) Conference paper, Published paper (Refereed)
Abstract [en]

In many real-world scenarios, predictive models need to be interpretable, thus ruling out many machine learning techniques known to produce very accurate models, e.g., neural networks, support vector machines and all ensemble schemes. Most often, tree models or rule sets are used instead, typically resulting in significantly lower predictive performance. The overall purpose of oracle coaching is to reduce this accuracy vs. comprehensibility trade-off by producing interpretable models optimized for the specific production set at hand. The method requires production set inputs to be present when generating the predictive model, a demand fulfilled in most, but not all, predictive modeling scenarios. In oracle coaching, a highly accurate, but opaque, model is first induced from the training data. This model (“the oracle”) is then used to label both the training instances and the production instances. Finally, interpretable models are trained using different combinations of the resulting data sets. In this paper, the oracle coaching produces regression trees, using neural networks and random forests as oracles. The experiments, using 32 publicly available data sets, show that oracle coaching leads to significantly improved predictive performance, compared to standard induction. In addition, it is also shown that a highly accurate opaque model can be successfully used as a pre-processing step to reduce the noise typically present in data, even in situations where production inputs are not available. In fact, just augmenting or replacing training data with another copy of the training set, but with the predictions from the opaque model as targets, produced significantly more accurate and/or more compact regression trees.
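The three coaching steps described in the abstract (induce an oracle, relabel both data sets with it, train an interpretable tree on the combination) can be sketched in a few lines. This is a minimal illustration using scikit-learn stand-ins (a random forest as the oracle, a depth-limited regression tree as the transparent model) on synthetic data; the data set and hyperparameters are illustrative assumptions, not the paper's experimental setup.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in data; "production" inputs are known, but their labels are not.
X, y = make_regression(n_samples=400, n_features=8, noise=10.0, random_state=0)
X_train, X_prod, y_train, y_prod = train_test_split(X, y, test_size=0.5, random_state=0)

# 1. Induce a highly accurate but opaque oracle from the training data.
oracle = RandomForestRegressor(n_estimators=100, random_state=0).fit(X_train, y_train)

# 2. Let the oracle label both the training and the production instances.
y_train_oracle = oracle.predict(X_train)
y_prod_oracle = oracle.predict(X_prod)

# 3. Train an interpretable tree on the combined oracle-labeled data.
X_comb = np.vstack([X_train, X_prod])
y_comb = np.concatenate([y_train_oracle, y_prod_oracle])
coached_tree = DecisionTreeRegressor(max_depth=4, random_state=0).fit(X_comb, y_comb)

# Baseline for comparison: the same tree induced from training data only.
baseline_tree = DecisionTreeRegressor(max_depth=4, random_state=0).fit(X_train, y_train)
```

Other combinations from the abstract (e.g. oracle-relabeled training data only, without production inputs) follow by changing which arrays are stacked in step 3.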

Place, publisher, year, edition, pages
IEEE, 2014
Keywords
Oracle coaching, Regression trees, Predictive modeling, Interpretable models, Machine learning, Data mining
National Category
Computer Sciences; Computer and Information Sciences
Identifiers
urn:nbn:se:hb:diva-7319 (URN); 2320/14712 (Local ID); 978-1-4799-4518-4/14 (ISBN); 2320/14712 (Archive number); 2320/14712 (OAI)
Conference
5th IEEE Symposium on Computational Intelligence and Data Mining, 9-12 December, Orlando, FL, USA
Note

Sponsorship: This work was supported by the Swedish Foundation for Strategic Research through the project High-Performance Data Mining for Drug Effect Detection (IIS11-0053), the Swedish Retail and Wholesale Development Council through the project Innovative Business Intelligence Tools (2013:5) and the Knowledge Foundation through the project Big Data Analytics by Online Ensemble Learning (20120192).

Available from: 2015-12-22 Created: 2015-12-22 Last updated: 2018-01-10
Johansson, U., Sönströd, C., Linusson, H. & Boström, H. (2014). Regression Trees for Streaming Data with Local Performance Guarantees. Paper presented at the IEEE International Conference on Big Data, 27-30 October, 2014, Washington, DC, USA. IEEE
2014 (English) Conference paper, Published paper (Refereed)
Abstract [en]

Online predictive modeling of streaming data is a key task for big data analytics. In this paper, a novel approach for efficient online learning of regression trees is proposed, which continuously updates, rather than retrains, the tree as more labeled data become available. A conformal predictor outputs prediction sets instead of point predictions, which for regression translates into prediction intervals. The key property of a conformal predictor is that it is always valid, i.e., the error rate, on novel data, is bounded by a preset significance level. Here, we suggest applying Mondrian conformal prediction on top of the resulting models, in order to obtain regression trees where not only the tree, but also each and every rule, corresponding to a path from the root node to a leaf, is valid. Using Mondrian conformal prediction, it becomes possible to analyze and explore the different rules separately, knowing that their accuracy, in the long run, will not be below the preset significance level. An empirical investigation, using 17 publicly available data sets, confirms that the resulting rules are independently valid, but also shows that the prediction intervals are smaller, on average, than when only the global model is required to be valid. All in all, the suggested method provides a data miner or a decision maker with highly informative predictive models of streaming data.
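The Mondrian idea in the abstract, calibrating each leaf separately so that every root-to-leaf rule is individually valid, can be sketched with split conformal prediction on a batch-trained tree. The streaming tree induction itself is not reproduced here, and the data, significance level, and tree settings are illustrative assumptions.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

X, y = make_regression(n_samples=1200, n_features=6, noise=15.0, random_state=1)
X_train, X_rest, y_train, y_rest = train_test_split(X, y, test_size=0.5, random_state=1)
X_cal, X_test, y_cal, y_test = train_test_split(X_rest, y_rest, test_size=0.5, random_state=1)

tree = DecisionTreeRegressor(max_leaf_nodes=8, random_state=1).fit(X_train, y_train)
eps = 0.1  # preset significance level: at most 10% errors per rule, in the long run

# Nonconformity scores (absolute residuals) on a held-out calibration set,
# grouped by the leaf each calibration instance falls into.
cal_leaves = tree.apply(X_cal)
cal_scores = np.abs(y_cal - tree.predict(X_cal))

def mondrian_interval(x):
    """Prediction interval calibrated within the instance's own leaf (rule)."""
    leaf = tree.apply(x.reshape(1, -1))[0]
    scores = np.sort(cal_scores[cal_leaves == leaf])
    if len(scores) == 0:  # empty leaf: fall back to global calibration (sketch only)
        scores = np.sort(cal_scores)
    # Conformal quantile: the ceil((n + 1)(1 - eps))-th smallest score, capped.
    k = min(int(np.ceil((len(scores) + 1) * (1 - eps))) - 1, len(scores) - 1)
    center = tree.predict(x.reshape(1, -1))[0]
    return center - scores[k], center + scores[k]

low, high = mondrian_interval(X_test[0])
```

Because each leaf is calibrated on its own residuals, wide intervals in noisy regions do not inflate intervals for rules that fit well, which is the source of the smaller average intervals reported in the abstract.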

Place, publisher, year, edition, pages
IEEE, 2014
Keywords
Conformal Prediction, Streaming data, Regression trees, Interpretable models, Machine learning, Data mining
National Category
Computer Sciences; Computer and Information Sciences
Identifiers
urn:nbn:se:hb:diva-7324 (URN); 10.1109/BigData.2014.7004263 (DOI); 2320/14627 (Local ID); 978-1-4799-5666-1/14 (ISBN); 2320/14627 (Archive number); 2320/14627 (OAI)
Conference
IEEE International Conference on Big Data, 27-30 October, 2014, Washington, DC, USA
Note

Sponsorship: This work was supported by the Swedish Foundation for Strategic Research through the project High-Performance Data Mining for Drug Effect Detection (IIS11-0053), the Swedish Retail and Wholesale Development Council through the project Innovative Business Intelligence Tools (2013:5) and the Knowledge Foundation through the project Big Data Analytics by Online Ensemble Learning (20120192).

Available from: 2015-12-22 Created: 2015-12-22 Last updated: 2018-01-10
Johansson, U., Sönströd, C., Löfström, T. & Boström, H. (2012). Obtaining accurate and comprehensible classifiers using oracle coaching. Intelligent Data Analysis, 16(2), 247-263
2012 (English) In: Intelligent Data Analysis, ISSN 1088-467X, E-ISSN 1571-4128, Vol. 16, no. 2, p. 247-263. Article in journal (Refereed). Published
Abstract [en]

While ensemble classifiers often reach high levels of predictive performance, the resulting models are opaque and hence do not allow direct interpretation. When employing methods that do generate transparent models, predictive performance typically has to be sacrificed. This paper presents a method of improving predictive performance of transparent models in the very common situation where instances to be classified, i.e., the production data, are known at the time of model building. This approach, named oracle coaching, employs a strong classifier, called an oracle, to guide the generation of a weaker, but transparent model. This is accomplished by using the oracle to predict class labels for the production data, and then applying the weaker method on this data, possibly in conjunction with the original training set. Evaluation on 30 data sets from the UCI repository shows that oracle coaching significantly improves predictive performance, measured by both accuracy and area under ROC curve, compared to using training data only. This result is shown to be robust for a variety of methods for generating the oracles and transparent models. More specifically, random forests and bagged radial basis function networks are used as oracles, while J48 and JRip are used for generating transparent models. The evaluation further shows that significantly better results are obtained when using the oracle-classified production data together with the original training data, instead of using only oracle data. An analysis of the fidelity of the transparent models to the oracles shows that performance gains can be expected from increasing oracle performance rather than from increasing fidelity. Finally, it is shown that further performance gains can be achieved by adjusting the relative weights of training data and oracle data.

Place, publisher, year, edition, pages
IOS Press, 2012
Keywords
Classification, Comprehensibility, Decision trees, Decision lists, Oracle coaching, Machine learning, Data mining
National Category
Computer Sciences; Computer and Information Sciences
Research subject
Business and IT
Identifiers
urn:nbn:se:hb:diva-1346 (URN); 10.3233/IDA-2012-0522 (DOI); 000301366100007 (); 2320/11567 (Local ID); 2320/11567 (Archive number); 2320/11567 (OAI)
Available from: 2015-11-13 Created: 2015-11-13 Last updated: 2020-01-29. Bibliographically approved
Sönströd, C., Johansson, U. & König, R. (2011). Evolving Accurate and Comprehensible Classification Rules. Paper presented at the IEEE Congress on Evolutionary Computation (CEC), 5-8 June, New Orleans, LA, USA, 2011. IEEE
2011 (English) Conference paper, Published paper (Refereed)
Abstract [en]

In this paper, Genetic Programming is used to evolve ordered rule sets (also called decision lists) for a number of benchmark classification problems, with evaluation of both predictive performance and comprehensibility. The main purpose is to compare this approach to the standard decision list algorithm JRip and also to evaluate the use of different length penalties and fitness functions for evolving this type of model. The results, using 25 data sets from the UCI repository, show that genetic decision lists with accuracy-based fitness functions outperform JRip regarding accuracy. Indeed, the best setup was significantly better than JRip. JRip, however, held a slight advantage over these models when evaluating AUC. Furthermore, all genetic decision list setups produced models that were more compact than JRip models, and thus more readily comprehensible. The effect of using different fitness functions was very clear; in essence, models performed best on the evaluation criterion that was used in the fitness function, with a worsening of the performance for other criteria. Brier score fitness provided a middle ground, with acceptable performance on both accuracy and AUC. The main conclusion is that genetic programming solves the task of evolving decision lists very well, but that different length penalties and fitness functions have immediate effects on the results. Thus, these parameters can be used to control the trade-off between different aspects of predictive performance and comprehensibility.

Place, publisher, year, edition, pages
IEEE, 2011
Keywords
genetic programming, decision lists, Machine Learning, Data Mining
National Category
Computer and Information Sciences; Information Systems
Research subject
Business and IT
Identifiers
urn:nbn:se:hb:diva-6698 (URN); 10.1109/CEC.2011.5949784 (DOI); 2320/10007 (Local ID); 978-1-4244-7834-7 (ISBN); 2320/10007 (Archive number); 2320/10007 (OAI)
Conference
IEEE Congress on Evolutionary Computation (CEC), 5-8 June, New Orleans, LA, USA, 2011
Available from: 2015-12-22 Created: 2015-12-22 Last updated: 2018-01-10
Johansson, U., Löfström, T. & Sönströd, C. (2011). Locally Induced Predictive Models. Paper presented at the IEEE International Conference on Systems, Man, and Cybernetics. IEEE
2011 (English) Conference paper, Published paper (Refereed)
Abstract [en]

Most predictive modeling techniques utilize all available data to build global models. This is despite the well-known fact that for many problems, the targeted relationship varies greatly over the input space, thus suggesting that localized models may improve predictive performance. In this paper, we suggest and evaluate a technique inducing one predictive model for each test instance, using only neighboring instances. In the experimentation, several different variations of the suggested algorithm producing localized decision trees and neural network models are evaluated on 30 UCI data sets. The main result is that the suggested approach generally yields better predictive performance than global models built using all available training data. As a matter of fact, all techniques producing J48 trees obtained significantly higher accuracy and AUC, compared to the global J48 model. For RBF network models, with their inherent ability to use localized information, the suggested approach was only successful with regard to accuracy, while global RBF models had a better ranking ability, as seen by their generally higher AUCs.
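The per-instance induction loop described in the abstract can be sketched briefly. This is a minimal illustration assuming a kNN neighbourhood and a scikit-learn decision tree as a stand-in for J48; the data set and neighbourhood size are illustrative assumptions, not the paper's setup.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.neighbors import NearestNeighbors
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=2)

k = 50  # neighbourhood size (illustrative assumption)
nn = NearestNeighbors(n_neighbors=k).fit(X_train)

def predict_locally(x):
    """Induce one fresh tree per test instance, using only its k nearest neighbours."""
    idx = nn.kneighbors(x.reshape(1, -1), return_distance=False)[0]
    local_tree = DecisionTreeClassifier(random_state=2).fit(X_train[idx], y_train[idx])
    return local_tree.predict(x.reshape(1, -1))[0]

local_preds = np.array([predict_locally(x) for x in X_test])
```

The cost of this scheme is one model induction per test instance, which is why it suits transparent, cheap-to-train learners such as small decision trees.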

Place, publisher, year, edition, pages
IEEE, 2011
Keywords
local learning, predictive modeling, decision trees, RBF networks, Machine Learning, Data Mining
National Category
Computer Sciences; Computer and Information Sciences
Research subject
Business and IT
Identifiers
urn:nbn:se:hb:diva-6767 (URN); 10.1109/ICSMC.2011.6083922 (DOI); 2320/10327 (Local ID); 978-1-4577-0651-6 (ISBN); 2320/10327 (Archive number); 2320/10327 (OAI)
Conference
IEEE International Conference on Systems, Man, and Cybernetics
Available from: 2015-12-22 Created: 2015-12-22 Last updated: 2020-01-29
Johansson, U., Sönströd, C. & Löfström, T. (2011). One Tree to Explain Them All. Paper presented at the IEEE Congress on Evolutionary Computation (CEC). IEEE
2011 (English) Conference paper, Published paper (Refereed)
Abstract [en]

Random forest is an often used ensemble technique, renowned for its high predictive performance. Random forest models are, however, inherently opaque due to their sheer complexity, making human interpretation and analysis impossible. This paper presents a method of approximating the random forest with just one decision tree. The approach uses oracle coaching, a recently suggested technique where a weaker but transparent model is generated using combinations of regular training data and test data initially labeled by a strong classifier, called the oracle. In this study, the random forest plays the part of the oracle, while the transparent models are decision trees generated by either the standard tree inducer J48, or by evolving genetic programs. Evaluation on 30 data sets from the UCI repository shows that oracle coaching significantly improves both accuracy and area under ROC curve, compared to using training data only. As a matter of fact, resulting single tree models are as accurate as the random forest, on the specific test instances. Most importantly, this is not achieved by inducing or evolving huge trees having perfect fidelity; a large majority of all trees are instead rather compact and clearly comprehensible. The experiments also show that the evolution outperformed J48, with regard to accuracy, but that this came at the expense of slightly larger trees.

Place, publisher, year, edition, pages
IEEE, 2011
Keywords
genetic programming, random forest, oracle coaching, decision trees, Machine learning, Data mining
National Category
Computer Sciences; Computer and Information Sciences
Research subject
Business and IT
Identifiers
urn:nbn:se:hb:diva-6680 (URN); 2320/9855 (Local ID); 978-1-4244-7834-7 (ISBN); 2320/9855 (Archive number); 2320/9855 (OAI)
Conference
IEEE Congress on Evolutionary Computation (CEC)
Note

Sponsorship: This work was supported by the INFUSIS project (www.his.se/infusis) at the University of Skövde, Sweden, in partnership with the Swedish Knowledge Foundation under grant 2008/0502.

Available from: 2015-12-22 Created: 2015-12-22 Last updated: 2020-01-29
Johansson, U., Sönströd, C., Norinder, U. & Boström, H. (2011). The Trade-Off between Accuracy and Comprehensibility for Predictive In Silico Modeling. Future Medicinal Chemistry, 3(6), 647-663
2011 (English) In: Future Medicinal Chemistry, ISSN 1756-8919, E-ISSN 1756-8927, Vol. 3, no. 6, p. 647-663. Article in journal (Refereed). Published
Place, publisher, year, edition, pages
Future Science, 2011
Keywords
accuracy vs. interpretability, in silico modeling, classification, Machine learning, Data mining
National Category
Computer Sciences; Computer and Information Sciences
Research subject
Business and IT
Identifiers
urn:nbn:se:hb:diva-3249 (URN); 10.4155/fmc.11.51 (DOI); 2320/9856 (Local ID); 2320/9856 (Archive number); 2320/9856 (OAI)
Funder
Knowledge Foundation
Note

Sponsorship: This work was supported by the INFUSIS project (www.his.se/infusis) at the University of Skövde, Sweden, in partnership with the Swedish Knowledge Foundation under grant 2008/0502.

Available from: 2015-11-13 Created: 2015-11-13 Last updated: 2018-01-10. Bibliographically approved
Johansson, U., Sönströd, C. & Löfström, T. (2010). Oracle Coached Decision Trees and Lists. Paper presented at Advances in Intelligent Data Analysis IX, 9th International Symposium, IDA 2010. Springer-Verlag Berlin Heidelberg
2010 (English) Conference paper, Published paper (Refereed)
Abstract [en]

This paper introduces a novel method for obtaining increased predictive performance from transparent models in situations where production input vectors are available when building the model. First, labeled training data is used to build a powerful opaque model, called an oracle. Second, the oracle is applied to production instances, generating predicted target values, which are used as labels. Finally, these newly labeled instances are utilized, in different combinations with normal training data, when inducing a transparent model. Experimental results, on 26 UCI data sets, show that the use of oracle coaches significantly improves predictive performance, compared to standard model induction. Most importantly, both accuracy and AUC results are robust over all combinations of opaque and transparent models evaluated. This study thus implies that the straightforward procedure of using a coaching oracle, which can be used with arbitrary classifiers, yields significantly better predictive performance at a low computational cost.

Place, publisher, year, edition, pages
Springer-Verlag Berlin Heidelberg, 2010
Series
LNCS ; 6065
Keywords
decision trees, rule learning, coaching, Machine learning
National Category
Computer Sciences; Information Systems
Identifiers
urn:nbn:se:hb:diva-6403 (URN); 10.1007/978-3-642-13062-5_8 (DOI); 2320/6797 (Local ID); 978-3-642-13061-8 (ISBN); 2320/6797 (Archive number); 2320/6797 (OAI)
Conference
Advances in Intelligent Data Analysis IX, 9th International Symposium, IDA 2010
Available from: 2015-12-22 Created: 2015-12-22 Last updated: 2020-01-29
Sönströd, C., Johansson, U., Boström, H. & Norinder, U. (2010). Pin-Pointing Concept Descriptions. Paper presented at the 2010 IEEE International Conference on Systems, Man, and Cybernetics (SMC).
2010 (English) Conference paper, Published paper (Refereed)
Abstract [en]

In this study, the task of obtaining accurate and comprehensible concept descriptions of a specific set of production instances has been investigated. The suggested method, inspired by rule extraction and transductive learning, uses a highly accurate opaque model, called an oracle, to coach construction of transparent decision list models. The decision list algorithms evaluated are JRip and four different variants of Chipper, a technique specifically developed for concept description. Using 40 real-world data sets from the drug discovery domain, the results show that employing an oracle coach to label the production data resulted in significantly more accurate and smaller models for almost all techniques. Furthermore, augmenting normal training data with production data labeled by the oracle also led to significant increases in predictive performance, but with a slight increase in model size. Of the techniques evaluated, normal Chipper optimizing FOIL’s information gain and allowing conjunctive rules was clearly the best. The overall conclusion is that oracle coaching works very well for concept description.

Keywords
concept description, decision lists, Machine Learning, data mining
National Category
Computer and Information Sciences; Information Systems
Identifiers
urn:nbn:se:hb:diva-6518 (URN); 10.1109/ICSMC.2010.5641998 (DOI); 2320/7460 (Local ID); 2320/7460 (Archive number); 2320/7460 (OAI)
Conference
2010 IEEE International Conference on Systems, Man, and Cybernetics (SMC)
Note

Sponsorship: This work was supported by the INFUSIS project (www.his.se/infusis) at the University of Skövde, Sweden, in partnership with the Swedish Knowledge Foundation under grant 2008/0502.

Available from: 2015-12-22 Created: 2015-12-22 Last updated: 2018-01-10