Probabilistic Prediction in scikit-learn
Sweidan, Dirar; Johansson, Ulf
University of Borås, Faculty of Librarianship, Information, Education and IT.
2021 (English). Conference paper, published paper (Other academic).
Abstract [en]

Adding confidence measures to predictive models should increase their trustworthiness, but only if the models are well-calibrated. Historically, some algorithms, like logistic regression but also neural networks, have been considered to produce well-calibrated probability estimates off the shelf. Other techniques, like decision trees and naive Bayes, are on the other hand infamous for being significantly overconfident in their probabilistic predictions. In this paper, a large experimental study is conducted to investigate how well-calibrated the models produced by a number of algorithms in the scikit-learn library are out of the box, but also whether the built-in calibration techniques Platt scaling and isotonic regression, or Venn-Abers, can be used to improve the calibration. The results show that, of the seven algorithms evaluated, the only one obtaining well-calibrated models without external calibration is logistic regression. All other algorithms, i.e., decision trees, AdaBoost, gradient boosting, kNN, naive Bayes and random forests, benefit from using any of the calibration techniques. In particular, decision trees, naive Bayes and the boosted models are substantially improved by external calibration. From a practitioner's perspective, the obvious recommendation becomes to incorporate calibration when using probabilistic prediction. Comparing the different calibration techniques, Platt scaling and Venn-Abers generally outperform isotonic regression on these rather small datasets. Finally, the unique ability of Venn-Abers to output not only well-calibrated probability estimates but also the confidence in these estimates is demonstrated.
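The paper's experimental code is not part of this record, but a minimal sketch makes the compared techniques concrete. The snippet below uses scikit-learn's built-in CalibratedClassifierCV for Platt scaling (method='sigmoid') and isotonic regression (method='isotonic'), and adds a bare-bones inductive Venn-Abers predictor built on sklearn's IsotonicRegression, since Venn-Abers is not included in scikit-learn. The synthetic dataset, the decision-tree base model, and the venn_abers helper are illustrative assumptions, not the authors' setup.

```python
import numpy as np
from sklearn.calibration import CalibratedClassifierCV
from sklearn.datasets import make_classification
from sklearn.isotonic import IsotonicRegression
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the paper's datasets (assumption, for illustration).
X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Built-in external calibration: Platt scaling ('sigmoid') and isotonic
# regression, each wrapped around a decision tree with internal 5-fold CV.
platt_clf = CalibratedClassifierCV(
    DecisionTreeClassifier(max_depth=5, random_state=0),
    method="sigmoid", cv=5).fit(X_train, y_train)
iso_clf = CalibratedClassifierCV(
    DecisionTreeClassifier(max_depth=5, random_state=0),
    method="isotonic", cv=5).fit(X_train, y_train)
print("Platt:   ", platt_clf.predict_proba(X_test[:1])[0, 1])
print("Isotonic:", iso_clf.predict_proba(X_test[:1])[0, 1])

def venn_abers(cal_scores, cal_labels, test_score):
    """Inductive Venn-Abers for one test score: refit isotonic regression
    with the test object provisionally labelled 0 and then 1. The pair
    (p0, p1) is the multiprobability prediction; its width reflects the
    confidence in the estimate that the abstract refers to."""
    p = []
    for label in (0, 1):
        iso = IsotonicRegression(out_of_bounds="clip")
        iso.fit(np.append(cal_scores, test_score),
                np.append(cal_labels, label))
        p.append(iso.predict([test_score])[0])
    return p[0], p[1]

# Venn-Abers needs a separate calibration set held out from training.
X_prop, X_cal, y_prop, y_cal = train_test_split(X_train, y_train,
                                                random_state=0)
tree = DecisionTreeClassifier(max_depth=5, random_state=0).fit(X_prop, y_prop)
p0, p1 = venn_abers(tree.predict_proba(X_cal)[:, 1], y_cal,
                    tree.predict_proba(X_test[:1])[0, 1])
print(f"Venn-Abers interval: [{p0:.3f}, {p1:.3f}], "
      f"merged estimate: {p1 / (1 - p0 + p1):.3f}")
```

The width of the Venn-Abers interval [p0, p1] is what carries the confidence information highlighted at the end of the abstract: the wider the interval, the less certain the calibrated estimate, and p1 / (1 - p0 + p1) is the standard way to merge the pair into a single probability.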

Place, publisher, year, edition, pages
2021.
National Category
Information Systems
Research subject
Business and IT
Identifiers
URN: urn:nbn:se:hb:diva-26746
OAI: oai:DiVA.org:hb-26746
DiVA id: diva2:1603345
Conference
The 18th International Conference on Modeling Decisions for Artificial Intelligence, online (from Umeå, Sweden), September 27-30, 2021.
Available from: 2021-10-15. Created: 2021-10-15. Last updated: 2021-10-18. Bibliographically approved.

Open Access in DiVA

fulltext (454 kB), 6510 downloads
File information
File name: FULLTEXT01.pdf
File size: 454 kB
Checksum (SHA-512): 3f3a32ad20d6aaf05762b93d4705621b803f742e303100da3396a915d33302cd09a796d8a89225b9d4fc761e1f6519b40062b3664f1be59f7a948fd3c1f55373
Type: fulltext
Mimetype: application/pdf

Authority records

Sweidan, Dirar; Johansson, Ulf

