Probabilistic Prediction in scikit-learn
University of Borås, Faculty of Librarianship, Information, Education and IT.
2021 (English). Conference paper, published paper (Other academic).
Abstract [en]

Adding confidence measures to predictive models should increase their trustworthiness, but only if the models are well-calibrated. Historically, some algorithms, such as logistic regression but also neural networks, have been considered to produce well-calibrated probability estimates off-the-shelf. Other techniques, like decision trees and naive Bayes, are instead infamous for being significantly overconfident in their probabilistic predictions. In this paper, a large experimental study is conducted to investigate how well calibrated the models produced by a number of algorithms in the scikit-learn library are out of the box, and whether either the built-in calibration techniques, Platt scaling and isotonic regression, or Venn-Abers can be used to improve the calibration. The results show that of the seven algorithms evaluated, the only one producing well-calibrated models without external calibration is logistic regression. All other algorithms, i.e., decision trees, AdaBoost, gradient boosting, kNN, naive Bayes and random forests, benefit from using any of the calibration techniques. In particular, decision trees, naive Bayes and the boosted models are substantially improved by external calibration. From a practitioner's perspective, the obvious recommendation is to incorporate calibration when using probabilistic prediction. Comparing the calibration techniques, Platt scaling and Venn-Abers generally outperform isotonic regression on these rather small datasets. Finally, the unique ability of Venn-Abers to output not only well-calibrated probability estimates but also the confidence in these estimates is demonstrated.
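The built-in calibration techniques the abstract refers to correspond to scikit-learn's `CalibratedClassifierCV` with `method="sigmoid"` (Platt scaling) and `method="isotonic"`; Venn-Abers is not part of scikit-learn and is omitted here. A minimal sketch of this kind of setup, using a synthetic dataset, a random forest base model, and the Brier score as a calibration-sensitive metric (all illustrative choices, not the paper's actual experimental design):

```python
# Sketch: external calibration of a scikit-learn classifier with the
# library's two built-in methods. The dataset, estimator, and split
# below are illustrative assumptions, not the paper's setup.
from sklearn.calibration import CalibratedClassifierCV
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import brier_score_loss
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

base = RandomForestClassifier(n_estimators=100, random_state=0)

# "sigmoid" = Platt scaling, "isotonic" = isotonic regression.
for method in ("sigmoid", "isotonic"):
    clf = CalibratedClassifierCV(base, method=method, cv=5)
    clf.fit(X_tr, y_tr)
    proba = clf.predict_proba(X_te)[:, 1]
    # Brier score: mean squared error of the probability estimates
    # (lower is better, and sensitive to miscalibration).
    print(method, round(brier_score_loss(y_te, proba), 4))
```

`CalibratedClassifierCV` fits the base model and the calibrator on disjoint folds (here `cv=5`), which avoids the overfitting that would result from calibrating on the same data the model was trained on.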

Place, publisher, year, edition, pages
2021.
National subject category
Information Systems
Research subject
Business and IT
Identifiers
URN: urn:nbn:se:hb:diva-26746 OAI: oai:DiVA.org:hb-26746 DiVA, id: diva2:1603345
Conference
The 18th International Conference on Modeling Decisions for Artificial Intelligence, On-line (from Umeå, Sweden), September 27-30, 2021.
Available from: 2021-10-15 Created: 2021-10-15 Last updated: 2025-09-24 Bibliographically reviewed

Open Access in DiVA
fulltext (FULLTEXT01.pdf, 454 kB, application/pdf)
Authors
Sweidan, Dirar; Johansson, Ulf
