How can a module for sentiment analysis be designed to classify tweets about covid19
2021 (English)
Independent thesis Basic level (degree of Bachelor), 10 credits / 15 HE credits
Student thesis
Alternative title
Hur kan man designa en modul inom sentimentanalys för att klassificera tweets om covid19 (Swedish)
Abstract [en]

Sentiment analysis of text is receiving growing attention from a range of organisations, for a variety of reasons. Emotion mining (sentiment analysis) is an interesting subject to explore; the research question is therefore: how can a module for sentiment analysis be designed to classify tweets about Covid-19? The dataset used for this project was taken from Kaggle and preprocessed with methods such as Bag of Words and term frequency-inverse document frequency (TF-IDF). The models are based on the following algorithms: k-nearest neighbours (KNN), support vector machine (SVM), decision tree (DT), and naive Bayes (NB). Some models also combine machine learning (ML) with a lexicon. The experiment showed that the lexicon method, with an accuracy of 87%, outperformed both the machine learning methods implemented in this thesis and the experiments published by the ML community on Kaggle. This suggests that the traditional lexicon approach remains a suitable choice for sentiment analysis.
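The preprocessing and lexicon ideas named in the abstract can be illustrated with a minimal, standard-library-only sketch. This is not the thesis's implementation: the toy lexicon, the three-way positive/negative/neutral labelling, the tokenizer, and the function names are all assumptions made for illustration, and the TF-IDF variant shown (relative term frequency times log(N / df)) is only one common formulation.

```python
import math
from collections import Counter

# Hypothetical toy lexicon (word -> polarity). The abstract does not say
# which lexicon the thesis used, so these entries are purely illustrative.
TOY_LEXICON = {"good": 1, "great": 1, "safe": 1,
               "bad": -1, "scared": -1, "worse": -1}


def tokenize(text):
    """Lowercase, split on whitespace, strip surrounding punctuation."""
    stripped = (word.strip(".,!?#@:;\"'()") for word in text.lower().split())
    return [token for token in stripped if token]


def bag_of_words(tokens):
    """Bag of Words: map each token to its count, ignoring word order."""
    return Counter(tokens)


def tf_idf(documents):
    """Return one {term: weight} dict per document.

    Weight = (count / document length) * log(N / document frequency).
    """
    tokenized = [tokenize(doc) for doc in documents]
    n_docs = len(tokenized)
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))  # count each term once per document
    weights = []
    for tokens in tokenized:
        counts = bag_of_words(tokens)
        weights.append({
            term: (count / len(tokens)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights


def lexicon_label(text, lexicon=TOY_LEXICON):
    """Sum the polarities of known words and map the sign to a label."""
    score = sum(lexicon.get(token, 0) for token in tokenize(text))
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"
```

In a sketch like this, the TF-IDF dictionaries would serve as features for a classifier such as KNN, SVM, DT, or NB, while `lexicon_label` needs no training data at all, which is the kind of trade-off the abstract's comparison is about.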

Abstract [sv]

Sentiment analysis of text has recently received greater focus. Emotion mining is a very interesting subject to explore; the research question is therefore: how can a module for sentiment analysis be designed to classify tweets about covid19? The dataset used was taken from Kaggle and then preprocessed using various methods such as Bag of Words and term frequency-inverse document frequency. The models are based on the following algorithms: KNN, SVM, DT, and NB. Some models are based on a combination of ML and a lexicon. The final result of the experiment was that the lexicon method, with a performance of 87%, exceeded the machine learning methods carried out in this thesis and other experiments from the ML community on Kaggle. This suggests that the lexicon method is still a good choice in the field of sentiment analysis.

Place, publisher, year, edition, pages
2021.
Keywords [en]
Sentiment Analysis, Machine Learning, Lexicon technique, Kaggle, Preprocessing
National Category
Computer and Information Sciences
Identifiers
URN: urn:nbn:se:hb:diva-26834
OAI: oai:DiVA.org:hb-26834
DiVA, id: diva2:1606925
Subject / course
Informatics
Available from: 2021-11-09
Created: 2021-10-29
Last updated: 2021-11-09
Bibliographically approved

Open Access in DiVA

uppsats (504 kB), 196 downloads
File information
File name: FULLTEXT01.pdf
File size: 504 kB
Checksum SHA-512
e98771afabd62dbdd29b64990b68317ca6a4cb42c62efc0a9df937942bc50b745f3e3b216994cedff19c85b03df0be5282d726111e2d262fb33db0b7690766b3
Type: fulltext
Mimetype: application/pdf
