Predictive modelling for user preferences in digital libraries: Using sentiment analysis and machine learning
2024 (English) Independent thesis Advanced level (degree of Master (Two Years)), 20 credits / 30 HE credits
Student thesis
Abstract [en]
The rapid advancement of digital technologies has transformed information access, making digital libraries essential knowledge repositories. This thesis explores the application of Sentiment Analysis and Artificial Intelligence (AI) in analysing ratings and reviews within digital libraries. By leveraging AI, digital libraries can address challenges in managing vast information volumes and understanding user preferences, thereby enhancing recommendation systems.
This study aims to develop predictive models employing user sentiments in reviews and ratings to improve recommendation systems. We investigated the correlation between review sentiments and numerical ratings, evaluated regression and classification models, and examined the impact of feature engineering on prediction accuracy. Utilising Orange Data Mining software, we analysed two datasets from Kaggle, focusing on Amazon Books and Kindle ratings and reviews. In addition, VADER and SentiArt were used for sentiment analysis.
Results showed VADER outperformed SentiArt in capturing sentiment nuances, with a higher correlation to numerical ratings. Document Embedding (SBERT) demonstrated a moderate correlation with ratings, explaining 32.3% of the variance, whereas Bag of Words (TF-IDF) explained 10.5%. Linear Regression consistently outperformed other models, explaining up to 25% of rating variance. Neural Networks also showed promise in classification tasks, accurately categorising 'Low' and 'High' ratings.
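As a reading aid for the "variance explained" figures above, the following minimal Python sketch shows how a correlation coefficient and the share of rating variance explained (R squared) relate when there is a single predictor, such as one sentiment score per review. It is not the thesis workflow, which used Orange Data Mining; the function names and the toy data are hypothetical and chosen only for illustration.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def r_squared_from_fit(xs, ys):
    """Fit y = a + b*x by least squares and return 1 - SS_res / SS_tot."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_y - slope * mean_x
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_res / ss_tot


# Hypothetical toy data: sentiment scores in [-1, 1] and 1-5 star ratings.
sentiment = [-0.8, -0.3, 0.0, 0.2, 0.5, 0.9]
ratings = [1, 2, 3, 3, 4, 5]

r = pearson_r(sentiment, ratings)
r2 = r_squared_from_fit(sentiment, ratings)
print(f"r = {r:.3f}, R^2 = {r2:.3f}")
```

With one predictor, the R squared of the fitted line equals the square of the correlation coefficient. Models built on many features, such as document embeddings or TF-IDF vectors, report R squared in the same sense (share of rating variance accounted for), but that value no longer corresponds to a single pairwise correlation.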
In conclusion, this research demonstrates the potential of AI to enhance digital libraries and improve recommendation systems. The findings highlight the benefits of integrating advanced data analytics into digital libraries to boost user satisfaction and service quality. However, it also recognises challenges such as data privacy and identifies the environmental sustainability of AI applications as a topic for future research.
Place, publisher, year, edition, pages
2024.
Keywords [en]
Machine learning, Sentiment analysis, Classification analysis, Artificial intelligence, Digital libraries
National Category
Information Studies
Identifiers
URN: urn:nbn:se:hb:diva-33047
OAI: oai:DiVA.org:hb-33047
DiVA, id: diva2:1925709
2025-01-13 2025-01-09 2025-09-24 Bibliographically approved