A Comparison of Clustering Methods for Cultural Heritage Corpora
University of Borås, Faculty of Librarianship, Information, Education and IT.
2022 (English). Independent thesis, Advanced level (degree of Master (Two Years)), 20 credits / 30 HE credits. Student thesis.
Abstract [en]

This thesis compares eight agglomerative hierarchical clustering methods, two divisive hierarchical clustering methods, and three k-means methods. The data consisted of a corpus of 689 texts written by schoolchildren in 1930s Ireland and later transcribed by volunteers. For each method, the thesis investigated the effects of stop word removal and stemming, as well as the use of document embeddings instead of a document-term matrix as input. Overall, the k-means methods produced the most desirable results, and document embeddings markedly improved output in most cases.
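The basic pipeline named in the abstract (stop word removal, a document-term matrix, then k-means) can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the thesis's implementation: the stop word list, the L2 row normalisation, the farthest-first centroid initialisation, and the toy sentences are all hypothetical choices made here, and stemming and document embeddings are omitted.

```python
import math
import re
from collections import Counter

# Hypothetical stop word list for illustration; not the list used in the thesis.
STOP_WORDS = {"a", "an", "and", "at", "for", "in", "is", "of", "on", "the", "to", "was"}


def tokenize(text, remove_stop_words=True):
    """Lower-case, split on non-letters, and optionally drop stop words."""
    tokens = re.findall(r"[a-z]+", text.lower())
    if remove_stop_words:
        tokens = [t for t in tokens if t not in STOP_WORDS]
    return tokens


def l2_normalize(row):
    """Scale a count vector to unit length so long texts do not dominate."""
    norm = math.sqrt(sum(x * x for x in row))
    return [x / norm for x in row] if norm > 0 else row


def document_term_matrix(docs, remove_stop_words=True):
    """Return (vocabulary, rows): one L2-normalised term-count row per document."""
    tokenized = [tokenize(d, remove_stop_words) for d in docs]
    vocab = sorted({t for toks in tokenized for t in toks})
    rows = []
    for toks in tokenized:
        counts = Counter(toks)
        rows.append(l2_normalize([float(counts[t]) for t in vocab]))
    return vocab, rows


def squared_distance(u, v):
    return sum((a - b) ** 2 for a, b in zip(u, v))


def init_centroids(rows, k):
    """Farthest-first initialisation (hypothetical choice): deterministic, unlike k-means++."""
    centroids = [list(rows[0])]
    while len(centroids) < k:
        farthest = max(rows, key=lambda r: min(squared_distance(r, c) for c in centroids))
        centroids.append(list(farthest))
    return centroids


def kmeans(rows, k, max_iter=100):
    """Lloyd's algorithm: alternate assignment and centroid update until labels stop changing."""
    centroids = init_centroids(rows, k)
    labels = None
    for _ in range(max_iter):
        new_labels = [
            min(range(k), key=lambda c: squared_distance(row, centroids[c]))
            for row in rows
        ]
        if new_labels == labels:
            break
        labels = new_labels
        for c in range(k):
            members = [row for row, lab in zip(rows, labels) if lab == c]
            if members:  # keep the previous centroid if a cluster becomes empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Invented toy sentences, not texts from the corpus described above.
docs = [
    "The fairy fort on the hill",
    "A fairy fort in the field",
    "The cure for warts was a stone",
    "A cure for the warts and a stone",
]
vocab, rows = document_term_matrix(docs)
labels = kmeans(rows, k=2)
print(vocab)   # ['cure', 'fairy', 'field', 'fort', 'hill', 'stone', 'warts']
print(labels)  # [0, 0, 1, 1]
```

In this sketch, stemming would slot into `tokenize` (mapping each token to its stem), and document embeddings would replace the count rows with dense vectors from an embedding model, leaving `kmeans` unchanged.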

Place, publisher, year, edition, pages
2022.
Keywords [en]
Document Clustering, Cultural Heritage, Digital Humanities
National Category
Information Studies
Identifiers
URN: urn:nbn:se:hb:diva-29593
OAI: oai:DiVA.org:hb-29593
DiVA, id: diva2:1747802
Available from: 2023-03-31. Created: 2023-03-31. Last updated: 2023-03-31. Bibliographically approved.

Open Access in DiVA
No full text in DiVA
