Supporting the Exploration of a Corpus of 17th-Century Scholarly Correspondences by Topic Modeling.
University of Borås, Swedish School of Library and Information Science.
2011 (English). Conference paper, Published paper (Refereed)
Abstract [en]

This paper deals with the application of topic modeling to a corpus of 17th-century scholarly correspondences built up by the CKCC project. The topic modeling approaches considered are latent Dirichlet allocation (LDA), latent semantic analysis (LSA), and random indexing (RI). After describing the corpus and the topic modeling approaches, we present an experiment for the quantitative evaluation of the performance of the various topic modeling approaches in reproducing human-labeled words in a subset of the corpus. In our experiments random indexing shows the best performance, with scope for further improvement. Next we discuss the role of topic modeling in the CKCC Epistolarium, the virtual research environment that is being developed for exploring and analysing the CKCC corpus. The key feature of topic modeling is its ability to calculate similarities between words and texts. In an example we illustrate how such an approach may yield results that transcend a regular text search.
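The similarity computation mentioned in the abstract can be illustrated with a minimal random indexing sketch. This is not the authors' implementation: the dimensionality `DIM`, the sparsity `NONZERO`, the document-based context definition, and the helper names `index_vector`, `train`, `text_vector`, and `cosine` are all assumptions chosen for illustration, and the toy sentences in the usage note are invented rather than taken from the CKCC corpus.

```python
import math
import random
from collections import defaultdict

DIM = 300      # size of the reduced space (assumed value)
NONZERO = 10   # number of +1/-1 entries per index vector (assumed value)


def index_vector(rng):
    """Return a sparse ternary vector: a few +1/-1 entries, zeros elsewhere."""
    vec = [0.0] * DIM
    for i, pos in enumerate(rng.sample(range(DIM), NONZERO)):
        vec[pos] = 1.0 if i % 2 == 0 else -1.0
    return vec


def train(documents, seed=0):
    """Build word vectors by summing the index vectors of the
    documents in which each word occurs (once per occurrence)."""
    rng = random.Random(seed)
    context = defaultdict(lambda: [0.0] * DIM)
    for doc in documents:
        doc_index = index_vector(rng)
        for word in doc.lower().split():
            vec = context[word]
            for i, value in enumerate(doc_index):
                vec[i] += value
    return dict(context)


def text_vector(text, context):
    """Represent a text as the sum of the vectors of its known words."""
    vec = [0.0] * DIM
    for word in text.lower().split():
        for i, value in enumerate(context.get(word, ())):
            vec[i] += value
    return vec


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0
```

As a usage illustration, calling `train` on a handful of toy letters and comparing `cosine(context["telescope"], context["lens"])` with `cosine(context["telescope"], context["printer"])` should rank words that share documents above words that never co-occur. The same `cosine` function applied to `text_vector` outputs compares whole texts, which is how a query can match a letter that does not contain the query word itself.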

Place, publisher, year, edition, pages
University of Copenhagen, 2011.
Keywords [en]
topic modeling, latent semantic indexing, random projection
Keywords [sv]
text mining
National Category
Computer and Information Sciences; Natural Language Processing
Research subject
Library and Information Science
Identifiers
URN: urn:nbn:se:hb:diva-6661
Local ID: 2320/9689
OAI: oai:DiVA.org:hb-6661
DiVA, id: diva2:887360
Conference
SDH 2011 Supporting Digital Humanities: Answering the unaskable
Available from: 2015-12-22 Created: 2015-12-22 Last updated: 2025-02-01

Open Access in DiVA

fulltext (212 kB), 593 downloads
File information
File name: FULLTEXT01.pdf
File size: 212 kB
Checksum (SHA-512): 80e2ba385c716284402b7d1ecf9ebc08c68b77be863507f5e4186270ac669b954b00a977e4a8c04b3f28fd2f02ff718bdadce285c9b2651513cf2add65d420d5
Type: fulltext
Mimetype: application/pdf

Authority records

Wittek, Peter
