Using wavelet analysis for text categorization in digital libraries: a first experiment with Strathprints
Högskolan i Borås, Institutionen Biblioteks- och informationsvetenskap / Bibliotekshögskolan.
2011 (English). In: International Journal on Digital Libraries, ISSN 1432-5012, E-ISSN 1432-1300. Journal article (peer-reviewed)
Abstract [en]

Digital libraries increasingly benefit from research on automated text categorization for improved access. Such research is typically carried out by using standard test collections. In this paper we present a pilot experiment of replacing such test collections by a set of 6000 objects from a real-world digital repository, indexed by Library of Congress Subject Headings, and test support vector machines in a supervised learning setting for their ability to reproduce the existing classification. To augment the standard approach, we introduce a combination of two novel elements: using functions for document content representation in Hilbert space, and adding extra semantics from lexical resources to the representation. Results suggest that wavelet-based kernels slightly outperformed traditional kernels on classification reconstruction from abstracts and vice versa from full-text documents, the latter outcome due to word sense ambiguity. The practical implementation of our methodological framework enhances the analysis and representation of specific knowledge relevant to large-scale digital collections, in this case the thematic coverage of the collections. Representation of specific knowledge about digital collections is one of the basic elements of the persistent archives and the less studied one (compared to representations of digital objects and collections). Our research is an initial step in this direction developing further the methodological approach and demonstrating that text categorisation can be applied to analyse the thematic coverage in digital repositories.
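The contrast between "traditional" and "wavelet-based" kernels mentioned in the abstract can be illustrated with a minimal sketch. It assumes a commonly used translation-invariant wavelet kernel built from a Morlet-type mother wavelet; the record does not say which wavelet kernel the authors used, so this choice, the function names, the `dilation` parameter and the toy term-weight vectors are all illustrative assumptions rather than the paper's method.

```python
import math


def linear_kernel(x, y):
    """Traditional linear kernel: the dot product of two term-weight vectors."""
    return sum(a * b for a, b in zip(x, y))


def wavelet_kernel(x, y, dilation=1.0):
    """Translation-invariant wavelet kernel (assumed form, not from the paper).

    Product over dimensions of h((x_i - y_i) / dilation), with the
    Morlet-type mother wavelet h(u) = cos(1.75 * u) * exp(-u**2 / 2).
    """
    value = 1.0
    for a, b in zip(x, y):
        u = (a - b) / dilation
        value *= math.cos(1.75 * u) * math.exp(-0.5 * u * u)
    return value


def gram_matrix(vectors, kernel):
    """Pairwise kernel values, the form a kernel-based classifier consumes."""
    return [[kernel(u, v) for v in vectors] for u in vectors]


# Hypothetical toy document vectors (three terms each), for illustration only.
docs = [
    [1.0, 0.0, 2.0],
    [1.0, 0.0, 1.5],
    [0.0, 3.0, 0.0],
]

linear_gram = gram_matrix(docs, linear_kernel)
wavelet_gram = gram_matrix(docs, wavelet_kernel)
```

Either Gram matrix could then be handed to a support vector machine implementation that accepts precomputed kernels. In this sketch the wavelet kernel depends only on the difference between two vectors, so every document has similarity 1 with itself, whereas the linear kernel grows with vector length.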

Place, publisher, year, edition, pages
2011.
Keywords [en]
kernel methods, text classification, support vector machines, semantic enrichment, hilbert spaces, digital libraries, text categorization, machine learning, analogical information representation, wavelet analysis
HSV category
Research programme
Library and Information Science
Identifiers
URN: urn:nbn:se:hb:diva-3241
Local ID: 2320/9820
OAI: oai:DiVA.org:hb-3241
DiVA, id: diva2:871338
Available from: 2015-11-13 Created: 2015-11-13 Last updated: 2018-01-10

Open Access in DiVA

Full text (329 kB), 652 downloads
File information
File: FULLTEXT01.pdf
File size: 329 kB
Checksum (SHA-512):
ab228805fd57ce966874262e266a4a4bcb59dc98a26dacc16e8119942e4ece96357028880e88318bd62c329bc5a7f8f893cc0ebf565678776867ba0b50f47927
Type: fulltext
Mimetype: application/pdf

Authority records

Darányi, Sándor; Wittek, Peter
