1 - 21 of 21
  • 1.
    Allwood, Jens
    et al.
    University of Borås, School of Business and IT.
    Hammarström, Harald
    Hendrikse, Andries
    Ngcobo, Mtholeni N.
    Nomdebevana, Nozibele
    Pretorius, Laurette
    van der Merwe, Mac
    Work on Spoken (Multimodal) Language Corpora in South Africa. 2010. Conference paper (Refereed)
    Abstract [en]

    This paper describes past, ongoing and planned work on the collection and transcription of spoken language samples for all the South African official languages and as part of this the training of researchers in corpus linguistic research skills. More specifically the work has involved (and still involves) establishing an international corpus linguistic network linked to a network hub at a UNISA website and the development of research tools, a corpus research guide and workbook for multimodal communication and spoken language corpus research. As an example of the work we are doing and hope to do more of in the future, we present a small pilot study of the influence of English and Afrikaans on the 100 most frequent words in spoken Xhosa as this is evidenced in the corpus of spoken interaction we have gathered so far. Other planned work, besides work on spoken language phenomena, involves comparison of spoken and written language and work on communicative body movements (gestures) and their relation to speech.

  • 2.
    Darányi, Sándor
    et al.
    University of Borås, Swedish School of Library and Information Science.
    Wittek, Peter
    University of Borås, Swedish School of Library and Information Science.
    Dobreva, Milena
    Toward a 5M Model of Digital Libraries. 2010. Conference paper (Refereed)
    Abstract [en]

    Whereas the DELOS DRM and the 5S model of digital libraries (DL) address the formal side of DL, we argue that a parallel 5M model is emerging as best practice worldwide, integrating multicultural, multilingual, multimodal digital objects with multivariate statistics-based document indexing, categorization and retrieval methods. The fifth M stands for the modeling of the information searching behavior of users, and of collection development. We show how an extension of the 5S model to Hilbert space (a) points toward the integration of several Ms; (b) makes the tracking of evolving semantic content feasible; and (c) leads to a field interpretation of word and sentence semantics underlying language change. First experimental results from the Strathprints e-repository verify the mathematical foundations of the 5M model.

  • 3.
    Darányi, Sándor
    University of Borås, Swedish School of Library and Information Science.
    Examples of Formulaity in Narratives and Scientific Communication. 2010. In: Proceedings of the 1st International AMICUS Workshop, October 21, 2010, Vienna, Austria / [ed] Sándor Darányi, Piroska Lendvai, University of Szeged, Hungary, 2010, p. 29-35. Conference paper (Refereed)
    Abstract [en]

    The AMICUS project was designed to promote scholarly networking in a topical area, motif recognition in texts, including its automation. Prior to doing so however it is necessary to show the theoretical underpinnings of the research idea. My argument is that evidence from different disciplines amounts to fragmented pieces of a bigger picture. By compiling them like pieces of a puzzle, one can see how the concept of formulaity applies to folklore texts and scholarly communication alike. Regardless of the actual name of the concept (e.g. motif, function, canonical form), what matters is that document parts and whole documents can be characterized by standard sequences of content elements, such formulaic expressions enabling higher-level document indexing and classification by machine learning, plus document retrieval. Information filtering plays a key role in the proposed technology.

  • 4.
    Darányi, Sándor
    et al.
    University of Borås, Swedish School of Library and Information Science.
    Forró, László
    Detecting Multiple Motif Co-occurrences in the Aarne-Thompson-Uther Tale Type Catalog: A Preliminary Survey. 2011. In: Anales de Documentación, ISSN 1575-2437, E-ISSN 1697-7904. Article in journal (Other academic)
  • 5.
    Darányi, Sándor
    et al.
    University of Borås, Swedish School of Library and Information Science.
    Forró, László
    Toward Sequencing Multiple Motif Co-Occurrences. 2011. In: Tanulmányok az örökségmenedzsmentröl 2. Kulturális örökségek kezelése [Studies in Heritage Management 2: The Management of Cultural Heritage] / [ed] L. Bassa, Információs Társadalomért Alapítvány, 2011, p. 247-260. Chapter in book (Refereed)
    Abstract [en]

    Catalogs project subject field experience onto a multidimensional map which is then converted to a hierarchical list. In the case of the Aarne-Thompson-Uther Tale Type Catalog (ATU), this subject field is the global pattern of tale content defining tale types as canonical motif sequences. To extract and visualize such a map, we considered ATU as a corpus and analysed two segments of it, “Supernatural adversaries” (types 300-399) in particular and “Tales of magic” (types 300-749) in general. The two corpora were scrutinized for multiple motif co-occurrences and visualized by two-mode clustering of a bag-of-motif co-occurrences matrix. Findings indicate the presence of canonical content units above motif level as well. The organization scheme of folk narratives utilizing motif sequences is reminiscent of nucleotide sequences in the genetic code.

  • 6.
    Darányi, Sándor
    et al.
    University of Borås, Swedish School of Library and Information Science.
    Lendvai, Piroska
    Proceedings of the First AMICUS Workshop, October 21, 2010, Vienna, Austria. 2010. Collection (editor) (Other academic)
    Abstract [en]

    In cultural heritage objects, digitized or not, content indicators occurring on higher than word level are often called motifs or their equivalent. Their recognition for document classification and retrieval is largely unresolved. Work on identifying rhetorical, narrative and persuasive elements in scientific texts has been progressing in several, but largely unconnected, tracks. The AMICUS project (running between 2009 and 2012) set out to test a possible way to resolve these issues, starting with the identification of Proppian functions in folk tale corpora and adapting the solution to the identification of tale motifs or their functional counterparts. AMICUS has devoted its first project year to listing the corpora, tools, methods and contacts available to address these issues. The initiators of the project have identified a common need in the processing of texts from both the cultural heritage (CH) and scientific communication (SC) domains: to perform automated, large-scale higher-order text analytics, i.e., to reach an advanced level of text understanding so that structured knowledge can be extracted from unstructured text. The four research groups propose to tackle an important aspect of this complex issue by investigating how linguistic elements convey motifs in texts from the CH and the SC domains. Our shared working hypothesis is that the identity of higher-order content-bearing elements, i.e., textual units that are typically designated for e.g. document indexing, classification, enrichment, and the like, strongly depends on community perception.

  • 7.
    Darányi, Sándor
    et al.
    University of Borås, Swedish School of Library and Information Science.
    Wittek, Peter
    University of Borås, Swedish School of Library and Information Science.
    The gravity of meaning: Physics as a metaphor to model semantic changes. 2012. Conference paper (Refereed)
    Abstract [en]

    Based on a computed toy example, we offer evidence that by plugging in similarity of word meaning as a force plus a small modification of Newton’s 2nd law, one can acquire specific “mass” values for index terms in a Saltonesque dynamic library environment. The model can describe two types of change which affect the semantic composition of document collections: the expansion of a corpus due to its update, and fluctuations of the gravitational potential energy field generated by normative language use as an attractor juxtaposed with actual language use yielding time-dependent term frequencies. By the evolving semantic potential of a vocabulary and concatenating the respective term “mass” values, one can model sentences or longer strings of symbols as vector-valued functions. Since the line integral of such functions is used to express the work of a particle in a gravitational field, the work equivalent of strings can be calculated.

  • 8.
    Darányi, Sándor
    et al.
    University of Borås, Swedish School of Library and Information Science.
    Wittek, Peter
    University of Borås, Swedish School of Library and Information Science.
    Dobreva, Milena
    Using wavelet analysis for text categorization in digital libraries: a first experiment with Strathprints. 2011. In: International Journal on Digital Libraries, ISSN 1432-5012, E-ISSN 1432-1300. Article in journal (Refereed)
    Abstract [en]

    Digital libraries increasingly benefit from research on automated text categorization for improved access. Such research is typically carried out by using standard test collections. In this paper we present a pilot experiment of replacing such test collections by a set of 6000 objects from a real-world digital repository, indexed by Library of Congress Subject Headings, and test support vector machines in a supervised learning setting for their ability to reproduce the existing classification. To augment the standard approach, we introduce a combination of two novel elements: using functions for document content representation in Hilbert space, and adding extra semantics from lexical resources to the representation. Results suggest that wavelet-based kernels slightly outperformed traditional kernels on classification reconstruction from abstracts and vice versa from full-text documents, the latter outcome due to word sense ambiguity. The practical implementation of our methodological framework enhances the analysis and representation of specific knowledge relevant to large-scale digital collections, in this case the thematic coverage of the collections. Representation of specific knowledge about digital collections is one of the basic elements of the persistent archives and the less studied one (compared to representations of digital objects and collections). Our research is an initial step in this direction developing further the methodological approach and demonstrating that text categorisation can be applied to analyse the thematic coverage in digital repositories.

  • 9.
    Darányi, Sándor
    et al.
    University of Borås, Faculty of Librarianship, Information, Education and IT.
    Wittek, Peter
    University of Borås, Faculty of Librarianship, Information, Education and IT.
    Konstantinidis, K
    CERTH.
    Papadopoulos, S
    CERTH.
    A Potential Surface Underlying Meaning? 2015. Conference paper (Other academic)
    Abstract [en]

    Machine learning algorithms utilizing gradient descent to identify concepts or more general learnables hint at a so-far ignored possibility, namely that local and global minima represent any vocabulary as a landscape against which evaluation of the results can take place. A simple example to illustrate this idea would be a potential surface underlying gravitation. However, to construct a gravitation-based representation of, e.g., word meaning, only the distance between localized items is a given in the vector space, whereas the equivalents of mass or charge are unknown in semantics. Clearly, the working hypothesis that physical fields could be a useful metaphor to study word and sentence meaning is an option but our current representations are incomplete in this respect. For a starter, consider that an RBF kernel has the capacity to generate a potential surface and hence create the impression of gravity, providing one with distance-based decay of interaction strength, plus a scalar scaling factor for the interaction, but of course no term masses. We are working on an experiment design to change that. Therefore, with certain mechanisms in neural networks that could host such quasi-physical fields, a novel approach to the modeling of mind content seems plausible, subject to scrutiny. Work in progress in another direction of the same idea indicates that by using certain algorithms, already emerged vs. still emerging content is clearly distinguishable, in line with Aristotle’s Metaphysics. The implications are that a model completed by “term mass” or “term charge” would enable the computation of the specific work equivalent of sentences or documents, and that via replacing semantics by other modalities, vector fields of more general symbolic content could exist as well.
    Also, the perceived hypersurface generated by the dynamics of language use may be a step toward more advanced models, for example addressing the Hamiltonian of expanding semantic systems, or the relationship between reaction paths in quantum chemistry vs. sentence construction by gradient descent.

  • 10.
    Declerck, Thierry
    et al.
    Lendvai, Piroska
    Darányi, Sándor
    University of Borås, Swedish School of Library and Information Science.
    Multilingual and Semantic Extension of Folk Tale Catalogues. 2012. Conference paper (Refereed)
    Abstract [en]

    We address the multilingual and semantic upgrades of two digital catalogues of motifs and types in folk-literature: Thompson’s Motif-Index of Folk-Literature (TMI) and the Aarne-Thompson-Uther classification system (ATU). The methods convert, translate, and represent their digitized content in terms of various (so far often implicit) structural and linguistic components. The results will enable (i) utilizing these resources for semi-automatic analysis and indexing of texts of relevant genres, in a multilingual setting, and (ii) pre-processing the data for analysing motif sequences in folktale plots. We plan to publish the resulting data, which can be made available in the Linked Open Data (LOD) framework.

  • 11.
    Ekström, Björn
    University of Borås, Faculty of Librarianship, Information, Education and IT.
    Developing a rule-based method for identifying researchers on Twitter: The case of vaccine discussions. 2019. In: 17th International Conference on Scientometrics and Informetrics, ISSI 2019 - Proceedings Volume 2, 2019, Rome, 2019, p. 2618-2619. Conference paper (Refereed)
    Abstract [en]

    This study seeks to develop a method for identifying the occurrences and proportions of researchers, media and other professionals active in Twitter discussions. As a case example, an anonymised dataset from Twitter vaccine discussions is used. The study proposes a method of using keywords as strings within lists to identify classes from user biographies. This provides a way to apply multiple classification principles to a set of Twitter biographies using semantic rules through the Python programming language.

  • 12.
    Lendvai, Piroska
    et al.
    Declerck, Thierry
    Darányi, Sándor
    University of Borås, Swedish School of Library and Information Science.
    Gervás, Pablo
    Hervás, Raquel
    Malec, Scott
    Peinado, Federico
    Integration of Linguistic Markup into Semantic Models of Folk Narratives: The Fairy Tale Use Case. 2010. In: Proceedings of the Seventh conference on International Language Resources and Evaluation (LREC'10). Conference paper (Refereed)
    Abstract [en]

    Propp’s influential structural analysis of fairy tales created a powerful schema for representing storylines in terms of character functions, which is directly exploitable for computational semantic analysis and procedural generation of stories of this genre. We tackle two resources that draw on the Proppian model (one formalizes it as a semantic markup scheme, the other as an ontology), both lacking linguistic phenomena explicitly represented in them. The need for integrating linguistic information into structured semantic resources is motivated by the emergence of suitable standards that facilitate this, and the benefits such joint representation would create for transdisciplinary research across Digital Humanities, Computational Linguistics, and Artificial Intelligence.

  • 13.
    Lendvai, Piroska
    et al.
    Declerck, Thierry
    Darányi, Sándor
    University of Borås, Swedish School of Library and Information Science.
    Malec, Scott
    Propp Revisited: Integration of Linguistic Markup into Structured Content Descriptors of Tales. 2010. In: Proceedings of the Conference for Digital Humanities 2010, 2010. Conference paper (Refereed)
    Abstract [en]

    Metadata that serve as semantic markup, such as conceptual categories that describe the macrostructure of a plot in terms of actors and their mutual relationships, actions, and their ingredients annotated in folk narratives, are important additional resources of digital humanities research. Traditionally originating in structural analysis, in fairy tales they are called functions (Propp, 1968), whereas in myths they are called mythemes (Lévi-Strauss, 1955); a related, overarching type of content metadata is a folklore motif (Uther, 2004; Jason, 2000). In his influential study, Propp treated a corpus of tales in Afanas'ev's collection (Afanas'ev, 1945), establishing basic recurrent units of the plot ('functions'), such as Villainy, Liquidation of misfortune, Reward, or Test of Hero, and the combinations and sequences of elements employed to arrange them into moves. His aim was to describe the DNA-like structure of the magic tale sub-genre as a novel way to provide comparisons. As a start along the way to developing a story grammar, the Proppian model is relatively straightforward to formalize for computational semantic annotation, analysis, and generation of fairy tales. Our study describes an effort towards creating a comprehensive XML markup of fairy tales following Propp's functions, by an approach that integrates functional text annotation with grammatical markup in order to be used across text types, genres and languages. The Proppian fairy tale Markup Language (PftML) (Malec, 2001) is an annotation scheme that enables narrative function segmentation, based on hierarchically ordered textual content objects. We propose to extend PftML so that the scheme would additionally rely on linguistic information for the segmentation of texts into Proppian functions.
    Textual variation is an important phenomenon in folklore; it is thus beneficial to explicitly represent linguistic elements in computational resources that draw on this genre. Current international initiatives also actively promote and aim to technically facilitate such integrated and standardized linguistic resources. We describe why and how explicit representation of grammatical phenomena in literary models can provide interdisciplinary benefits for the digital humanities research community. In two related fields of activities, we address the above as part of our ongoing activities in the CLARIN and AMICUS projects. CLARIN aims to contribute to humanities research by creating and recommending effective workflows using natural language processing tools and digital resources in scenarios where text-based research is conducted by humanities or social sciences scholars. AMICUS is interested in motif identification, in order to gain insight into higher-order correlations of functions and other content units in texts from the cultural heritage and scientific discourse domains. We expect significant synergies from their interaction with the PftML prototype.

  • 14.
    Paggio, Patrizia
    et al.
    Allwood, Jens
    University of Borås, School of Business and IT.
    Ahlsén, Elisabeth
    Jokinen, Kristiina
    The NOMCO Multimodal Nordic Resource: Goals and Characteristics. 2010. In: Proceedings of the Seventh conference on International Language resources and Evaluation (LREC'10), Valletta, Malta, May 19-21 / [ed] N. Calzolari, K. Choukri, B. Maegaard, J. Mariani, J. Odijk, S. Piperidis, M. Rosner, D. Tapias, European Language Resources Association (ELRA), 2010. Conference paper (Refereed)
    Abstract [en]

    This paper presents the multimodal corpora that are being collected and annotated in the Nordic NOMCO project. The corpora will be used to study communicative phenomena such as feedback, turn management and sequencing. They already include video material for Swedish, Danish, Finnish and Estonian, and several social activities are represented. The data will make it possible to verify empirically how gestures (head movements, facial displays, hand gestures and body postures) and speech interact in all three of these aspects of communication. The data are being annotated following the MUMIN annotation scheme, which provides attributes concerning the shape and the communicative functions of head movements, facial expressions, body posture and hand gestures. After having described the corpora, the paper discusses how they will be used to study the way feedback is expressed in speech and gestures, and reports results from two pilot studies where we investigated the function of head gestures, both single and repeated, in combination with feedback expressions. The annotated corpora will be valuable sources for research on intercultural communication as well as for interaction in the individual languages.

  • 15.
    Szöts, Miklós
    et al.
    Darányi, Sándor
    University of Borås, Swedish School of Library and Information Science.
    Alexin, Zoltán
    Vincze, Veronika
    Almási, Attila
    Semantic Processing of a Hungarian Ethnographic Corpus. 2010. In: Proceedings of the 1st International AMICUS Workshop, October 21, 2010, Vienna, Austria, p. 112-115. Article in journal (Refereed)
    Abstract [en]

    In this poster, a Hungarian ethnographic database containing linguistic annotation is presented. The corpus contains texts from three domains, namely, folk beliefs, táltos texts and tales. All the possible morphosyntactic analyses assigned to each word and the appropriate one selected from them (based on contextual information) are also marked. Syntactic (dependency) annotation is added semi-automatically to the corpus texts at a second phase of the processing. With the help of these enriched linguistic attributes, the texts can be semantically analyzed and clustered. The research and development team is working on a semantic search tool that enables users to browse the texts on the basis of their meaning. The proposed technology may result in a new approach to ethnographic research and may open a new type of access to the databases.

  • 16.
    Wilhelmsson, Kenneth
    University of Borås, Swedish School of Library and Information Science.
    Automatic Question Generation from Swedish Documents as a Tool for Information Extraction. 2011. In: Proceedings of the 18th Nordic Conference of Computational Linguistics NODALIDA 2011 / [ed] Bolette Sandford Pedersen, Gunta Nešpore, Inguna Skadiņa, 2011, p. 323-326. Conference paper (Refereed)
    Abstract [en]

    An implementation of automatic question generation (QG) from raw Swedish text is presented. QG is here chosen as an alternative to natural query systems where any query can be posed and no indication is given of whether the current text database includes the information sought for. The program builds on parsing with grammatical functions from which corresponding questions are generated and it incorporates the article database of Swedish Wikipedia. The pilot system is meant to work with a text shown in the GUI and auto-completes user input to help find available questions. The act of question generation is here described together with early test results regarding the current produced questions.

  • 17.
    Wittek, Peter
    et al.
    University of Borås, Swedish School of Library and Information Science.
    Darányi, Sándor
    University of Borås, Swedish School of Library and Information Science.
    Introducing Scalable Quantum Approaches in Language Representation. 2011. Conference paper (Refereed)
    Abstract [en]

    High-performance computational resources and distributed systems are crucial for the success of real-world language technology applications. The novel paradigm of general-purpose computing on graphics processors (GPGPU) offers a feasible and economical alternative: it has already become a common phenomenon in scientific computation, with many algorithms adapted to the new paradigm. However, applications in language technology do not readily adapt to this approach. Recent advances show the applicability of quantum metaphors in language representation, and many algorithms in quantum mechanics have already been adapted to GPGPU computing. SQUALAR aims to match quantum algorithms with heterogeneous computing to develop new formalisms of information representation for natural language processing in quantum environments.

  • 18.
    Wittek, Peter
    et al.
    University of Borås, Swedish School of Library and Information Science.
    Darányi, Sándor
    University of Borås, Swedish School of Library and Information Science.
    Spectral Composition of Semantic Spaces. 2011. Conference paper (Refereed)
    Abstract [en]

    Spectral theory in mathematics is key to the success of as diverse application domains as quantum mechanics and latent semantic indexing, both relying on eigenvalue decomposition for the localization of their respective entities in observation space. This points at some implicit “energy” inherent in semantics and in need of quantification. We show how the structure of atomic emission spectra, and meaning in concept space, go back to the same compositional principle, plus propose a tentative solution for the computation of term, document and collection “energy” content.

  • 19.
    Wittek, Peter
    et al.
    University of Borås, Swedish School of Library and Information Science.
    Darányi, Sándor
    University of Borås, Swedish School of Library and Information Science.
    Dobreva, Milena
    Matching Evolving Hilbert Spaces and Language for Semantic Access to Digital Libraries. 2010. In: The Role of Digital Libraries in a Time of Global Change. Proceedings of the 12th International Conference on Asia-Pacific Digital Libraries, ICADL 2010, Gold Coast, Australia, June 21-25, 2010 / [ed] Gobinda Chowdhury, Chris Koo, Jane Hunter, Springer, 2010, p. 262-263. Conference paper (Other academic)
    Abstract [en]

    Extended by function (Hilbert) spaces, the 5S model of digital libraries (DL) [1] enables a physical interpretation of vectors and functions to keep track of the evolving semantics and usage context of the digital objects by support vector machines (SVM) for text categorization (TC). For this conceptual transition, three steps are necessary: (1) the application of the formal theory of DL to Lebesgue (function, L2) spaces; (2) considering semantic content as vectors in the physical sense (i.e. position and direction vectors) rather than as in linear algebra, thereby modelling word semantics as an evolving field underlying classifications of digital objects; (3) the replacement of vectors by functions in a new compact support basis function (CSBF) semantic kernel utilizing wavelets for TC by SVMs.

  • 20.
    Wittek, Peter
    et al.
    University of Borås, Swedish School of Library and Information Science.
    Darányi, Sándor
    University of Borås, Swedish School of Library and Information Science.
    Tan, Chew Lim
    An Ordering of Terms Based on Semantic Relatedness. 2009. In: Proceedings of IWCS-8, January 7-9, 2009, Tilburg, The Netherlands / [ed] H Bunt, V Petukhova, S Wubben, 2009, p. 235-247. Conference paper (Refereed)
    Abstract [en]

    Term selection methods typically employ a statistical measure to filter or weight terms. Term expansion for IR may also depend on statistics, or use some other, non-metric method based on a lexical resource. At the same time, a wide range of semantic similarity measures have been developed to support natural language processing tasks such as word sense disambiguation. This paper combines the two approaches and proposes an algorithm that provides a semantic order of terms based on a semantic relatedness measure. This semantic order can be exploited by term weighting and term expansion methods.

  • 21.
    Wittek, Peter
    et al.
    University of Borås, Swedish School of Library and Information Science.
    Ravenek, Walter
    Supporting the Exploration of a Corpus of 17th-Century Scholarly Correspondences by Topic Modeling. 2011. Conference paper (Refereed)
    Abstract [en]

    This paper deals with the application of topic modeling to a corpus of 17th-century scholarly correspondences built up by the CKCC project. The topic modeling approaches considered are latent Dirichlet allocation (LDA), latent semantic analysis (LSA), and random indexing (RI). After describing the corpus and the topic modeling approaches, we present an experiment for the quantitative evaluation of the performance of the various topic modeling approaches in reproducing human-labeled words in a subset of the corpus. In our experiments random indexing shows the best performance, with scope for further improvement. Next we discuss the role of topic modeling in the CKCC Epistolarium, the virtual research environment that is being developed for exploring and analysing the CKCC corpus. The key feature of topic modeling is its ability to calculate similarities between words and texts. In an example we illustrate how such an approach may yield results that transcend a regular text search.
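
The abstract of entry 11 describes classifying Twitter user biographies by keeping keywords as strings within lists and applying rules in Python. The following is only a minimal sketch of that general idea, not the study's implementation: the class names, the keyword lists, the word-level matching rule and the function names `classify_biography` and `class_proportions` are all assumptions made for illustration.

```python
import re
from collections import Counter

# Hypothetical keyword lists, one list of strings per class.
# The study's actual lists are not given in the abstract.
CLASS_KEYWORDS = {
    "researcher": ["phd", "professor", "researcher", "postdoc", "scientist"],
    "media": ["journalist", "reporter", "editor"],
    "health professional": ["nurse", "physician", "pharmacist"],
}


def classify_biography(bio):
    """Return every class whose keyword list shares a word with the biography."""
    # Lowercase and split into alphabetic words so that matching is
    # case-insensitive and works on whole words only.
    words = set(re.findall(r"[a-z]+", bio.lower()))
    return [
        label
        for label, keywords in CLASS_KEYWORDS.items()
        if words.intersection(keywords)
    ]


def class_proportions(bios):
    """Share of biographies matched by each class (a biography may match several)."""
    counts = Counter()
    for bio in bios:
        # Biographies that match no list are counted as "unclassified".
        counts.update(classify_biography(bio) or ["unclassified"])
    return {label: n / len(bios) for label, n in counts.items()}


if __name__ == "__main__":
    sample = [
        "Professor of virology, PhD",
        "Science journalist and editor",
        "Dog lover",
        "Nurse and researcher",
    ]
    for bio in sample:
        print(bio, "->", classify_biography(bio))
    print(class_proportions(sample))
```

Because one biography can match several lists, the proportions need not sum to one. Whole-word matching is a deliberate simplification here; it avoids accidental substring hits but misses inflected or compound forms.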
