Construction and annotation of a corpus of contemporary Nepali
University of Borås, School of Business and IT.
2008 (English). In: Corpora, ISSN 1749-5032, E-ISSN 1755-1676, Vol. 3, no. 2, p. 213-225. Article in journal (Refereed), Published
Abstract [en]

In this paper, we describe the construction of the 14-million-word Nepali National Corpus (NNC). This corpus includes both spoken and written data, the latter incorporating a Nepali match for FLOB and a broader collection of text. Additional resources within the NNC include parallel data (English–Nepali and Nepali–English) and a speech corpus. The NNC is encoded as Unicode text and marked up in CES-compatible XML. The whole corpus is also annotated with part-of-speech tags. We describe the process of devising a tagset and retraining tagger software for the Nepali language, for which there were no existing corpus resources. Finally, we explore some present and future applications of the corpus, including lexicography, NLP, and grammatical research.

Place, publisher, year, edition, pages
Edinburgh University Press, 2008. Vol. 3, no. 2, p. 213-225
Keywords [en]
corpus linguistics, Nepali, spoken language, linguistic resources
National Category
Specific Languages; Computer and Information Sciences
Identifiers
URN: urn:nbn:se:hb:diva-2507
DOI: 10.3366/E1749503208000166
Local ID: 2320/4365
OAI: oai:DiVA.org:hb-2507
DiVA, id: diva2:870601
Available from: 2015-11-13. Created: 2015-11-13. Last updated: 2018-01-10. Bibliographically approved.


Authority records BETA

Allwood, Jens
