Metadata that serve as semantic markup, such
as conceptual categories that describe the
macrostructure of a plot in terms of actors
and their mutual relationships, actions, and
their ingredients annotated in folk narratives,
are important additional resources for digital
humanities research. Such categories are
traditionally rooted in structural analysis: in
fairy tales they are called functions (Propp, 1968),
and in myths they are called mythemes (Lévi-Strauss,
1955). A related, overarching type of content
metadata is the folklore motif (Uther, 2004; Jason, 2000).

In his influential study, Propp treated a corpus
of tales in Afanas'ev's collection (Afanas'ev,
1945), establishing basic recurrent units of the
plot ('functions'), such as Villainy, Liquidation
of misfortune, Reward, or Test of Hero,
and the combinations and sequences of
elements employed to arrange them into
moves.1 His aim was to describe the DNA-like
structure of the magic tale sub-genre as
a novel basis for comparing tales. As a
first step towards a story grammar, the
Proppian model is relatively straightforward
to formalize for the computational semantic
annotation, analysis, and generation of fairy
tales. Our study describes an effort towards
a comprehensive XML markup of fairy tales
based on Propp's functions, using an approach
that integrates functional text annotation with
grammatical markup so that it can be applied
across text types, genres and languages.
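Because Propp's functions form a closed, ordered inventory, the claim that the model is easy to formalize can be illustrated with a small Python sketch. It is a minimal illustration under simplifying assumptions, not the method of this study. It encodes the 31 functions in Propp's canonical order; the English labels (including "Test of Hero" and "Liquidation of misfortune") follow common translations and are an assumption. It then splits an annotated sequence of functions into moves by opening a new move at each Villainy or Lack, which is a simplification of Propp's own definition of a move. Finally, it checks whether each move respects the canonical order. The helper names split_into_moves and is_canonical are hypothetical.

```python
from typing import List

# Propp's 31 functions in canonical order. The English labels are an
# assumption based on common translations; "Lack" is Propp's variant of
# Villainy and is given the same position below.
PROPP_ORDER = [
    "Absentation", "Interdiction", "Violation", "Reconnaissance",
    "Delivery", "Trickery", "Complicity", "Villainy", "Mediation",
    "Beginning counteraction", "Departure", "Test of Hero",
    "Hero's reaction", "Receipt of magical agent", "Guidance",
    "Struggle", "Branding", "Victory", "Liquidation of misfortune",
    "Return", "Pursuit", "Rescue", "Unrecognized arrival",
    "Unfounded claims", "Difficult task", "Solution", "Recognition",
    "Exposure", "Transfiguration", "Punishment", "Wedding",
]
RANK = {name: i for i, name in enumerate(PROPP_ORDER)}
RANK["Lack"] = RANK["Villainy"]

# Simplifying assumption (hypothetical rule): every Villainy or Lack
# opens a new move.
MOVE_OPENERS = {"Villainy", "Lack"}


def split_into_moves(functions: List[str]) -> List[List[str]]:
    """Split a tale's sequence of function labels into moves."""
    moves: List[List[str]] = []
    current: List[str] = []
    for f in functions:
        if f not in RANK:
            raise ValueError(f"unknown function label: {f!r}")
        # Close the current move if it already contains an opener.
        if f in MOVE_OPENERS and any(g in MOVE_OPENERS for g in current):
            moves.append(current)
            current = []
        current.append(f)
    if current:
        moves.append(current)
    return moves


def is_canonical(move: List[str]) -> bool:
    """True if the functions in a move never go backwards in Propp's order."""
    ranks = [RANK[f] for f in move]
    return all(a <= b for a, b in zip(ranks, ranks[1:]))


if __name__ == "__main__":
    tale = ["Absentation", "Villainy", "Departure",
            "Liquidation of misfortune", "Return",
            "Lack", "Departure", "Wedding"]
    # Prints two moves, each preceded by True.
    for move in split_into_moves(tale):
        print(is_canonical(move), move)
```

Treating order as a per-move property, rather than a property of the whole tale, mirrors Propp's observation that multi-move tales restart the sequence. A fuller treatment would also need to handle his documented inversions and repetitions.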
The Proppian fairy tale Markup Language
(PftML) (Malec, 2001) is an annotation scheme
that enables narrative function segmentation,
based on hierarchically ordered textual content
objects. We propose to extend PftML so
that the scheme also draws on linguistic
information when segmenting texts into
Proppian functions. Textual variation is an
important phenomenon in folklore; it is
therefore beneficial to represent linguistic
elements explicitly in computational resources
that draw on this genre. Current international
initiatives also actively promote such integrated
and standardized linguistic resources and aim
to facilitate them technically. We
describe why and how explicit representation of
grammatical phenomena in literary models can
provide interdisciplinary benefits for the digital
humanities research community.
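One way to integrate functional segmentation with grammatical markup is to nest part-of-speech and lemma annotation inside function segments. The Python sketch below illustrates this with a hypothetical XML fragment. The element and attribute names (tale, move, function, w, pos, lemma) are invented for this example and are not the actual PftML schema. The tags loosely follow Universal Dependencies conventions. The hypothetical helper verb_lemmas_by_function uses Python's standard xml.etree.ElementTree module to collect the verb lemmas inside each function. Because it matches on lemmas, surface variants of the same action ("seized", "seizes") are retrieved together.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict
from typing import Dict, List

# Hypothetical markup: element and attribute names are illustrative only
# and are NOT the actual PftML schema.
SAMPLE = """
<tale id="example">
  <move n="1">
    <function type="Villainy">
      <s>
        <w pos="DET" lemma="the">The</w>
        <w pos="NOUN" lemma="dragon">dragon</w>
        <w pos="VERB" lemma="seize">seized</w>
        <w pos="DET" lemma="the">the</w>
        <w pos="NOUN" lemma="princess">princess</w>
      </s>
    </function>
    <function type="Liquidation of misfortune">
      <s>
        <w pos="DET" lemma="the">The</w>
        <w pos="NOUN" lemma="hero">hero</w>
        <w pos="VERB" lemma="free">freed</w>
        <w pos="PRON" lemma="she">her</w>
      </s>
    </function>
  </move>
</tale>
"""


def verb_lemmas_by_function(xml_text: str) -> Dict[str, List[str]]:
    """Map each function type to the lemmas of the verbs it contains."""
    root = ET.fromstring(xml_text.strip())
    result: Dict[str, List[str]] = defaultdict(list)
    for func in root.iter("function"):
        # Only tokens nested inside this function segment are visited.
        for w in func.iter("w"):
            if w.get("pos") == "VERB":
                result[func.get("type")].append(w.get("lemma"))
    return dict(result)


if __name__ == "__main__":
    print(verb_lemmas_by_function(SAMPLE))
```

Keeping the grammatical layer nested inside the functional layer means a single document can be queried both by narrative function and by linguistic feature. This is the kind of cross-layer access that the extension of PftML is meant to enable.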
We address these issues in two related lines of
work, as part of our ongoing activities in
the CLARIN2 and AMICUS3 projects. CLARIN
aims to contribute to humanities research by
creating and recommending effective workflows
using natural language processing tools and
digital resources in scenarios where text-based
research is conducted by humanities or social
sciences scholars. AMICUS focuses on motif
identification in order to gain insight into
higher-order correlations of functions and other
content units in texts from the cultural heritage
and scientific discourse domains. We expect
significant synergies from combining the work
of both projects with the PftML prototype.