Of Lions and Yakshis: Ontology-based Narrative Structure Modelling for Culturally Diverse Folktales

Tracking #: 2360-3573

Franziska Pannach
Caroline Sporleder
Wolfgang May
Aravind Krishnan

Responsible editor: Special Issue Cultural Heritage 2019

Submission type: Full Paper
Vladimir Propp's theory Morphology of the Folktale identifies 31 invariant functions, subfunctions, and seven classes of folktale characters to describe the narrative structure of the Russian magic tale. Since it was first published in 1928, Propp's approach has been used on various folktales of different cultural backgrounds. We built an ontology that models Propp's theory by implementing narrative functions as classes and relations. A special focus lies on the restrictions Propp defined regarding which Dramatis Personae fulfill a certain function. We investigated how an ontology can assist traditional humanities research in examining how well Propp's theory fits folktales outside of the Russian-European folktale culture. For this purpose, a light-weight query system was implemented using an Apache Jena Fuseki backend. In order to allow ontology browsing, we provide an institutional Webprotégé instance. To determine how well both the annotation schema and the query system work, we annotated twenty African tales and fifteen tales from India. We evaluate the system by examining two case studies regarding the representation of characters and the use of Proppian functions in African and Indian tales. Our findings are in line with traditional analogous humanities research. This project shows how carefully modelled ontologies can represent and re-evaluate traditional theories of literary scholars, and how they can be utilized as a knowledge-base for comparative folklore research.

Major Revision

Solicited Reviews:
Review #1
By Albert Meroño-Peñuela submitted on 20/Dec/2019
Major Revision
Review Comment:

In this paper, the authors investigate the use of formal ontologies to assist traditional humanities scholars in examining corpora of folktales. Specifically, they focus on the theoretical work of Vladimir Propp, “Morphology of the Folktale”, which identifies 31 functions and character classes to describe the narrative structure of Russian tales, and proposes a symbolic system representing them. Applying this symbolic system to analyze actual tales is a manual and subjective task that has led scholars to intense, and often contradictory, discussions; the authors propose a formalization of this symbolic system as an OWL ontology to mitigate this. They evaluate the ontology in a full stack Semantic Web system that uses the ontology to annotate passages of 20 African and Indian folktales, finding function and character patterns for their narratives.

The paper is very well written, and is relevant for the special issue on cultural heritage as it touches upon many topics of the call. The work has an inherent interest for scholars of cultural heritage and digital humanities, since it describes the full spectrum of activities (requirements, knowledge engineering, implementation, querying, interface design, etc.) that goes from the status quo (i.e. traditional offline scholarly discussion) to a machine-scalable process. Section 2 is of high quality, as it clearly conveys the background knowledge that any reader needs to understand the problem, and also introduces a role for formal ontologies in the symbolic system of plots, functions and characters that clearly makes sense. I think the authors put a very valuable effort into making their work a bottom-up process from a particular case (the specific theory of one intellectual) to an empirical evaluation of how well such a theory can generalize (e.g. by applying its formalization to tales from cultures, Africa and India, other than the one the theory arose from). So in this sense, the paper is not just an off-the-shelf application of SW technology, but seeks a higher scientific pursuit.

However, I think that it is in this scientific pursuit where the paper has a number of important issues that need to be addressed before it is suitable for publication:

- Relation with respect to the state of the art. A central question I had while reading the paper is the argumentation around the alternative existing ontologies that model Proppian functions [13], [15]. For example, one of the limitations of [13] that the authors put forward is that “they made some design choices we could not reuse for our approach, e.g. the ontology class Move is a subclass of ProppFunction. In our case, Move is a subclass of Tale, because a tale consists of one or multiple moves, i.e. story lines.”; but subclass relations (i.e. rdfs:subClassOf) are not part-of relations, but rather subset relations (e.g. in the former all moves are also ProppFunctions, which is true; but in the latter all moves are tales, which is false). This is a common mistake in subclass relationships, and the authors do appear to be aware of the difference, as shown e.g. in Fig. 2; but an attentive reader cannot be satisfied with this explanation as an argument for not reusing [13]. Moreover, I think this signals a larger issue of how this work relates to existing ontologies on the topic: what the limitations of those are, and how this work addresses them while reusing other relevant, more adequate parts. I think this could improve the paper a great deal, by making very explicit and more tangible arguments about what the novel aspect of this new ontology is and why we need it besides [13] and [15]. If, as a result of this, there are more extensive parts of existing ontologies that can be reused, I would suggest phrasing the contribution of this paper more as an ontology design pattern (see e.g. http://ontologydesignpatterns.org/wiki/Main_Page) and clearly specifying which classes/properties of which other ontologies were reused (e.g. by making their namespaces explicit in Listings 1, 2 and Figs. 1-4), which ones had to be built from scratch, and how they relate to each other.
Secondly, I suggest broadening the related work section a bit so that it covers other relevant work in SW for cultural heritage (e.g. Xu Lei, Albert Meroño-Peñuela, Huang Zhisheng, Frank van Harmelen. “An Ontology Model for Narrative Image Annotation in the Field of Cultural Heritage”. In: Proceedings of the 2nd Workshop on Humanities in the SEmantic web (WHiSe 2017). ISWC 2017, October 22nd, Vienna, Austria (2017)) or quantitative analyses of folk tales (e.g. Theo Meder, Folgert Karsdorp, Dong Nguyen, Mariët Theune, Dolf Trieschnigg, and Iwe Muiser (2016) Automatic Enrichment and Classification of Folktales in the Dutch Folktale Database).

- Interoperability. A second, more general question I had is about the specific motivation for choosing OWL ontologies and SW technology to address this work. Why could the same result not be achieved by using other (plausible) alternatives? I can see the value of formalizing the aspects described in Section 2, but how does this contribute to solving the general problem of making the various disconnected cultural heritage databases mutually interoperable, as mentioned in the special issue call? The explicit reuse of existing classes and properties in other vocabularies (see my previous point), or the explicit linking to resources of external datasets, could greatly help in answering these questions.

- Evaluation. Being aware that evaluating this type of work is very difficult, I felt somewhat unconvinced by the evidence provided in the evaluation, and by the extent to which this evidence answers the research question (i.e. “how formalized ontologies can help assessing differences between intercultural folktales”). A comparison of the African and Indian tale results with the results of applying the same ontology to, say, Russian tales (given that this is the context in which the theory was proposed) could help to understand better whether these differences are stationary or whether there are global patterns independent of the specific cultures. Tables 2 and 3 are helpful in this regard, but also a bit confusing: why not show the function sequences as shown in e.g. Section 2, instead of summary statistics? Similarly, it is hard for the reader to assess whether these patterns are accurate or complete without a more thorough discussion and comparison with alternatives. One option could be to compare the resulting function sequences with established knowledge and theories about plots, e.g. those of the theoreticians of Section 2.

- Other comments:
- As part of the future work, could this be framed in the context of achieving automatic annotations by using machine learning? Or towards a benchmark for a motif detection task in NLP, provided larger samples of ground truth (i.e. more than 20) could be provided?
- The header of Section 1.1 is a bit redundant and could be removed
- Before 2.2, it would be useful to have a table of all Propp functions as a reference to understand the later examples
- Table 1: concepts can be summarized with union operators to avoid repeating them and improve readability (this will also make it more visible which classes are more complex for reasoning). The “False Hero” concept should be fixed; it must be something other than “not Hero”, because under that definition a Reward could also be a False Hero. I would suggest checking the outcomes of the retrieval task to make sure all classes are used in the annotations and are consistent with the instances of text found. This table also suggests that a brief discussion of the DL expressivity needed (e.g. just ALC, need of cardinality restrictions, etc.) would give important clues about the complexity of the domain.
- Annotations: could you be more specific on what is being used as annotations? E.g. I gather text is annotated with ontology classes, but is there also an annotation model involved to reuse these annotations (e.g. Open Annotation Data Model)?
- Implementation details: most of them could be removed by simply providing a link to the code, if this is open source. For the API construction, I would advise using something like grlc (http://grlc.io) if your SPARQL queries do not need any postprocessing (disclaimer: I’m the author of this one)
- Section 7 feels a bit detached from the general narrative of the paper. Maybe a diagram of how the entity recognition task fits in the general strategy would help (instead of e.g. Fig. 5 which is too focused on implementation details)
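
The concern about the “False Hero” concept in the Table 1 comment above can be illustrated with plain sets. This is only a minimal sketch, not the authors' ontology: the individuals and class extensions are invented for illustration, and Python sets stand in for the extensions of DL classes.

```python
# Hypothetical individuals; none of these names come from the paper's ontology.
domain = {"ivan", "impostor", "golden_ring"}

# Assumed class extensions, for illustration only.
hero = {"ivan"}
character = {"ivan", "impostor"}
reward = {"golden_ring"}

# Defining FalseHero as the bare complement of Hero ("not Hero")
# sweeps in every non-hero individual, including the reward.
false_hero_as_complement = domain - hero

# Restricting the complement to characters excludes non-character individuals.
# (A faithful definition would need further conditions; this is only a sketch.)
false_hero_restricted = character - hero

print(sorted(false_hero_as_complement))  # ['golden_ring', 'impostor']
print(sorted(false_hero_restricted))     # ['impostor']
```

The first result shows why a bare negation is too weak a definition: anything that is not a hero, including a reward, satisfies it.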

Review #2
Anonymous submitted on 23/Dec/2019
Major Revision
Review Comment:

The paper presents an ontology for representing folktales based on Vladimir's Propp theory and an application of the ontology to the description of traditional tales from Sub-Saharan Africa and the Indian state of Kerala. The authors used the ontology to build an information extraction system for Propp functions in folktales. Then, they populated the ontology and performed an analysis of the resulting knowledge base, drawing some interesting conclusions about the narrative structure of such tales.
The authors' study is relevant to the current special issue. I found the results interesting, and I appreciated the authors' focus on non-Western tales since these kinds of texts are often overlooked in Western-centric Digital Humanities.
However, some major issues need to be resolved. Indeed, in my opinion, the authors should provide a better introduction to the problem they aim to solve, and more detailed justifications of their methodology and modelling choices. The related works section should be expanded. Some sections of the paper are not so clear, and in some cases disconnected. The implementation section contains many technical details that are not so important for the paper. Finally, the conclusions could be restructured to give a better overview of the work done by the authors.
A significant issue that hindered my evaluation of the paper is the fact that the website of the ontology is currently offline, making it impossible to access the ontology and the web application. I kindly ask the authors to restore the website.
In the following, I report the issues I identified and suggest some edits and corrections that may improve the paper.

== Introduction ==

1. The name of the ontology is not stated anywhere. The first mention of "ProppOntology" is in the Related Works section on page 4. The introduction refers to "this project", but the name of the project is not stated anywhere.
2. The introduction lacks a detailed problem statement. I suggest that the authors move to the introduction the first two paragraphs of section 2.3, which explain more clearly what is the problem, why they are working on it and how they plan to solve it.
3. It would also be helpful to specify in the introduction what language the ontology is written in, e.g. RDFS or OWL, providing an appropriate reference and explaining the acronym (e.g. "Web Ontology Language (OWL) [99]")
4. How should the expressive power of an ontology "grow with the number of annotations"? The expressive power of an ontology generally depends on the classes, properties, and axioms that it contains, not on the number of individuals that populate it.

== Description of the Domain ==

1. At the end of section 2.2, the discussion of the tale is long and specific. If the authors want to use this tale as an example, it would help the reader's understanding if they provided at least a short summary of the storyline.
2. In the last sentence, the authors state "nor is there only one correct sequence of functions per tale". I agree with this view, but in my opinion, the example presented in section 2.2 is not enough to prove this statement true. It would be helpful if the authors added a reference to a scholarly work discussing this issue.
3. At the end of page 3, the authors discuss implementation details such as vocabularies and annotation properties. This seems out of place and should be moved to section 5. Furthermore, the references for RDFS and OWL Annotation Properties should be provided.

== Related Works ==

1. This section could be extended and improved describing other ontology-based approaches to narrative representation.
2. The authors state that in their ontology the class Move is a subclass of Tale "because a tale consists of one or multiple moves" but in general a part-of relation is very different from a subclass relation. If Move is a subclass of Tale, then it can be inferred that each Move is a Tale. Is this what the authors intend?
3. Based on the provided references, it appears that the ProppOnto ontology was authored by Peinado [13], not by Declerck [15] as stated by the authors (also in section 5.4). In fact, reference [15] does not mention ProppOnto at all. Could the authors check? Furthermore, the authors should make it clear that ProppOnto and ProppOntology are two different ontologies because given the similarity of the names it is very easy to get confused.
4. The sentence "the Internationalized Resource Identifier (IRI) is a short description of the function according to the corresponding literal" is not clear to me because: (i) an IRI is simply an identifier, not a description, and (ii) "corresponding literal" is very vague: which literal? corresponding to what?
6. To better explain the authors’ methodological choices, it would be better to justify why the authors chose to import the Family Ontology by Koleva [18], instead of other more common genealogical ontologies.
7. When discussing the work by Koleva [18], the authors state that Koleva "used SWRL rules for the classes" but, generally, SWRL rules are applied on variables that represent individuals. Could the authors better explain this point? Furthermore, the authors state that Koleva's approach works "comparatively well", but what is the term of comparison?
8. When introducing the term "verbalisation", the authors should provide a definition of it.
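
The difference between a subclass relation and a part-of relation raised in item 2 can be made concrete with a small entailment check. The following is a minimal sketch, assuming a simplified RDFS-style rule for rdfs:subClassOf implemented over plain tuples; the identifiers ex:move1, ex:tale1 and ex:hasMove are hypothetical and not taken from the paper.

```python
# Minimal RDFS-style entailment for rdfs:subClassOf, using plain tuples.
# All identifiers (ex:move1, ex:hasMove, ...) are hypothetical.

def infer_types(triples):
    """Return all (instance, class) pairs entailed via rdfs:subClassOf."""
    sub = {(s, o) for s, p, o in triples if p == "rdfs:subClassOf"}
    types = {(s, o) for s, p, o in triples if p == "rdf:type"}
    changed = True
    while changed:  # repeat until no new type can be derived
        changed = False
        for inst, cls in list(types):
            for c, sup in sub:
                if c == cls and (inst, sup) not in types:
                    types.add((inst, sup))
                    changed = True
    return types

# Modelling A: Move is a subclass of Tale.
subclass_model = [
    ("ex:Move", "rdfs:subClassOf", "ex:Tale"),
    ("ex:move1", "rdf:type", "ex:Move"),
]

# Modelling B: Move and Tale are separate classes linked by a part-of style property.
part_of_model = [
    ("ex:tale1", "rdf:type", "ex:Tale"),
    ("ex:move1", "rdf:type", "ex:Move"),
    ("ex:tale1", "ex:hasMove", "ex:move1"),
]

print(("ex:move1", "ex:Tale") in infer_types(subclass_model))  # True
print(("ex:move1", "ex:Tale") in infer_types(part_of_model))   # False
```

Under modelling A every move is inferred to be a tale, which is the consequence the question in item 2 asks about; under modelling B the move stays a move and is merely linked to its tale.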

== Design ==

1. I suggest that the authors better describe and motivate the adoption of an ontology-based approach. Indeed, the authors don't mention the main advantages of using ontologies, e.g. standardization and interoperability.
2. When the authors mention "RDF model", do they mean "OWL model"? Indeed, RDF does not have a concept of class (first introduced in RDFS), nor a distinction between object property and data property (first introduced in OWL).
3. The idea of representing each Proppian function as both a class and a property seems strange to me. Generally, a class and a property are two different things and a resource cannot be both a class and a property. In my opinion, to motivate and clarify their approach, the authors should answer the following questions: (i) are the "function class" and the "function property" two different resources with different IRIs? (ii) are they connected to each other, and if so how? (iii) why not simply connect each character to the function it appears in, instead of defining "function properties"? (iv) why not use reification to achieve the same goal?
4. The authors state that "in real life the classes of humans and animals would certainly be distinct". However, according to biology, Human is a subclass of Animal, therefore they cannot be distinct.
5. In Figure 1, the names of object properties begin with an uppercase character. This is in contrast with standard practice in the Semantic Web field, which is to use a lowercase character.
6. The description of object properties and data properties is very short and it should be expanded, providing a list of the main properties of the ontologies if possible.
7. At the end of page 8, the sentence "the function Reconnaissance ϵ1 applies" is unclear (at least without looking at the ontology).
8. Also at the end of page 8, the sentence "since they have been included before the Family Ontology was imported" is not so clear to me: is there a reason for not doing it now?
9. On page 9, when discussing the work of the annotators, it would be helpful to state who are these annotators (are they experts in the field or not?). Furthermore, was each tale annotated by only one annotator, or more than one?
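
Questions (iii) and (iv) in item 3 point towards an alternative in which each occurrence of a Proppian function is a resource of its own that links to the tale and to the characters involved. The sketch below shows one possible shape of such a model over plain tuples. It is an assumption made for illustration, not the authors' design; every identifier (ex:villainy1, ex:performedBy, ex:occursIn, ex:affects, ex:lion, ex:hare) is hypothetical.

```python
# Sketch: each occurrence of a Proppian function is an individual (node)
# that links to the tale and to the participating characters.
# All names (ex:villainy1, ex:performedBy, ...) are hypothetical.
triples = [
    ("ex:villainy1", "rdf:type", "ex:Villainy"),
    ("ex:villainy1", "ex:occursIn", "ex:tale1"),
    ("ex:villainy1", "ex:performedBy", "ex:lion"),
    ("ex:villainy1", "ex:affects", "ex:hare"),
    ("ex:lion", "rdf:type", "ex:Villain"),
]

def objects(subject, predicate):
    """Return all objects of triples with the given subject and predicate."""
    return [o for s, p, o in triples if s == subject and p == predicate]

def functions_performed_by(character):
    """Return (function occurrence, function class) pairs for a character."""
    return [
        (s, cls)
        for s, p, o in triples
        if p == "ex:performedBy" and o == character
        for cls in objects(s, "rdf:type")
    ]

print(functions_performed_by("ex:lion"))  # [('ex:villainy1', 'ex:Villainy')]
```

With this shape, no resource has to act as both a class and a property: the function is a class, its occurrence is an individual, and ordinary properties connect the occurrence to characters and tale.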

== Implementation ==

1. In general, the implementation section contains technical details that can be considered not relevant in this context, e.g. about the security of the platform, which can be removed.
2. "most modern web applications are developed using programming languages like PHP or Ruby" -> please provide a reference.
3. The authors should provide a reference, or at least a footnote, when introducing the Apache Jena Fuseki software.
4. "the Webprotégé instance is not directly connected..." -> how is the Webprotégé ontology synchronized with the ontology stored in Fuseki?
5. In Section 6.1, the authors state that there are three means to query the ontology, but they describe only two. What is the third?
6. "would be represented by the placeholders" -> "would be represented by variables"
7. What do the authors mean by "most recent relations"?
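
The terminology point in item 6 reflects how SPARQL triple patterns work: names prefixed with "?" are variables that are bound to values when a pattern matches the data. The following minimal sketch imitates that behaviour for a single triple pattern in plain Python. It is not SPARQL and not the authors' query system, and the data and identifiers are hypothetical.

```python
# Toy matcher for one triple pattern; terms starting with "?" are variables.
# The data below is invented for illustration.
data = [
    ("ex:lion", "rdf:type", "ex:Villain"),
    ("ex:hare", "rdf:type", "ex:Hero"),
    ("ex:tale1", "ex:hasCharacter", "ex:lion"),
]

def match(pattern, triples):
    """Yield one dict of variable bindings per triple that fits the pattern."""
    for triple in triples:
        binding = {}
        for term, value in zip(pattern, triple):
            if term.startswith("?"):
                # A variable used twice must be bound to the same value.
                if binding.get(term, value) != value:
                    break
                binding[term] = value
            elif term != value:
                break  # constant term does not match this triple
        else:
            yield binding

print(list(match(("?who", "rdf:type", "ex:Villain"), data)))
# [{'?who': 'ex:lion'}]
```

Each result is a set of variable bindings, which is why "variables" is the more precise term than "placeholders" for the names in a query.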

== Information Extraction from Tale Texts ==

1. "They trained" -> they who? The authors of the module?
2. The last two sentences of section 7.1 are generic. The quality of the paper could be improved if the authors better quantify the advantages and disadvantages of their approach.

== Conclusions ==

1. I suggest restructuring the section in order to provide a better description of what has been done, e.g. starting with "In this paper we have presented ProppOntology, an ontology for modelling folktales based on Vladimir Propp's theory Morphology of the Folktale" and then briefly describing the steps followed by the authors and the main results that they achieved

== Spelling & Grammar ==

I suggest that the authors thoroughly check the spelling and grammar of the paper, including the following:
• page 1, column 1, line 40: in the beginning -> at the beginning
• page 1, column 2, line 47: help assessing -> help assess
• page 2, column 1, line 27: ist structured -> is structured
• page 2, column 1, line 27: subsequent -> following
• page 2, column 1, line 38: conclusion -> conclusions
• page 2, column 2, line 8: add comma after "furthermore"
• page 2, column 2, line 37: intial -> initial
• page 2, column 2, line 40: figur -> figure
• page 3, column 2, line 8: remove comma before "fights"
• page 3, column 2, line 40: neccessary -> necessary
• page 3, column 2, line 41: opinion which -> opinion on which
• page 3, column 2, line 41: analyses analyses -> analyses
• page 3, column 2, line 43: vocabulary -> vocabularies
• page 3, column 2, line 43: like -> such as
• page 3, column 2, line 45: like -> such as
• page 4, column 2, line 11: different -> differently
• page 4, column 2, line 29: rdf:comments -> rdfs:comments
• page 4, column 2, line 37: Types -> types
• page 5, column 2, line 16: prevalant -> prevalent
• page 5, column 2, line 21: like -> such as
• page 6, column 1, line 19: extration -> extraction
• page 6, column 2, line 1: with folktales -> of folktales
• page 6, column 2, line 11: Following -> Following the
• page 6, column 2, line 36: this order -> and this order
• page 8, column 2, line 46: invididuals -> individuals
• page 9, column 1, line 41: respectively -> and respectively
• page 10, column 2, line 6: productive -> production
• page 10, column 2, line 40: fusekis sparql -> Fuseki's SPARQL
• page 11, column 1, line 39: checkbox behind -> checkbox beside
• page 13, column 1, line 6: gramatically -> grammatically
• page 13, column 1, line 15: Therfore -> Therefore
• page 13, column 1, line 35: approach method -> method
• page 13, column 2, line 12: occurences -> occurrences
• page 14, column 1, line 20: add comma after "Instead"
• page 14, column 1, line 21: webinterface -> web interface
• page 14, column 1, line 23: front end -> frontend
• page 17, column 1, line 49: audiencce -> audience
• page 17, column 2, line 3: javascript -> JavaScript
• page 17, column 2, line 24: humanities -> Humanities
• page 17, column 2, line 40: webprotégé -> Webprotégé

Review #3
By Kalliopi Kontiza submitted on 22/Jan/2020
Major Revision
Review Comment:

This manuscript was submitted as 'full paper' and should be reviewed along the usual dimensions for research contributions which include (1) originality, (2) significance of the results, and (3) quality of writing.

The paper "Of Lions and Yakshis" presents the design, implementation and evaluation of an ontology that models Propp's narrative theory and, following his morphological approach, represents characters and functions of folktales. A corpus of African and Indian tales was annotated and the results of the lightweight query system were evaluated. The aim was to showcase how this modelling approach can help domain experts to conduct comparative analysis of multi-cultural folktales in terms of their narrative structure. This type of comparative and contextualised analysis is performed manually by scholars and experts of the domain. The critical and informative related work section, where the paper compares similar modelling aspects and design approaches, highlights for the reader the contribution of the paper to the interested research community.

(1) Originality

The paper states that the findings of the described approach are confirmatory of well-known outcomes in traditional analogous humanities research. The novelty of the paper lies (i) in the creation of an online system that allows comparative Proppian analyses for a given set of tales, (ii) in supporting the folktale research community in studying Proppian morphology interculturally and language-independently, and (iii) in some of the design aspects related to this approach. There is a need to communicate the novelty aspects of this approach more clearly. For example: although subsections 2.1 and 2.2 introduce the reader to the application domain as mentioned in the outline, subsection 2.3 discusses elements related to the method and the approach of the system to highlight the system's novelty and contribution. The content of this section would have a better chance to stand out if grouped with other novelty-related content scattered throughout the paper.

(2) Significance of results

While I very much enjoyed the narrative of the paper, I think the results and evaluation section could be stronger. The paper mentions that the section's focus is on "reporting the verifiable results, leaving deeper interpretations of our findings to the interested folkoristically educated scholar". The evaluation of the ontology would benefit immensely from the involvement of experts outside the project team. Although the competency questions seem to be robust for the purpose of developing the ontology, a team of experts could potentially help in evaluating to what extent this approach ("a carefully modelled ontology") can help a domain expert to assess "the differences between intercultural folktales with regard to their narrative structure" for further interpretations in conducting a comparative analysis. Therefore, it would be very helpful to see either an evaluation performed as an inspection by a domain expert, or a more critical discussion of the evaluation results in the context of the domain: gathering all information together into a single whole, evaluating the trends observed, and explaining the significance of the results for wider understanding by referencing published research.

(3) Quality of writing

- Subsection 8.1.1 is titled as Representation of Characters in African Tales, although it presents results from both African and Indian corpora.
- The sentence "There was no instance of either Princess/Princess' Father or Seeker" should be added to the body of the text to aid understanding of the pie chart in Fig. 8, which shows the Distribution of Dramatis Personae in Indian tales, since this information cannot be extracted visually from the chart (and the chart displays no numbers that would compensate for the missing information).
- Subsection 8.1.2: the second paragraph of this section is difficult for the reader to follow. This is partly due to the mistaken in-text reference to Table 3 (p.15, line 49) instead of Table 2.
- Subsection 9.3, the first sentence needs rephrasing or more punctuation to work.
- Since this is a long manuscript, cross-references between sections would improve the paper's readability from the reader's point of view. For example, in Section 5, the last paragraph before the end ("to demonstrate how a thoroughly modelled ontology in combination with....on the project website") could benefit from explicit mentions of the manuscript sections where the work mentioned is further described (such as 6. Implementation and 7. Information Extraction from Tale Texts).

Other Minor issues
- Figure 11 caption does not include "(multiple occurrences possible)". Is this not the same case here as Figure 10?
- Figures 3 and 4: the black ink used for the labels of the graph nodes is difficult to read on a coloured background, especially for the graph nodes with a blue background.
- The sentence "a sequence of functions represents the plot of a tale and is encoded in a string of function literals" is repeated in p2 line 18-19 and then 45-47 without adding any more information to the reader.