Using an Ontology for Representing the Knowledge on Literary Texts: the Dante Alighieri Case Study

Valentina Bartalesi
Carlo Meghini

Eero Hyvonen

Ontology Description
This paper describes a digital library developed within the "Towards a Digital Dante Encyclopaedia" project, a three-year Italian National Research Project that aims at building services supporting scholars in creating, evolving and consulting a digital encyclopaedia of Dante Alighieri's works. The digital library is based on a knowledge base storing knowledge on the primary sources that Dante refers to in his works. At present, this information is scattered across many printed books, making it difficult to systematically survey Dante's cultural background and to obtain a well-founded perception of how this background was gradually built up over time. The same applies to other authors; the applicability of our work therefore extends well beyond the specific author we are considering in our project. The digital library that we are building is based on an ontology for representing the knowledge on one author's works and on the primary sources embedded in the commentaries to these works. Following this approach, a semantic network of Dante's works and of references to primary sources of these works was created. Furthermore, a web application was developed that allows users to explore the semantic network in various ways and to visualize statistical information about the references as charts and tables.
Review #1
By Oyvind Eide submitted on 07/Jun/2015
This is a review based on the following criteria, as requested:

1) Quality and relevance of the described ontology (convincing evidence must be provided)

2) Illustration, clarity and readability of the describing paper, which shall convey to the reader the key aspects of the described ontology

General comments:

The paper presents an ontology for textual studies with a focus on the works of Dante. While this is not the only way to handle the issues connected to annotations, it is a nice approach and it will be interesting to follow the development of their work in the future.

Comments to relevance of the ontology:

The introduction states the aim of the paper in a way that suggests it is possible and/or useful to develop one all-encompassing ontology for textual studies. The authors state that “Unfortunately, there is no one single ontology representing all aspects that are relevant to study a literary author’s work” and that the project supports scholars in “formally expressing the knowledge present in Dante Alighieri’s works and more in general in literary texts.” This gives an impression that it is possible to make one system expressing all relevant information in literary texts. If the authors claim this, it would be good to see a clearer argument for why they think this is possible. If not, it would be good to clarify that ontologies such as the one they develop will always represent one (set of) possible interpretation(s) of the source text(s).

It is clear that the specific ontology development as it is presented in this paper is based on specific Dante scholars’ input, and is thus local to specific research questions. The question I ask here is if they believe it to be possible IN PRINCIPLE to make one all-encompassing ontology for literary studies.

I find the use of the CIDOC-CRM property P3 has note for the link between the commentary and the text a bit puzzling. The first sentence of the scope note says “This property is a container for all informal descriptions about an object that have not been expressed in terms of CRM constructs.” (ver. 6.0). The property is explicitly meant to be a link from a CRM-based formalism to an informal description, but in this case it is used as a link from a non-CRM formalism. Further, in the next sentence the scope note goes on to say that “In particular it captures the characterisation of the item itself, its internal structures, appearance etc.” I do not see that the annotations characterise the item being annotated in this way.

It would be good to see a clearer argument why P3 has note is chosen based on the scope notes in CIDOC-CRM.

The use of the FRBRoo classes seems in general to be sound to me. A possible issue with F23 Expression Fragment is that it is defined as “parts of Expressions and these parts are not Self-Contained Expressions themselves.” Thus, F23 would not apply if a fragment of a work the commentary refers to happens to be a natural unit of the expression and is thus an F22 Self-Contained Expression. However, I see that this may be a hypothetical situation which may not occur in the work at hand. It would still be interesting to see a short discussion of their choice of F23 Expression Fragment and not F22 Self-Contained Expression in all cases.

The other ontologies are less known to me and while I see no clear problems with the use of them I am not able to discuss the use of them in depth.

A last question is why TEI is not mentioned at all. While not an ontology and not directly relevant to their work, it is still an important standard for the encoding of literary texts, and it does include parts that function more or less as ontologies, such as the bibliographical structures in the header. It may be interesting to see TEI structures as inspiration for their work, and also to see how their classes relate to TEI elements. In a generalisation of the use of the ontology and the web application it would also be interesting to look into import from TEI documents to populate the system.

Comments to the readability:

The text seems readable and clear to me as a non-native English reader.

Review #2
By Arianna Betti submitted on 23/Jul/2015
This is an informative, thoroughly enjoyable, interesting, well-structured and very relevant paper!

The project described concerns a milestone of Italian (and European) mediaeval cultural heritage - it is so nice to see there is an Encyclopedia in the making on the works of a man who was in his time an encyclopedia himself, as Dante was, back then, arguably the most knowledgeable person alive. Also, I liked the clean design of the web application and the possibility it offers of downloading a csv file of the data.

The project is exciting, and as far as I can see the choices made as regards the representation of knowledge contained in Dante's works and scholarly commentaries on them are appropriate (some issues should be addressed in the revised version, in particular as regards the thematic areas relying on the Nuovo Soggettario). I particularly appreciated section 3 on reused ontologies, and the use the authors have made of FRBR-related work such as the Nuovo Soggettario. I’d be thrilled to test this model on my own field of research. I’ve really learned a lot.

My overall impression is positive, and if I have chosen a rather low 0-100 score for this version of the paper, it is for two reasons: (1) certain rather easily fixable shortcomings in the writing, especially as regards clarity and accessibility, and (2) this is an article that has incredible potential to be interesting for a far, far larger audience: there is room for improvement to cater for that audience and it would be a pity not to try to reach it.

I am not sure whether the word limit would allow the authors to make the improvements I recommend, but most can be dealt with in a few words. In any case, the editor should keep in mind that these comments are no reason to reject the paper!

Also, please note that my background is in the humanities and my comments should be read with this background in mind.

I’d like the authors to include replies to the following questions in the revised version.

1. How exactly does this paper differ from, and in particular improve on, [1]?

2. Who are the Dante scholars who provided the different commentaries, and in particular, who are the people who worked on the spreadsheets and collaborated with you? See e.g. p3; here you are sloppy and very unclear. It is extremely important to a humanities audience to convey this information precisely. For instance: are only *existing* and publicly accessible commentaries used here and only transformed (structured) by / with the help of (a) local Dante scholar(s) on the project, or are *new* commentaries produced? And if the latter, is this information retrievable by a user? This is key information, and I cannot stress enough how important careful information on this point is for your audience.

3. Is there any similar project going on right now (on other authors obviously), to the best of your knowledge? (suppose I want to apply your model to a new author: why would I choose yours?)

4. Please add information on who exactly chose the thematic areas, and how your Dante experts came to that choice. E.g. was it the *Dante experts* on the team who suggested using the Nuovo Soggettario, or was this an initiative of the computational experts? Please add this information, and also add details to justify this choice. I can see it makes sense, but for future work, especially for re-use of the ontology you propose, it is absolutely crucial to add a thoughtful methodological reflection on this point. Consider the following opportunity for further work. The Stanford Encyclopedia of Philosophy has an entry for Dante Alighieri. As a scholar, I might want to use your application to test some of the claims in this authoritative entry for soundness with data. But arguably, the ‘thematic areas’ chosen in your project are still a bit rough for such an aim - or so one may think. But now: is this thinking (what I just said) really right? In other words, what exactly was your aim in adding thematic areas to the knowledge represented, and what are the limitations of your approach from the point of view of the scholarly content captured and offered through a resource such as the Nuovo Soggettario? (Do not get me wrong: I find your choice really, really interesting, but as a reader I need to know what I am getting and why, especially if I am a reader from a country other than Italy!) The reason why this part of the project struck me is that adding such thematic information is another matter entirely, so to speak, than reconstructing a web of citations and referencing that is highly implicit in a medieval work. So stating what the aim is helps a user not to ask for what your project is not supposed to do (and a reader not to ask for a different paper….)

5. The word ‘commentary’ is ambiguous. If you said ‘modern commentaries’, it would already be helpful. Otherwise a reader might think you are talking about works such as Proclus’ Commentary on Plato’s Timaeus, a work which one might, strictly speaking, consider a secondary source on Plato’s work, but which is normally considered to be a primary source for Proclus’ own ideas.

6. Please make clear what you mean by ’primary source’ in every context you use this term. I was very confused by the Introduction. I’d advise the authors to have a humanities scholar familiar with the project or at least with the paper to go over the whole text or at least through the introduction (*and* the first three sentences of section 2!) carefully to make it clearer for a humanities audience (and also for non-humanities audiences: I know for a fact that even the difference between ‘primary’ and ‘secondary’ sources cannot be considered common knowledge in many countries outside text-based humanities fields.) The example on column 2, p2 is great. Please add more (or find a way to embed them), especially when you describe the models in section 4.

7. p2, column 1: clarify the relation between terms, concepts, classes and representation (it sounds eg. odd to me to say that one adds classes to represent terms, it should be rather the opposite.)

8. You should spend two-three words to explain what Convivio is.

9. You should explain (very briefly!) why commentaries to primary works are necessary at all - that is, you cannot take for granted that everybody knows that the knowledge contained in such old texts is often implicit, and commentaries help readers who do not have that knowledge themselves understand what the text is saying. Footnotes are a very modern technique that Dante did not use…nor are referencing techniques to be expected in such old texts. Please tell your readers this.

10. What do you mean on p3 by ‘We verified that Convivio knowledge structure ….provided by the scholars.’ I do not follow this sentence and I do not understand what you did exactly. (You should write ‘the Convivio’s knowledge structure’, arguably.)

11. Similarly, the sentence just before is not comprehensible to me. What do you mean: ‘other’ scholars than whom? How many scholars did you involve, and who are they? How many provided Excel sheets? Did some *not* provide Excel sheets? Sorry, these parts are frustrating to read - a missed opportunity to inform, one feels.

12. ‘The Digital Library field’. I do not understand. What do you mean?

13. Sometimes you say you re-used ‘many’ classes and properties from existing ontologies, sometimes ‘some’. Make up your mind and be consistent, please. Even better (much better): give percentages of overlap with other ontologies, so you can be objective (and maximally informative!)

14. Did you use the FRBR ‘Manifestation’ class, and if not, why? This is a bit of a rhetorical question of mine, but the answer is important, especially to scholars who research the applicability of FRBR. For instance, the Australian Trove implements a FRBR-like structure in three levels, not the four levels of FRBR.

15. I would not say that FRBR allows ‘users to perform ….databases’ because that is done also without FRBR, so that is a wrong definition. So what is it that FRBR *really* enables?

16. Your readers would arguably much appreciate a more critical approach to your ontologies review, as several of these ontologies seem to aim at exactly the same thing, but as your work seems to reveal, they fail to respond to certain very natural scholarly needs: *you* want to use them to do something that scholars really profit from, and need to combine several ontologies to be able to respond to those needs and to do what you want to do, and what you want to do for Dante scholars seems to me a very natural thing, as said. So that no ontology serves the purpose, though actually some of them should, seems like a pity, right? Your paper could be helpful to this discussion on standards in these fields, if you add a bit of your reflection on these issues. Here question 13 is very relevant, for if you re-used, say, 10% of all other ontologies and OA for, say, 90%, then these other ontologies do not fail so much, if you get my point. But if you have to draw from *all* your ontologies for 15%-20% each, then the problem seems serious, right?

17. p4 column2: ‘in particular, we considered FRBRoo..textual knowledge’ this passage is vague. Please specify ‘some aspects’, otherwise the sentence as it stands applies to all of the ontologies you reviewed.

18. p4 column 2: the explanation provided for 3. efrbroo:ExpressionFragment is vague and unclear, please rephrase.

19. p5 column 2: you say that you use 2. fabio:Poem to describe the structure of the resource, but in figure 4 this class is not included. Actually, this is not surprising as this class does not seem to regard structure. Please reconsider. Either rephrase or insert the class in the figure, better if you eg take the Commedia as example, with the nice canto/cantica relation.

20: fig5: why is cdm:hasNote relating :ef1 to :body1 instead of to :anno1?

21: p6 column2 It is unclear. Why would you call/use ‘hasCitingFragment’ for the relation of a commentary to its own fragments?

22: How exactly was the extraction of the thematic area guided, in 5. Ontology Population? See also 4. above.
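
The overlap percentages asked for in questions 13 and 16 could be obtained with very little effort. The following is only a minimal sketch, assuming the terms of the ontology are available as a list of prefixed names; the function name `reuse_shares` is hypothetical, and the term list is an invented selection built from prefixes mentioned in the reviews, not the actual inventory of the ontology.

```python
from collections import Counter


def reuse_shares(qnames):
    """Return the percentage of terms contributed by each namespace prefix."""
    # The prefix is everything before the first colon of a prefixed name.
    counts = Counter(q.split(":", 1)[0] for q in qnames)
    total = sum(counts.values())
    return {prefix: round(100 * n / total, 1) for prefix, n in counts.items()}


# Hypothetical term list: an illustrative selection, not the real ontology.
terms = [
    "efrbroo:ExpressionFragment",
    "fabio:Poem",
    "fabio:Book",
    "crm:hasNote",
]

shares = reuse_shares(terms)
for prefix, share in shares.items():
    print(f"{prefix}: {share}%")
```

Run over the full list of classes and properties, a table of this kind would let the authors replace ‘many’ and ‘some’ with figures.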

English / writing

- p2 ‘persisted’ seems wrongly used
- p2 ‘Al la manna’ this is quite a bad typo, it seems - I guess something is going wrong with quotation marks? See also p5 column 1 after ‘Qnames without prefixes’.
- column 2, p4: ‘The’ is repeated
- Metaphisics -> Metaphysics
- p7 the our tool -> our tool :)

Review #3
Anonymous submitted on 24/Jul/2015
The paper presents a case study in Digital Humanities. The authors have developed and populated an ontology for representing references and commentaries related to literary works, in this case Dante's primary source texts. The resulting RDF-based knowledge graph is used for literary analysis on the web.

After introducing the topic (section 1), requirements for the ontology based on a set of several existing commentaries on Dante's works are explained. The resulting ontology makes use of a large variety of existing vocabularies explained in section 3. A question here is: are there unwanted or unspecified interactions in semantics when mixing classes and properties from different vocabularies? For example, what is the relation between the classes fabio:Book and frbroo:Work, both used in the model? Does the presented model change the semantics of the imported properties, which would not be desirable in general? Please clarify/discuss these questions a bit in the final version of the paper. As for related works, TEI has been used for related purposes before and the relation of the ontology approach with TEI should be clarified.

Section 4 explains the ontology by using detailed examples. This is very nice. However, I would also like to see a formal model of the ontology presented, e.g., an entity-relationship diagram. I suggest including this in the final version.

Section 5 briefly describes the semi-automatic ontology population process and some tools developed for it.

Finally, in section 6, application of the ontology is discussed. At the moment, it is possible to visualize the data using bar charts that also provide links to the data. The visualizations are argued to be useful to the scholars, but no evaluation supporting this is presented. No screenshots of visualizations are presented either -- these would be helpful for the reader to evaluate the usefulness of the application by herself. Please extend this section by providing more "convincing evidence" that you have created a useful ontology and tool for humanist researchers.

The paper is in general fluently written, well-structured and well-finished.

Ontology description submissions in SWJ are "reviewed along the following dimensions: (1) Quality and relevance of the described ontology (convincing evidence must be provided). (2) Illustration, clarity and readability of the describing paper, which shall convey to the reader the key aspects of the described ontology." As for criterion (1), the ontology seems to be of good quality and relevant and the application domain for ontologies is fairly novel -- there is scientific contribution present. However, very "convincing" evidence about this, e.g., by evaluation or reported usage of the ontology in various applications, has not been provided. Still, in my mind the paper satisfies criteria 1-2, if the comments above are taken into account.

Minor comments

SPAR link (footnote 5) was not operational when I tried it (possibly due to SourceForge?)
Metaphisics -> Metaphysics
clarify that namespace "cnt" comes from footnote 8
in RDF graph -> as an RDF graph
(e.g., XXX) -> correct character errors in XXX
sub property -> subproperty
class crm:hasNote -> property crm:hasNote
"Currently, we ar working in translating ..." -> explain why OWL is needed here
"... with Soggettario Nazionale .." -> explain what this is
Footnote 12 is in a weird position between references 3 and 4