The POSTDATA network of Ontologies for European Poetry

Tracking #: 2359-3572

Authors: 
Maria Luisa Diez Platas
Salvador Ros
Elena González-Blanco
Helena Bermúdez
Oscar Corcho
Javier de la Rosa
Alvaro Pérez

Responsible editor: 
Special Issue Cultural Heritage 2019

Submission type: 
Ontology Description
Abstract: 
One of the lines of work in Digital Humanities is concerned with standardization processes to describe traditional concepts using computer-readable languages. In regard to Literary studies, poetry is a particularly complex domain due to, among other aspects, the special use of language that it implies. This paper presents a network of ontologies for capturing the poetry domain knowledge. The most significant ontologies are presented. These ontologies are related to the poetic work, and its structural and prosodic components. A date ontology that represents the special needs of literary works is presented as well. This work is part of the results of the POSTDATA ERC (Poetry Standardization and Linked Open Data) project, which aims to provide a means for poetry researchers to publish their semantically enriched data as Linked Open Data (LOD), in the context of European poetry.
Tags: 
Reviewed

Decision/Status: 
Major Revision

Solicited Reviews:
Review #1
By Stefano Borgo submitted on 16/Dec/2019
Suggestion:
Minor Revision
Review Comment:

This paper describes a network of ontologies developed for the poetry domain within an ongoing ERC project. The aim is to present the work done, which was developed following the NeOn methodology, and the ontology modules that have been completed to date.

The paper is clear, well organized and of interest to the SWJ community. English should be revised.

I have a few points that should be considered when revising the paper:
1) in Fig. 3 why is pdcore:authorEducationlevel attached to pdcore:Redaction and not to pdcore:PoeticWork?
2) expand Fig. 4 to include the pdcore:isWrongAttribution (this is a feature that can be of interest for other ontologies)
3) how do you include shape, which is a relevant aspect of several poems (e.g. Easter Wings by George Herbert)?
4) have you checked other approaches for the post-data ontology (sect. 4.4)? I'm surprised you didn't find a relevant ontology to reuse (if you did, expand this part).
Since this module is relevant to several domains like archeology and history, this ontology should be given in more detail.

Minor points:
pg. 1 "in a completely and efficiently" something missing.
consider rephrasing: "it exists a significant number..." should be "a significant number... exists".
pg. 4 "autonomy sufficient", perhaps better to write: "a sufficient level of autonomy".
pg. 7 (bottom) reference link missing

Review #2
Anonymous submitted on 30/Dec/2019
Suggestion:
Major Revision
Review Comment:

The paper under review is an interesting attempt at developing a suite of OWL ontologies to represent the scholarly knowledge about European poetry. All ontologies are based on the European Poetry Domain Model, which is a result of reverse engineering of twenty-five repertories of such knowledge.
The paper briefly discusses the current research within the domain of knowledge representation in the digital humanities, outlines the methodology for ontology development, and presents the main aspects of four ontologies in the suite:
1. postdata-core ontology,
2. postdata-prosodicElements ontology,
3. postdata-structuralElements ontology,
4. postdata-dates ontology.

The paper exhibits a number of shortcomings, which I grouped into 3 categories.

A. Conceptual shortcomings.
Although the paper mentions some foundational work in the domain of knowledge representation in the digital humanities, e.g., CIDOC-CRM ontology, it does not attempt to exploit this research in any substantial way.
As a result, we find in postdata-core ontology a number of ontological categories that (i) are known to be conceptually convoluted, (ii) are objects of justified controversies, but (iii) are sufficiently elaborated and defined in any available foundational ontology (e.g., in CIDOC-CRM, DOLCE, etc.): Agent, Organisation, Role, etc.
Thus, all such categories in postdata-core ontology are hardly more than just labels, e.g., pdcore:Organisation is little more than an English word 'organisation'.
Such status may be acceptable for controlled vocabularies, but not for ontologies.
I would strongly recommend embedding at least postdata-core ontology in some foundational ontology - given the subject matter the most suitable seems to be CIDOC-CRM.
In fact some of the classes in postdata-core ontology are already embedded in CIDOC-CRM via FRBR ontology, but others remain unspecified.
In this context I should add that I do not think that using multiple sources of reference for foundational categories guarantees a sufficient level of precision and consistency, so I would recommend using one foundational ontology instead of many.

B. Implementation pitfalls
Postdata-core ontology uses a number of classes, properties and individuals from other ontologies. Some of these ontologies are imported, e.g., ontologies from ontologydesignpatterns collection, others are not, e.g., frbroo.
This difference needs to be explained and justified.
Also, and more importantly, I would recommend importing all ontologies that postdata-core ontology uses.
1. This would help to eliminate some inappropriate usages.
For example, postdata-core wants to use http://iflastandards.info/ns/fr/frbr/frbroo/F1_Work class (as a superclass of pdcore:PoeticWork).
Now the problem is that in frbroo there is no such class, as frbroo has the http://iflastandards.info/ns/fr/frbr/frbroo/F1 class instead.
It is true that frbroo has http://iflastandards.info/ns/fr/frbr/frbroo/F1_Work, but it is an individual there, which is not, as opposed to http://iflastandards.info/ns/fr/frbr/frbroo/F1, classified as a class.
So postdata-core should use http://iflastandards.info/ns/fr/frbr/frbroo/F1 instead.
2. Also explicit imports would retain all textual descriptions of the imported objects. Without them the postdata-core axioms sometimes remain cryptic.
For example, pdcore:PoeticWork is claimed to be a subclass of http://www.europeana.eu/schemas/edm/ProvidedCHO, but the latter is left in postdata-core ontology without any description, which renders this axiom readable only for those who know the edm ontology quite well.
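The class-versus-individual problem raised in point 1 is the kind of issue a simple check over the merged axioms can surface once imports are explicit. The following is a minimal sketch only: plain Python tuples stand in for RDF triples, and the typing of F1_Work as a named individual is an assumption taken from the description above, not read from the frbroo file.

```python
# Toy triple store: plain tuples stand in for an RDF graph (no RDF library used).
RDF_TYPE = "rdf:type"
OWL_CLASS = "owl:Class"
OWL_INDIVIDUAL = "owl:NamedIndividual"
RDFS_SUBCLASS = "rdfs:subClassOf"

FRBROO = "http://iflastandards.info/ns/fr/frbr/frbroo/"

# Assumed typing of the two frbroo terms, following the review's description.
imported = {
    (FRBROO + "F1", RDF_TYPE, OWL_CLASS),
    (FRBROO + "F1_Work", RDF_TYPE, OWL_INDIVIDUAL),
}

# The subclass axiom the review objects to.
local = {
    ("pdcore:PoeticWork", RDFS_SUBCLASS, FRBROO + "F1_Work"),
}


def bad_superclasses(local_axioms, imported_axioms):
    """Return (subclass, superclass) pairs whose superclass is not typed as owl:Class."""
    declared = {
        s for (s, p, o) in imported_axioms if p == RDF_TYPE and o == OWL_CLASS
    }
    return sorted(
        (s, o)
        for (s, p, o) in local_axioms
        if p == RDFS_SUBCLASS and o not in declared
    )


problems = bad_superclasses(local, imported)
for sub, sup in problems:
    print(f"{sub} is declared a subclass of {sup}, which is not typed as owl:Class")
```

Replacing the superclass with the F1 IRI makes the check pass, which mirrors the correction suggested in point 1.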

Using Pellet I discovered that
1. http://postdata.linhd.uned.es/ontology/postdata-dates#ExactDateExpression is a subclass of http://www.w3.org/2002/07/owl#Nothing, i.e., it is not satisfiable.
2. pdcore:PoeticWork has the following subclasses:
- http://postdata.linhd.uned.es/ontology/postdata-dates#ApproximateDateExp...
- http://purl.org/ontology/olo/core#OrderedList
- http://purl.org/ontology/olo/core#Slot
- pdcore:Person
- pdcore:Redaction
I understand that at least some of these subsumptions are not intended.
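Findings of this kind can be turned into a repeatable check after each modelling iteration. The sketch below is illustrative only: the inferred pairs are transcribed by hand from the list above (the truncated URL is left out), the `pddates:` and `olo:` prefixes are shorthand introduced here, and the empty `intended` allow-list is a hypothetical placeholder for whatever subsumptions the authors actually intend. It does not call a reasoner.

```python
OWL_NOTHING = "owl:Nothing"

# Subsumptions as reported in the review, written as (subclass, superclass).
inferred = {
    ("pddates:ExactDateExpression", OWL_NOTHING),
    ("olo:OrderedList", "pdcore:PoeticWork"),
    ("olo:Slot", "pdcore:PoeticWork"),
    ("pdcore:Person", "pdcore:PoeticWork"),
    ("pdcore:Redaction", "pdcore:PoeticWork"),
}

# Hypothetical allow-list of subsumptions the modeller intends (empty here).
intended = set()


def audit(inferred_pairs, intended_pairs):
    """Split reasoner output into unsatisfiable classes and unexpected subsumptions."""
    unsatisfiable = sorted(
        sub for sub, sup in inferred_pairs if sup == OWL_NOTHING
    )
    unexpected = sorted(
        (sub, sup)
        for sub, sup in inferred_pairs
        if sup != OWL_NOTHING and (sub, sup) not in intended_pairs
    )
    return unsatisfiable, unexpected


unsatisfiable, unexpected = audit(inferred, intended)
print("Unsatisfiable:", unsatisfiable)
print("Unexpected subsumptions:", unexpected)
```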

Some axioms are redundant, e.g., the 'date only DateEntity' axiom for pdcore:PoeticWork is not needed given the range axiom for pdcore:date.

There is also a number of assorted minor issues:
- apparent inconsistencies between textual comments and logical definitions, e.g., the comment on pdcore:isAgentOf says:
'This property points to the work (instance of PoeticWork) which depicts the Role at hand in relation to its contents. [...]'
And the domain axiom requires that the domain of this property is pdcore:Agent, which is disjoint to pdcore:Role.
- multiple conventions used for identifiers of object properties, e.g., compare the pair involves and isInvolvedIn to the pair isTranslated and isTranslation
- typos in rdfs:comments, e.g., 'his property points to the text in which the current instance of Person or Organisatrion is a character.'
- stylistic awkwardness in rdfs:comments, e.g., 'This property points to the instance of Place where the current Person has died' - current person???
- duplicated annotations, e.g., pdcore:certainty
- there are multiple object properties whose ranges are defined by references to instances of skos:inSchema class, but these instances are neither defined nor described in any way; as a result, the range definitions are pointless
- in most cases labels of object properties are identical to their identifiers and this is usually inconsistent with their language tags, e.g., retrievesText is not an English word or phrase
- http://schema.org/altitude is claimed to be defined in postdata-core ontology, but its identifier seems to say that it comes from schema.org
- why some data properties are restricted to xsd:integer and others to xsd:integer?

C. Presentation blunders
My overall impression of the presentation is that the explanations provided are too concise to be sufficiently informative.
The most significant of those is the description of how European Poetry Domain Model was developed, which does not go beyond "we did some reverse engineering of the existing systems".
Another manifestation of this issue is section 4 where the four ontologies are briefly specified. In this case each ontology is described by means of its main categories and relations relevant for them. The presentation is provided on the implementation level, i.e., the authors speak about OWL classes, object properties, etc.
I think that these ontologies may be presented in a more meaningful way if the authors applied a different pattern:
a. first describe in plain English an ontology's main categories
b. then refer these categories to OWL classes, object properties, etc.
c. show how they may function by means of an example at the level of instances.
The last stage seems to me rather important: the paper lacks a single, informative running example that would show how a particular poem may be described using the four ontologies.
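To make step (c) concrete, an instance-level example could be as small as the following sketch. It is not taken from the paper: the classes pdcore:PoeticWork, pdcore:Redaction and pdcore:Person are the ones named in this review, while every identifier and property under the `ex:` prefix is a hypothetical placeholder rather than a POSTDATA term. The poem is the one mentioned in Review #1.

```python
# Instance-level sketch: one poem described with classes named in the review.
# Everything under the "ex:" prefix is a hypothetical placeholder.
triples = [
    ("ex:easterWings", "rdf:type", "pdcore:PoeticWork"),
    ("ex:easterWings", "ex:title", '"Easter Wings"'),
    ("ex:georgeHerbert", "rdf:type", "pdcore:Person"),
    ("ex:easterWings", "ex:hasAuthor", "ex:georgeHerbert"),
    ("ex:easterWings_r1", "rdf:type", "pdcore:Redaction"),
    ("ex:easterWings", "ex:hasRedaction", "ex:easterWings_r1"),
]


def instances_of(cls, graph):
    """Return the subjects explicitly typed with the given class."""
    return sorted(s for s, p, o in graph if p == "rdf:type" and o == cls)


# Print the description in a Turtle-like, one-statement-per-line form.
for s, p, o in triples:
    print(f"{s} {p} {o} .")

print("Poetic works:", instances_of("pdcore:PoeticWork", triples))
```

Extending one such description across the core, structural, prosodic and dates modules would give the running example asked for here.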

Minor presentation issues:

- something seems to be wrong with the grammar of this sentence: 'The weak coupled is guaranteed between ontological modules.',
- btw., in general I would recommend proof-reading of the paper by a native speaker of English as sometimes the grammar and the style look rather basic.
- in page 4: ', Figure 2.' > '- see Figure 2.'
- something is missing at the bottom of page 3 'These situations are modelled in the ontology, as well. Figure 3.'
- could you explain what 'transversal knowledge of the poetic domain' is?
- in page 7 there is a broken link '¡Error! No se encuentra el origen de la referencia..' - probably missing \cite{} item
- the following fragment strikes me as poorly stylised: 'According to this, we have defined three classes to represent these metrical patterns. These classes are pdprosodic:LinePattern, dprosodic:StanzaPattern, and pdprosodic-WorkPattern.'
- 'a RDF' > 'an RDF'
- the repetition of grammar structure in the last two sentences of section 6 looks awkward to me
- there is an issue with the hyphenation of URIs of ontologies and classes - I think they should not be hyphenated even if this means making the text slightly longer.

Review #3
Anonymous submitted on 08/Jan/2020
Suggestion:
Major Revision
Review Comment:

REVIEW semantic web journal POSTDATA network of ontologies

This paper describes an ontology (composed by several modules) to represent knowledge concerning the domain of the poetry.

I have to say that I'm an expert neither in poetry nor in most of the Digital Humanities ontologies cited in the paper. That said, a considerable part of the paper is devoted to listing the main classes and relations in the modules of the ontology without discussing them from a conceptual perspective and without introducing a complete example that would help in understanding the modeling choices. In addition, the paper provides very little information about (1) the methodology followed to obtain the ontology (and what is called the "conceptual domain model"), (2) how the proposed ontology compares with other ontologies on this topic, and (3) the evaluation of the proposed ontology, e.g., in terms of quality parameters, its role in applications, or use-case experiments. The result is a plain description of the components of the ontology that provides neither a clear explanation or analysis of the underlying (conceptual) choices nor clear reasons for a user to adopt this ontology (this actually goes against what is required of a paper of type "Descriptions of Ontologies"). Furthermore, in several places the adopted terminology seems quite imprecise and confusing to me.

(p.1) "From the philological point of view, there is no uniform academic approach to analyze, classify or study the different poetic manifestations, and the divergence of theories is even bigger when comparing poetry schools from different languages and periods."

In several places the authors stress the heterogeneity of approaches and of theoretical points of view on poetry. However, it seems that none of the 25 repositories analyzed raises any problem of consistency or disagreement: the proposed ontology seems to incorporate all of them with no integration problem (the authors discuss only problems linked to coverage or terminology, not conceptual/ontological conflicts). In my view, this lack of conceptual analysis is a very weak point of the paper.

A few lines later the authors claim that there "is a great variety of terminologies to explain similar metrical phenomena through the different poetic systems whose correspondences have been hardly studied." This means that even at the terminological level the link between different positions has not been studied. Did the authors study these links to produce the conceptual domain model/ontology? If yes, what methodological and theoretical tools have they used?

(p.2) "it is necessary to standardize metadata and vocabularies at a philological level to be able to climb up the semantic layer and link data between different traditions"

Standardization is a way to (partially) support the integration of data. Alternatively, however, one can try to introduce formal links between different conceptual schemas/ontologies without relying on a single standard ontology. I don't know whether this strategy applies to the case of poetry, but in general, and especially outside a purely scientific domain, it is very difficult to identify "the representation of a universal and complete poetry domain" that is accepted by all the experts in the domain.

(p.2) "there is not a conceptual model of ontology referred to metrics and poetry "

What is a conceptual model of ontology? Please explain.

(p.3) "The most significant entities [in FRBR] are Work, Expression, Manifestation, and Item, which represent the different ways of conceiving a literary work as a text or physical resource."

In the postdata-core ontology pdcore:PoeticWork is a specialization of Work and Redaction is a specialization of Expression, but no reference to Manifestation and Item is provided in any of the discussed modules. It is then not clear to me whether the authors embrace all four FRBR concepts (with some of them introduced in modules still under development) or whether they rule out some of them (in either case their choice is not motivated or explained).
This is just an example of the lack of comparison with work that already exists and of conceptual analysis in general.

Another example: the authors claim that the Text Encoding Initiative (TEI) has a verse module that allows one to "annotate forms and structures of poetic works". They also claim that "the relationship between ontological models and TEI has been taken into consideration very seriously in the last years". However, TEI is not mentioned in the rest of the paper.

(p.4) "The first step for tackling this work was to build a conceptual domain model of European poetry (...) The result is a European Poetry Domain Model (DM-EP) with 40 entities, 494 attributes, and 409 relationships."

This confuses me. What is a conceptual domain model of European poetry? Is the postdata ontology (the whole network) a conceptual model of European poetry (formalized in OWL rather than in another language) or not? What is the difference between DM-EP and the postdata ontology? Is DM-EP informal, or is it specified using a formal language (different from OWL)? And again, no detail is provided about how DM-EP has been built, what difficulties have been encountered, how any conflicts have been resolved, etc.

(p.4) "The classes, relations and axioms of the ontology must be thematically related or complete the semantics of another ontology entity."

I don't understand this sentence. Furthermore, please clarify what an ontology entity is.

(p.4) "the underlying semantics of each class is related to the area of knowledge."

again, I don't understand this sentence

(p.4) "it is a self-contained ontological module that preserves the relationships with other ontologies."

What does this mean? In what sense does the module "preserve" the relationships with other ontologies?

(p.4) "a high degree of cohesion is achieved, the ontology functionality is described and avoids coupling with other ontologies of the network."

What is "the ontology functionality"? In what sense is it described?

(p.4) "Moreover, we have placed particular emphasis on establishing both the domains and the ranges of the properties. It allows defining its semantics completely and reducing ambiguity."

Are the authors suggesting that by establishing the domain and the range of a relation one semantically characterizes such relation in a complete way?
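A small counterexample shows why domain and range alone cannot fix the meaning of a relation. The sketch below is illustrative only: the two relations and the individuals are hypothetical, and relations are modelled as plain Python sets of pairs rather than OWL object properties. Both relations have the same domain and range, yet they hold between different individuals.

```python
# Hypothetical individuals, grouped by class.
persons = {"author", "translator"}
works = {"poem"}

# Two hypothetical relations, both with domain Person and range PoeticWork.
wrote = {("author", "poem")}
translated = {("translator", "poem")}


def respects(relation, domain, range_):
    """True if every pair has its subject in the domain and its object in the range."""
    return all(s in domain and o in range_ for s, o in relation)


same_signature = respects(wrote, persons, works) and respects(translated, persons, works)
same_meaning = wrote == translated

print("Same domain and range:", same_signature)
print("Same extension:", same_meaning)
```

Distinguishing the two relations requires further axioms or documentation beyond the domain and range declarations.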

(p.4) "After each iteration"

It would be useful to better understand what an iteration consists of.

(p.5) "Likewise, the definition of cardinal, and universal and existential restrictions in classes have been undertaken to prevent inconsistencies and avoid semantic conflicts."

I don't understand. Is the role of cardinal/existential restrictions to avoid inconsistencies and conflicts?

(p.5) "In the poetry domain, a poetic work, a poem, can be represented by different manifestations or versions. Of course, it is usual to find a set of poems grouped, for example, in a book. These situations are modeled in the ontology, as well. Figure 3."

If the term "manifestation" is used in the sense introduced in FRBR, then I don't see manifestations in fig.3. (redaction is a specialization of expression not manifestation)

(p.5) "The most significant classes of this ontology are pdcore:PoeticWork, pdcore:Redaction, and pdcore:Ensemble"

maybe, when the scope of the classes is clear enough (as in this case) the module prefix ("pdcore" in this case) could be omitted

(p.6) "This core ontology not only contains the necessary information about a poetic work but a set of common properties that have the same semantics in all the classes in which they are defined."

Please specify what it means for properties to "have the same semantics in all the classes in which they are defined".

(p.6) "we have identified a set of controlled vocabularies used as ranges of the following properties in the classes."

if the authors need more space to better explain the methodology and rationale behind the proposed ontology, the details on controlled vocabularies could be deleted (also at the end of section 4.3)

(p.8)
I'm not sure I understand the example depicted in fig.5. Is this an example of a stanza composed of two lines, where "Mo chion dot..." is the (content of the) first line and "a mhic na flatha..." is the (content of the) second line? If this is the case, first, I don't see why "a mhic na flatha..." is attached to the stanza [11161] and not to the line [11152]; second, I don't see why only the first line is attached to OrderedLineList[1115] or, vice versa, why both lines are not attached to OrderedStanzaList[1116].

Actually, it would be clearer to at least state explicitly which example is being considered before trying to represent it using the proposed ontology. One could also consider using this example, suitably extended, to show how the different classes and relations in the whole ontology can be instantiated, and thus to better illustrate how the model works.

-----
TYPOS
-----
(p.2) "the impossibility of having ways of processing this information in a *completely* and *efficiently* have..."

(p.3) "These ontologies can cover the descriptive aspects of the works and their forms of expression and manifestation, but *it* does not..."

(p.4) "scientific. communities"

(p.7) "¡Error! No se encuentra el origen de la referencia.."

(p.8) "These ontologies have been enriched with more classes used to store the prosodic analysis data"
(??) this ontology has been enriched...