Review Comment:
Overall, this is an interesting article that compares different RDF modelling approaches for representing temporal information in the Henri Poincaré Correspondence Corpus. Such an analysis contributes to the Semantic Web communities in the cultural heritage and digital humanities sectors. As background, the paper starts with a discussion of some important issues in temporal knowledge representation for historical studies. The project aims to develop an easy-to-use semantic application for historians who may not have substantial technical skills or knowledge of graph databases, which is vital for the wide acceptance of the technology in the research arena.
The review of previous studies is unfortunately limited to temporal modelling in RDF in general, although that discussion is interesting. For the special issue of SWJ, the article should also outline existing applications of these models in historical studies. The literature section can therefore be improved.
The methodology of analysing the temporal data representation models by means of SPARQL queries is generally sound. A minor issue is that the second SPARQL query is too similar to the first one, so a variety of query patterns is not well investigated. The authors are asked to reconsider the choice of queries. The implemented application will certainly help the end users of the digital corpus, which is a valuable development for the digital humanities more broadly.
Diagrams and figures are well prepared. The graphical illustrations of the models are very helpful. Table 1 is a good contribution to SWJ.
In general, the article is written in professional academic English and is well organised.
The most critical shortcoming of the paper is the lack of coherence between the research questions/goals and the conclusions. Some parts of the argumentation and conclusions are not adequately clarified and interlinked. For those reasons, a major revision is recommended. More details on this point are provided below.
Major issues
-------------
There are many aspects of temporal data modelling, so it is recommended to state clearly, at the beginning, which aspects are discussed in this paper. For instance, temporal modelling may include temporal hierarchies, relations between relative times, and individual interpretations of time. This paper focuses mostly on the relations between absolute time instants or intervals for the biographies of persons and for the letter objects, but this is not clearly stated because the subject is presented in general terms (including in the paper title).
Section 3.1 provides valuable information about critical issues concerning time in historical studies, but they are treated as generic examples. It is not clear whether those points are actually evaluated in the later sections in Semantic Web terms (SPARQL queries), which is the central topic of this paper. As a result, this part of the argument does not convincingly support the conclusion. Clearer connections between Section 3.1 and Sections 5 and 6 (and 7) are required.
Section 3.3 should better describe how temporal information is currently stored (outside RDF in a separate database, perhaps?). It is not enough to say that the current data model/ontology is not able to represent temporal information. If such data does not include temporal information at all, the problem is not the existing ontology but the lack of data (values) in the first place, and new data modelling would not solve it. Similarly, Section 3.3 states: “In a history of science context, it is necessary to consider knowledge that is sometimes incomplete because of the lack of resources related to a certain context.” This is absolutely true, but it is a general uncertainty issue, not necessarily the time issue that the paper sets out to discuss.
Moreover, the statement “These issues require the addition of a temporal element to these relationships between people and places.” also mixes up uncertainty and temporal elements. Although those two issues are related in some cases, they are separate in others. Therefore, more clarification is needed about which types of temporal data issues the paper concentrates on.
Section 3.3 states: “For this knowledge representation work, the granularity is defined at the level of the day. This allows consideration of data associated with letters in the correspondence for which the day of writing is sometimes known.” It is good to define day-level granularity, but the issue raised earlier in the section (reasoning) has little to do with it. That issue can be addressed by recording the reasoning process of the historians: who made the inference, when, and so on. Therefore, I am not sure that day-level granularity provides an actual solution to the issue raised.
Section 4.1.3. The temporal model of CIDOC-CRM should be explained and examined in more detail, as it is a standard ontology for cultural heritage. Its limited treatment is a major shortcoming when discussing cultural heritage data modelling.
Section 4.2. This part needs to be extended to include more references to other important initiatives, including Wikidata, DBpedia, LODE, HuTime, CIDOC-CRM, and EDM.
Section 5.6. The arguments are too weak to justify the decision. To improve them, they can be tied more tightly to the issues raised in Section 3. One missing point of discussion is that the decision is based on the Henri Poincaré corpus only. Semantic Web technologies are most advantageous when data is integrated with external datasets. A common problem of such data integration is that federated queries across multiple endpoints become very complex, due to the variety of ontologies. For this reason, it is doubtful that the n-ary relation approach is the best choice. It may be intuitive for the members of the Henri Poincaré project, but it may not be the most popular way of encoding historical corpora (mainly because "original triple is lost"). In this regard, the CIDOC-CRM (4D fluent) approach seems to be more suitable for interoperability and data integration. Even if the authors still argue that the n-ary approach is the best solution, the different approaches should be compared more carefully, and convincing justifications should be presented.
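To make the interoperability concern concrete, the following is a minimal sketch in plain Python, not the paper's actual model. All identifiers (`ex:livesIn`, `ex:Residence`, `ex:resident`, `ex:place`, `ex:begin`, `ex:end`) and the dates are hypothetical and chosen only for illustration. It shows that a query pattern written against the direct triple returns nothing once the statement is re-encoded as an n-ary relation, so every external query has to be rewritten for the new shape.

```python
# All identifiers and dates below are hypothetical illustrations.

# Direct, atemporal statement that an external dataset or query might expect.
direct = {("ex:PersonA", "ex:livesIn", "ex:PlaceB")}

# N-ary encoding of the same statement with a validity interval: the relation
# becomes a node of its own and the direct triple no longer exists.
nary = {
    ("ex:res1", "rdf:type", "ex:Residence"),
    ("ex:res1", "ex:resident", "ex:PersonA"),
    ("ex:res1", "ex:place", "ex:PlaceB"),
    ("ex:res1", "ex:begin", "1890-01-01"),
    ("ex:res1", "ex:end", "1895-12-31"),
}


def match(graph, s=None, p=None, o=None):
    """Return the triples matching a pattern; None acts as a wildcard."""
    return sorted(
        t for t in graph
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    )


def lives_in_direct(graph, person):
    """Query written against the direct triple pattern."""
    return [o for _, _, o in match(graph, s=person, p="ex:livesIn")]


def lives_in_nary(graph, person):
    """Query rewritten for the n-ary pattern; returns (place, begin, end)."""
    results = []
    for node, _, _ in match(graph, p="ex:resident", o=person):
        place = match(graph, s=node, p="ex:place")[0][2]
        begin = match(graph, s=node, p="ex:begin")[0][2]
        end = match(graph, s=node, p="ex:end")[0][2]
        results.append((place, begin, end))
    return results


print(lives_in_direct(direct, "ex:PersonA"))  # finds the place
print(lives_in_direct(nary, "ex:PersonA"))    # empty: original triple is lost
print(lives_in_nary(nary, "ex:PersonA"))      # place with its interval
```

The same effect would be expected for a federated SPARQL query whose graph pattern assumes the direct property, which is why the comparison in Section 5.6 should consider external datasets as well.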
Section 6.3. “This entailment mechanism is useful for specifying relationships between temporal elements.” The mechanism is fascinating, and thanks to the rule-based approach it can be reused or extended for any kind of inference. However, it is uncertain whether historians and data modellers can define all such inferences beforehand. The danger is that, if a few rules are missing, users may believe that the query results they obtain include all possible inferences. In addition, this method seems to lack a way to preserve the rules in RDF, next to the source RDF data. An XML representation is acceptable, but it would be desirable to integrate the rules more closely into the source RDF data, so that they can also be queried and examined by end users. Moreover, versioning and provenance (metadata about the rules) would be a useful addition. It is recommended to present not only “good results” but also “pending issues”, and how to address the latter in the future.
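The risk of silently incomplete results can be illustrated with a minimal forward-chaining sketch in Python. This is not the authors' entailment mechanism; the single transitivity rule for "before" and the letter identifiers are assumptions made for illustration only.

```python
# Hypothetical "before" facts between letters; identifiers are illustrative.
asserted = {
    ("ex:letter1", "ex:letter2"),
    ("ex:letter2", "ex:letter3"),
}


def transitive_before(facts):
    """Rule: (a before b) and (b before c) implies (a before c)."""
    return {(a, c) for (a, b) in facts for (b2, c) in facts if b == b2}


def closure(facts, rules):
    """Apply every rule repeatedly until no new fact is derived."""
    known = set(facts)
    while True:
        new = set()
        for rule in rules:
            new |= rule(known) - known
        if not new:
            return known
        known |= new


with_rule = closure(asserted, [transitive_before])
without_rule = closure(asserted, [])

# The same question gives different answers, with no warning to the user.
print(("ex:letter1", "ex:letter3") in with_rule)
print(("ex:letter1", "ex:letter3") in without_rule)
```

Nothing in the second result tells the user that a rule is missing, which is why exposing the rule set (and its version and provenance) alongside the data would help end users judge how complete the results are.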
The paper does not investigate how the rules could be modelled in the ontology itself, instead of being added as extra rules in XML afterwards. As mentioned above, the rules can be written only when historians know beforehand how to infer information from the source data. Thus, in principle, it is also possible to encode the inferences in the ontology. This point could be examined further.
The conclusions in Section 7 are generally well written, but the outcome of Section 4.2 does not seem to be reflected in them. Only the “t1 before t2” example is presented. To represent diverse rules and time relations for historical studies, it would be important to also evaluate other relations, such as "A overlaps B" and "A metBy B", in the context of historical analysis. The modelling is even more complex when dealing with uncertainties. For those reasons, at least CIDOC-CRM should be analysed in more detail, but it is largely missing from the article.
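As an indication of what evaluating the other relations would involve, here is a minimal Python sketch that classifies two intervals at day granularity into one of the thirteen Allen interval relations. It assumes half-open intervals (start inclusive, end exclusive), which is a convention chosen here and not one taken from the paper. The relation names follow the wording used above, and the dates are illustrative.

```python
from datetime import date
from typing import NamedTuple


class Interval(NamedTuple):
    start: date  # inclusive
    end: date    # exclusive (assumed convention for this sketch)


def allen_relation(a: Interval, b: Interval) -> str:
    """Return the Allen relation that holds from interval a to interval b."""
    if not (a.start < a.end and b.start < b.end):
        raise ValueError("intervals must have a positive duration")
    if a.end < b.start:
        return "before"
    if a.end == b.start:
        return "meets"
    if b.end < a.start:
        return "after"
    if b.end == a.start:
        return "metBy"
    # From here on, the two intervals share at least one day.
    if a.start == b.start:
        if a.end == b.end:
            return "equals"
        return "starts" if a.end < b.end else "startedBy"
    if a.end == b.end:
        return "finishes" if a.start > b.start else "finishedBy"
    if a.start < b.start:
        return "overlaps" if a.end < b.end else "contains"
    return "during" if a.end < b.end else "overlappedBy"


# Illustrative dates only.
stay = Interval(date(1900, 1, 1), date(1900, 2, 15))
trip = Interval(date(1900, 2, 1), date(1900, 3, 1))
print(allen_relation(stay, trip))
print(allen_relation(trip, stay))
```

With uncertain endpoints (for example, a letter dated only to a month), more than one relation may be possible, which is the harder case that the article would need to address.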
Minor issues
-------------
Section 2 is too basic for SWJ. As this article is not written for a cultural heritage journal, the section can be significantly shortened or omitted completely.
Footnote 14. It is interesting, but too detailed and not directly related to Semantic Web technologies. It is not needed for SWJ.
Sections 4.1.3 and 4.1.4. The explanation is too vague and insufficient for readers. For example, Section 4.1.3 does not mention how to model temporal data. In general, the subsections of Section 4 are easier to understand when read alongside the modelling examples in the subsections of Section 5, but they are hard to understand without them. It would therefore be helpful to restructure and harmonise the two sections.
It would be interesting to investigate (in the future) how to preserve and distinguish the transcription of a source letter and its interpretation (in this case, the inference of t1 before t2, which is not written in the source letter). This remark is not directly related to the ontology modelling that this paper is about, but it is highly relevant, since the ontology modelling needs to take text encoding into consideration.