Review Comment:
The authors present Datanode, an ontology pattern that models an abstraction of data artifacts, which can be datasets, repositories, catalogues, or registries. The aim is to fill the use-case gap not covered by other vocabularies such as VoID, DCAT, or PROV-O, in particular concerning the relationships between data artifacts, so as to allow inferences about them.
Overall, the comments below mean that a major revision may be needed, at least in the writing of the paper.
ON READABILITY AND CLARITY OF PRESENTATION:
Readability is not a big problem, though addressing the remarks below should improve it.
For a user who is not yet familiar with the pattern, the pattern description seems daunting and complicated due to the large number of properties in the pattern. Moreover, the six branches described by the authors are clearly not completely separated, because a significant number of properties belong to more than one branch. Although the tables do list them all explicitly, a more complete visual description would greatly help in understanding the property hierarchy. Figure 1 only visualizes the hierarchy up to the top-level properties, and does not indicate the shape of the hierarchy below them, which is definitely not just six simple, separated branches. The meaning of some properties is also not entirely clear (see below). The arrangement of the content in the tables should be improved: the order in which the properties appear in the tables seems random; it would be better if they were ordered alphabetically or in some other easily recognizable order, to make it easier to locate a particular property.
ON THE PATTERN DESCRIPTION
I like the steps taken by the authors in developing the pattern. Typically, a content pattern is designed with some involvement of domain experts. However, in the case of the Datanode ODP, the authors themselves could also be considered domain experts. Hence, going through the steps as described by the authors would reasonably lead to a fairly good abstraction of the use cases.
The resulting pattern and its description, however, need some improvements.
The meaning of some properties is not entirely clear. This is rather unfortunate considering that the authors (seemingly) emphasize that the pattern can support more useful inferences about datanodes than the other vocabularies can.
1. hasInterpretation / isInterpretationOf
What is an interpretation in this context? If I have a datanode that is an interpretation of another datanode, what would it look like? Interpretation can mean completely different things to different users. The subproperties (hasExtraction, hasInference) are pretty clear, though. So, perhaps hasInterpretation and isInterpretationOf are not needed in the pattern? Is there any use case for hasInterpretation besides what can be described by hasExtraction and hasInference?
2. hasStandIn / isStandInOf
What is a "stand in" of a datanode in this case? "Stand in" usually means "substitute". My understanding of these terms is that they depend on the context. That is, when I assert that datanode B stands in for datanode A, I would probably do so because my application's conditions require it. Without explicitly considering this context, I'm not sure the "stand in" relationship makes much sense. The authors should probably justify the use of these terms with some examples, or simply drop them altogether.
3. remodelledInto vs. refactoredInto
It might simply be that I am not familiar with the relevant use cases here, but I don't quite understand the difference between these two properties. Both terms, "remodeling" and "refactoring", are commonly used to describe a restructuring process (of a building or a shape for the former; of code or software for the latter). It is not clear to me what they mean for datanodes and whether they are actually different.
4. The adjacency branch
Implicitly, it seems that relations from this branch should only be used for two datanodes that belong to the same data container, such that one is not part of the other. This is, however, not formally indicated in the pattern. Also, what does it mean to have a disjointPartWith relationship between two datanodes? Why does it imply that they are part of the same dataset? This relationship can also be conceivably applied to two completely unrelated datanodes, which do not belong to the same dataset at all.
5. overlappingCapabilityWith vs. differentCapabilityFrom
Among the six top relations, overlappingCapabilityWith and differentCapabilityFrom are the most confusing ones. Can two datanodes be related by BOTH properties at the same time? Or is differentCapabilityFrom intended for two datanodes that share no capability at all? It is conceivable that two datanodes have disjoint populations, but use the same vocabulary. Hence, they would be related by both the overlappingCapabilityWith and the differentCapabilityFrom properties. This may be surprising for some users, as they may use the differentCapabilityFrom property with the intention of indicating that the two datanodes share no capability at all.
The notes in the online version of the pattern say that those two properties are needed to generically express a comparison of datanodes with respect to specific tasks, which is quite clearly the case when we look at the subproperties of the above two properties in Tables 6 and 7. However, I think that those two properties might not be the best choice to express a generic comparison. I would rather use something like shareCapabilityWith, together with disjointCapabilityWith.
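To make the concern in item 5 concrete, here is a minimal Python sketch. It is not taken from the paper: the set-based model of a datanode, the class Datanode, and both functions are my own assumptions about one plausible reading of the two top-level properties (overlap if any vocabulary term or individual is shared; difference if vocabulary or population differs).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Datanode:
    """Hypothetical model: a datanode as a set of terms and a set of individuals."""
    vocabulary: frozenset
    population: frozenset


def overlapping_capability(a: Datanode, b: Datanode) -> bool:
    # Assumed reading: sharing any vocabulary term or any individual is an overlap.
    return bool(a.vocabulary & b.vocabulary) or bool(a.population & b.population)


def different_capability(a: Datanode, b: Datanode) -> bool:
    # Assumed reading: any difference in vocabulary or population is a difference.
    return a.vocabulary != b.vocabulary or a.population != b.population


# Same vocabulary, disjoint populations (illustrative data only).
shared_terms = frozenset({"foaf:name", "foaf:knows"})
node_a = Datanode(shared_terms, frozenset({"ex:alice"}))
node_b = Datanode(shared_terms, frozenset({"ex:bob"}))

both_hold = overlapping_capability(node_a, node_b) and different_capability(node_a, node_b)
print(both_hold)
```

Under this reading both relations hold for the same pair, which is exactly the situation a user who reads differentCapabilityFrom as "shares no capability" would not expect.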
6. differentVocabularyFrom or disjointVocabularyFrom?
Is differentVocabularyFrom intended to be used for datanodes that share no vocabulary terms at all? Wouldn't disjointVocabularyFrom be a better term? A vocabulary is usually seen as a collection of terms, hence it is possible for two datanodes to have overlapping, but different, vocabularies at the same time.
7. disjointPortionWith and disjointSectionWith
Was the intention to say that A "disjointPortionWith" B if A and B are both disjoint partitions of the same datanode C? Or was it to say that A "disjointPortionWith" B if A has some portion A' and B has some portion B' such that A' and B' are disjoint? My guess is that the first reading is what the authors intended, but unfortunately this is neither clear from the explanation nor formally asserted. If the second reading was intended, then every two datanodes would be trivially related through the disjointPortionWith property. In fact, a datanode would be "disjointPortionWith" itself, because we can always conceive of an empty portion of it. The same remark applies to the disjointSectionWith property.
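The two readings in item 7 can be contrasted with a small Python sketch. This is only an illustration under my own assumptions: datanodes are modelled as plain sets of items, a "portion" is any subset (including the empty one), and the function names are hypothetical.

```python
from itertools import combinations


def portions(node: frozenset) -> list:
    """All subsets of a node, including the empty portion (assumed notion of 'portion')."""
    items = list(node)
    return [frozenset(c) for r in range(len(items) + 1) for c in combinations(items, r)]


def disjoint_portion_reading_1(a: frozenset, b: frozenset, container: frozenset) -> bool:
    # Reading 1: a and b are both portions of the same container and share nothing.
    return a <= container and b <= container and a.isdisjoint(b)


def disjoint_portion_reading_2(a: frozenset, b: frozenset) -> bool:
    # Reading 2: some portion of a is disjoint from some portion of b.
    return any(pa.isdisjoint(pb) for pa in portions(a) for pb in portions(b))


node_a = frozenset({1, 2})
node_b = frozenset({2, 3})
node_c = frozenset({1, 2, 3})

print(disjoint_portion_reading_1(node_a, node_b, node_c))  # a and b share item 2
print(disjoint_portion_reading_2(node_a, node_b))
print(disjoint_portion_reading_2(node_a, node_a))  # a node compared with itself
```

Reading 1 rejects the overlapping pair, while reading 2 accepts every pair, including a node paired with itself, because the empty portion is disjoint from everything.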
8. redundantWith, sameCapabilityAs, duplicate
The authors said: "overlappingVocabularyWith and overlappingPopulationWith, both leading to redundantWith, sameCapabilityAs, and duplicate - all describing a similar phenomenon with different intentions". Could you please explain their differences? When is one term more appropriate than the others?
9. Leveling in the pattern
I noticed that the online version of the pattern at http://www.enridaga.net/datanode/0.3/ns/ contains some sort of division of the terms into levels (level 1 to 5). What do the authors mean by this? Why is this not reflected in the paper?
10. Are there any other useful inferences that can be drawn using the pattern aside from subproperty relationships? Some properties are asserted with certain property characteristics, e.g., symmetry, functionality, etc. The scenario, however, only talks about inferencing shortcuts (without even spelling out what actually happens in Figure 10). Are such property characteristics useful in some other scenarios? Are there any examples of them?
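As an indication of the kind of example I have in mind for item 10, the following Python sketch derives a statement that subproperty reasoning alone cannot produce. It is a toy forward-chaining loop, not an OWL reasoner, and the declarations are my assumptions for illustration: that overlappingVocabularyWith is a subproperty of overlappingCapabilityWith, and that overlappingCapabilityWith is symmetric.

```python
def closure(triples, sub_property_of, symmetric):
    """Apply subproperty and symmetry rules until no new triple is derived."""
    inferred = set(triples)
    changed = True
    while changed:
        changed = False
        for s, p, o in list(inferred):
            new = set()
            if p in sub_property_of:
                # Subproperty rule: (s p o) entails (s super_p o).
                new.add((s, sub_property_of[p], o))
            if p in symmetric:
                # Symmetry rule: (s p o) entails (o p s).
                new.add((o, p, s))
            if not new <= inferred:
                inferred |= new
                changed = True
    return inferred


# Assumed declarations, for illustration only.
SUB_PROPERTY_OF = {"overlappingVocabularyWith": "overlappingCapabilityWith"}
SYMMETRIC = {"overlappingCapabilityWith"}

asserted = {("A", "overlappingVocabularyWith", "B")}
with_symmetry = closure(asserted, SUB_PROPERTY_OF, SYMMETRIC)
without_symmetry = closure(asserted, SUB_PROPERTY_OF, set())

print(sorted(with_symmetry - without_symmetry))
```

The difference between the two closures is the statement about B that is obtained only through the symmetry declaration; a scenario showing where such a statement matters would strengthen the paper.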
ON ALIGNMENT WITH EXISTING VOCABULARIES
From what I understand of the authors' motivation, the main aim of the pattern is to cater for use cases that cannot be covered by other existing vocabularies. If this is the motivation, I would think that there should be a much more detailed comparison, especially with VoID, DCAT, and PROV-O. The section describing the alignments with those existing ontologies only says which parts of VoID, DCAT, and PROV-O correspond to the Datanode pattern. In my opinion, it would be very useful for the users of the pattern if they also understood more clearly the differences between those existing vocabularies and the Datanode pattern, e.g., which features PROV-O possesses but Datanode does not, and vice versa. The application scenario does describe a situation where Datanode makes a difference, but the users (especially those who are very familiar with VoID, DCAT, or PROV-O) would be helped if other differences were detailed.
MINOR TYPOS, STYLES, etc.
Please be consistent: data node or datanode?
When you list more than two items separated by commas, with the last one preceded by the word "and", please put a comma before "and", e.g., write "item1, item2, and item3", not "item1, item2 and item3".
p5
left col, par 3: can be related each other --> can be related to each other
right col, par 2: mod-els --> mo-dels
p6
left col, spacing between par 1 and 2 of section 4 needs to be fixed.
right col, section 4.1: This relation has for inverse about --> The inverse of this relation is the property about
p7, section 4.4: Similarly to consistency --> Give pointer to section 4.5?
p9, par1: infererences --> inferences
p9, par3: hasUpdate --> The hasUpdate property
p14,
left col, line 3: partitioning etc... --> partitioning, etc. (no need to put three periods after etc)
left col, line 10: indirect affected --> indirectly affected
Fig. 10: some of the arrows (possibly the dotted ones) are not visible when printed in black-and-white.
References should be rechecked and better formatted; please present the information consistently, e.g., some conference names are given only as a short abbreviation, while others are given in full. The following are the ones I found:
[1] lod --> LOD
[2] aurin --> AURIN, ands --> ANDS, pages 75-82
[4] uk --> UK
[6] Semanic --> Semantic
[8] the authors should be: Keith Alexander, Richard Cyganiak, Michael Hausenblas, and Jun Zhao.
in the title: void --> VoID ?
[9] this is a W3C Working Group Note; please give more complete information in the reference
[12] extreme --> Extreme
[14] Dbrec --> DBRec
[19] lod --> LOD
[25] "Technical report" appeared twice.
[26] Is this a technical report?
[28] What's the venue? Citeseer?
[33] Is this a technical report?
[34] dl --> DL?
[35] cad --> CAD
[38] Technical report?
[40] pa --> PA? cnr --> CNR?
[41] ou --> OU?
Submission in response to http://www.semantic-web-journal.net/blog/special-call-ontology-design-pa...