Review Comment:
In terms of the official criteria for the 'Data Description' track, I can say the following:
1) Quality of the dataset:
The authors have clearly put a lot of thought into ensuring the quality of the dataset's content by considering only authoritative historical sources; much of the discussion in the paper concerns where the dataset is sourced from and why those sources were chosen. Likewise, according to the authors, the links generated to other datasets have been verified by domain experts.
In terms of the technical aspects of the dataset, these have been much improved: a lightweight vocabulary has been made dereferenceable, URIs appear valid, and standard vocabularies are now being used more often. However, I still notice quite a few bugs in the system.
For example, when I try to access:
http://demo.seco.tkk.fi/ssaha/project/resource.shtml?model=ww1lod&uri=ht...
I get an exception embedded at the end of the data:
( Expression p.valueTypeLabel is undefined on line 48, column 44 in saha3/resource.ftl. The problematic instruction: ---------- ==> ${p.valueTypeLabel} [on line 48, column 42 in ...)
When I tried to access the link:
http://ldf.fi/ww1lod/main/
as given in the paper, I get an authorisation request and a subsequent 401. This suggests that the system is still in a prototype stage.
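For concreteness, the kind of check I performed can be reproduced with a few lines of Python using the requests library (a minimal sketch; the only URL taken from the paper is the one just cited, everything else is illustrative):

    import requests

    # URL as given in the paper; everything else here is illustrative.
    URL = "http://ldf.fi/ww1lod/main/"

    # Ask for RDF first, falling back to whatever the server offers.
    resp = requests.get(
        URL,
        headers={"Accept": "text/turtle, application/rdf+xml;q=0.9, */*;q=0.1"},
        allow_redirects=True,
        timeout=10,
    )

    # At the time of reviewing, this prints a 401 status rather than any data.
    print(resp.status_code, resp.headers.get("Content-Type"))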
2) Usefulness:
In general, it is difficult for me to assess usefulness since I am not an expert on WW1 and I am not the target audience for the dataset. I don't believe I would have reason to use it myself.
However, more generally, what I can say is that the content of the dataset appears to be of high quality and manually curated from authoritative sources. The drawback of this approach is that the dataset is quite small: I count roughly 3k entities, 40k triples, and 300 links to other datasets. Moreover, size aside, the scope of the dataset in the context of WW1 seems quite limited. A couple of hundred key events have been annotated, but the focus of the dataset is on the atrocities in Belgium. The authors may argue that this is a seed for further contributions, but since the paper was first submitted a year ago, I am unsure whether the scope of the dataset has since been broadened in any significant way.
Given that the dataset is rather specialised, I think it really targets domain experts in the area. As such, interfaces and tooling are important. And there are some quite nice demonstrators, such as the annotated reader, the Linked Data browser, and the SPARQL editor assistant. But there are parts of the system that seem unusable. For example, if I click this link:
http://demo.seco.tkk.fi/ssaha/project/resource.shtml?uri=http%3A%2F%2Fld...
I get a bunch of triples in the form of a predicate–object list, where both predicates and objects are given using full URIs. This is not human-readable at all and I doubt domain experts would find this intuitive (I certainly didn't). For example, the time-span of the event in question is presented as:
http://www.cidoc-crm.org/cidoc-crm/P4_has_time-span | http://ldf.fi/ww1lod/95ed7607
This is unreadable; I still don't know what the time-span was. Furthermore, if I click on the object URI in the interface, I get no information about the time-span, just a bunch of references. However, if I access the system through http://ldf.fi/ww1lod/, I get a much cleaner interface with maps and more readable values.
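Returning to the time-span example: to actually find out what the dates are, a user would seemingly have to fall back on a query along the following lines. This is only a sketch in Python: the endpoint address is a placeholder (not the project's real endpoint), rdfs:label is my guess at where readable values might live, and only the time-span URI is taken from the data above.

    import requests

    ENDPOINT = "http://example.org/ww1lod/sparql"  # placeholder, not the project's actual endpoint
    QUERY = """
    SELECT ?p ?o ?label WHERE {
      <http://ldf.fi/ww1lod/95ed7607> ?p ?o .
      OPTIONAL { ?o <http://www.w3.org/2000/01/rdf-schema#label> ?label }
    }
    """

    resp = requests.post(
        ENDPOINT,
        data={"query": QUERY},
        headers={"Accept": "application/sparql-results+json"},
        timeout=10,
    )
    for b in resp.json()["results"]["bindings"]:
        # Prefer a human-readable label where one exists; otherwise print the raw value.
        print(b["p"]["value"], b.get("label", b["o"])["value"])

Expecting domain historians to write queries like this by hand is not realistic; the browsing interface itself should resolve such values to labels or dates.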
As such (and together with the prior technical issues), I question whether the system has yet moved beyond the prototyping phase.
3) Clarity and completeness of the descriptions:
The paper is quite well-written and does a good job of motivating the work. However, it relies too much on the homepage, which provides an overview of the dataset and its statistics, as well as example queries that can be run against the SPARQL endpoint. I think the authors should present much more of this information in the paper itself: at the moment, the homepage is probably more informative than the paper. Insofar as possible, the paper should stand alone as an independent contribution. As such, the authors should present and discuss some example queries in the paper, as well as some of the more interesting statistics given in the VoID description.
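By way of illustration, even a simple statistics query of the following sort, presented alongside its results in the paper, would make the description more self-contained (again a Python sketch with a placeholder endpoint; I am not supplying any numbers here, only the shape of the query):

    import requests

    ENDPOINT = "http://example.org/ww1lod/sparql"  # placeholder, not the project's actual endpoint
    QUERY = """
    SELECT (COUNT(*) AS ?triples) (COUNT(DISTINCT ?s) AS ?subjects)
    WHERE { ?s ?p ?o }
    """

    resp = requests.post(
        ENDPOINT,
        data={"query": QUERY},
        headers={"Accept": "application/sparql-results+json"},
        timeout=10,
    )
    row = resp.json()["results"]["bindings"][0]
    print("triples:", row["triples"]["value"], "subjects:", row["subjects"]["value"])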
In summary, my concerns about the size and scope of the dataset, and about the prototypical nature of some of the systems, remain; likewise, I am concerned about the completeness of the article as a standalone description.
In terms of what can be fixed for a revision:
* Fix the technical problems with the system as outlined above.
* Provide example queries and important high-level statistics in the paper itself, rather than pointing to the homepage for more information.
This then just leaves my concerns about the size and scope of the dataset; if the authors address the above issues, I would be willing to accept the paper for publication despite the limited scope of the dataset. However, the editor or other reviewers may see things differently.
Also, some of my previous minor comments were, unfortunately, not addressed:
MINOR COMMENTS:
* "user needs research" -> "user-needs research"
* "academic sources. [13,17]" -> "academic sources [13,17]."
* "Further, the thesaurus" -> "Furthermore, the thesaurus"
* "1914-1918" Again, use en-dash for intervals. Though most cases have been fixed, some still remain.
* "the WW1LOD vocabulary" Fix the bad box.
* "In these integrations however, a problem that appears is that ..." Poorly written. Rephrase.
* Numeric columns in tables should be right aligned.