Review Comment:
*** Summary
In this paper, the authors describe a dataset covering events, locations and agents of the First World War, which is presented as a reference dataset.
The dataset is published through various interfaces and applications, such as a SPARQL endpoint.
In the following, I give a detailed review along the dimensions published in the call:
http://www.semantic-web-journal.net/blog/semantic-web-journal-call-2nd-s...
##################### Usefulness (or potential usefulness) of the dataset
The dataset is presented as a reference dataset. A specific use case is not given, but readers can imagine that interesting queries would be possible,
at least if a few more links to other datasets are included (as is also noted in Section 5).
I think historians would be happy to have this and similar datasets. This is the first time I have heard of datasets addressing historical war events.
It would be nice if such a dataset could act as a starting point in this domain.
The given examples show that the dataset can be used to extract information about agents, locations and events of the First World War.
##################### Clarity and completeness of the descriptions.
Overall, the paper is well written, but it must be improved on a few points. There is still room left on the 6th page for such improvements.
As described in the next section, the descriptions of the dataset itself, its concepts, and the process undertaken to create it must be enhanced.
Regarding the criterion "completeness of the dataset", I would rate it as rather incomplete, or more precisely: as a reader, I am not sure about its completeness.
It would be very helpful to know, for instance, approximately how many events happened during WW1, compared with the number included in the dataset.
I only learn that 690 events are covered by the dataset.
##################### Quality of the Dataset
**** Name, URL, versioning, licensing, availability, source for the data ****
I looked for a landing page / project page announced in the paper but was not able to find one. In my opinion, such a page is mandatory to provide further descriptions,
link the interfaces, and describe the maintainers and the maintenance in general. The examples could also be listed there.
The name of the publication is "World War 1 as Linked Open Data".
Looking into the dataset (the linked dump file) gives me no information about that.
I was not able to find a resource of type owl:Ontology or similar, where such information (label, versioning, licensing, authors, maintainers, etc.) would be expected.
As described, the dataset is published under CC-BY-SA 2.0, which is almost fine. I only wondered why the authors did not publish the dataset under the current version of the license.
Looking at http://creativecommons.org/licenses/by-sa/2.0/ indicates that a newer version of the license is available.
This is not a problem for me; it is only a suggestion to consider updating the license.
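To illustrate the kind of metadata resource I was looking for, here is a minimal Python sketch that emits a Turtle header describing the dataset as an owl:Ontology with Dublin Core terms. Only the title and the CC-BY-SA 2.0 license are taken from the paper; the dataset IRI, version string and maintainer name are hypothetical placeholders, and the authors may well prefer another vocabulary (e.g. VoID) for this purpose.

```python
# Hypothetical values: only the title and the CC-BY-SA 2.0 license come from the paper.
DATASET_IRI = "http://purl.org/ww1lod/"  # assumed IRI for the dataset/ontology header

PREFIXES = {
    "owl": "http://www.w3.org/2002/07/owl#",
    "dcterms": "http://purl.org/dc/terms/",
}


def metadata_header(iri, title, version, license_iri, creators):
    """Return a Turtle snippet describing the dataset as an owl:Ontology.

    Assumes the literal values contain no double quotes (no escaping is done).
    """
    lines = [f"@prefix {p}: <{ns}> ." for p, ns in PREFIXES.items()]
    lines.append("")
    lines.append(f"<{iri}> a owl:Ontology ;")
    lines.append(f'    dcterms:title "{title}"@en ;')
    lines.append(f'    owl:versionInfo "{version}" ;')
    for creator in creators:
        lines.append(f'    dcterms:creator "{creator}" ;')
    lines.append(f"    dcterms:license <{license_iri}> .")
    return "\n".join(lines)


if __name__ == "__main__":
    print(metadata_header(
        DATASET_IRI,
        "World War 1 as Linked Open Data",
        "0.1",                     # hypothetical version string
        "http://creativecommons.org/licenses/by-sa/2.0/",
        ["Example Maintainer"],    # hypothetical maintainer
    ))
```

Such a header could simply be prepended to the dump file and also be served at the dataset IRI.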
All URLs given in the Data Access section (Section 3) returned results quickly. The content seems to be correct.
After downloading the dump of the dataset, I tested whether a few of the terms used in the dataset are dereferenceable.
For instance, I tried:
http://purl.org/ww1lod/schema#Time
http://www.seco.tkk.fi/history#theme_military
In both cases the result was a 404. This should be fixed in order to publish the dataset properly.
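A check like the one I did by hand could be automated over all terms in the dump. The following is a minimal standard-library Python sketch of such a check, not the authors' tooling: it sends a HEAD request with an Accept header asking for RDF (the chosen media types are an assumption) and reports the final HTTP status. For hash URIs such as the two above, urllib drops the #fragment, so the namespace document is what actually gets requested.

```python
import urllib.error
import urllib.request

# Accept header asking for an RDF serialisation; the exact media types are an assumption.
RDF_ACCEPT = "text/turtle, application/rdf+xml;q=0.9"


def build_request(uri):
    """Build a HEAD request that asks the server for RDF via content negotiation.

    urllib strips the #fragment of hash URIs, so only the document part is requested.
    """
    return urllib.request.Request(uri, headers={"Accept": RDF_ACCEPT}, method="HEAD")


def check_uri(uri, timeout=10):
    """Return the final HTTP status code (redirects are followed), or None if unreachable."""
    try:
        with urllib.request.urlopen(build_request(uri), timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code  # e.g. 404 for a term that is not dereferenceable
    except urllib.error.URLError:
        return None  # DNS failure, refused connection, timeout, ...
```

Running `check_uri("http://purl.org/ww1lod/schema#Time")` should yield 404 as long as the situation I observed persists, and 200 once the vocabulary is served properly.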
**** Purpose of the Linked Dataset, e.g. demonstrated by relevant queries or inferences over it ****
The paper contains a section with example queries, which are well described. I tested them using the SPARQL endpoint and both return the expected results.
**** Applications using the dataset and other metrics of use ****
The dataset is intended as a reference dataset. I could not find any application that currently uses or reuses the dataset.
Maybe this should be described or discussed in the paper. A few use cases would show the usefulness and impact of the dataset.
**** Creation, maintenance and update mechanisms as well as policies to ensure sustainability and stability ****
The creation of the dataset is sketched and the contributors are referenced. A description of the maintenance and update mechanisms is missing and
would be good to include in the paper.
**** Quality, quantity and purpose of links to other datasets ****
Section 2 contains a description of the instances in the dataset.
An overall triple count is not included, but running rapper shows that the dataset is rather small:
rapper -i turtle -c ww1lod.ttl
rapper: Parsing returned 20254 triples
Table 1 is nice to have and shows the number of instances of each type. What I do not understand is the acronym given in the heading of column two.
The authors state that they automatically created a little over 100 owl:sameAs links to DBpedia. While this is not very much and can certainly be improved, it would be interesting
to know how these links were created. Many tools are available for this, such as Silk or Limes.
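As an example of the level of detail I would expect, even a naive approach should be described explicitly. The following Python sketch is not the authors' method (which the paper does not describe) and is far simpler than what Silk or Limes offer: it proposes owl:sameAs candidates by exact matching of normalized labels and skips ambiguous labels. The input dictionaries (URI to label) and any URIs passed to it are hypothetical; a real linkage would also compare types, dates or coordinates.

```python
import re
import unicodedata


def normalize(label):
    """Lower-case, strip accents and punctuation, and collapse whitespace."""
    text = unicodedata.normalize("NFKD", label)
    text = "".join(c for c in text if not unicodedata.combining(c))
    text = re.sub(r"[^\w\s]", " ", text.lower())
    return " ".join(text.split())


def same_as_candidates(local, remote):
    """Pair local and remote URIs whose normalized labels are identical.

    local, remote: dicts mapping URI -> label. If several remote URIs share
    the same normalized label, the local resource is skipped rather than guessed.
    """
    index = {}
    for uri, label in remote.items():
        index.setdefault(normalize(label), []).append(uri)
    links = []
    for uri, label in local.items():
        matches = index.get(normalize(label), [])
        if len(matches) == 1:
            links.append((uri, "owl:sameAs", matches[0]))
    return links
```

Reporting which matching criteria were used, and how many candidates were rejected as ambiguous, would already make the linking process much more transparent.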
**** Domain modeling and use of established vocabularies ****
The paper gives a core data model illustrating the vocabulary on an abstract level. I would prefer an illustration, or at least a description,
that shows not only the abstract concepts but also the namespaces they come from. This would give readers an impression of what is newly published on your
side and what was reused.
For instance, the concept agent could perhaps be reused from the FOAF vocabulary, but http://schema.onki.fi/agent-schema#Person is used instead, which is not dereferenceable.
The concept place could be reused from LinkedGeoData, DBpedia, spatialHierarchy, etc.
Longitude and latitude are taken from the WGS84 vocabulary, which is a perfect choice, but this is not described in the paper.
**** Examples and critical discussion of typical knowledge modeling patterns used ****
Examples are given in Section 4, which I was able to use for testing. I tested them using the public SPARQL endpoint given in Section 3.
One minor issue was copying them from the paper: I had to adjust them to get them working (but this could be a problem with my PDF reader).
Adding links (PURLs / short URLs) to the project page and to the paper could make these examples easier to run.
A few discussions (though not critical ones) are included, such as the modeling of places/locations and their temporal relations.
The authors note that this is a difficult topic, especially in the domain of war events.
It would be nice to describe in more detail how it was solved. For instance, the DBpedia ontology contains a few concepts to encode locations, and vocabularies
such as spatialHierarchy address the same question of how to encode a temporal dimension in a spatial domain:
http://ns.aksw.org/spatialHierarchy/
https://raw.github.com/MichaelMartin/spatialHierarchy/master/shv.png
**** Known shortcomings of the dataset ****
Shortcomings are not explicitly described, but as stated in Section 5 "Discussion and Future steps", the creation and enrichment of the dataset is still ongoing.
##################### Minor remarks
Page two, Section 2, first column:
"This need was discovered early on in indexing the primary sources" -> "This need was previously discovered in indexing the primary sources"
Page two, Section 2, second column:
[?] -> citation not rendered correctly
Page three, Section 2, second column:
reasearch -> research
Page four, Section 4, second column:
"Here, the concept selection function ... are key." -> something seems to be missing, so I am not sure what this sentence means.
Page 3, footnote:
Instead of using a DBpedia link, I recommend using a citation. Many can be taken from bibsonomy.org.
Submission in response to http://www.semantic-web-journal.net/blog/semantic-web-journal-call-2nd-s...