Review Comment:
First, the paper has, in my opinion, improved a great deal, particularly with regard to clarity of expression.
However, with the benefit of this clarity I've come to realize that, in my mind, this is actually not a Linked Dataset Description paper. In fact, it reads 2/3 like a project description and only 1/3 like a dataset paper. Both of these would be interesting in their own right (the project description even more so), but trying to cram these two different narratives into a single paper does a disservice to both. Therefore, while I now actually like the content, I am still recommending rejection and resubmission as properly focused separate papers.
With this in mind, I'll give some comments for both of these orientations:
First, for a dataset description (which as defined by SWJ is primarily aimed at potential dataset users), the article really should delve deeper into the dataset itself. Here, a further detailing of the contents of the different subdatasets _would_ be essential (i.e. which metadata fields are recorded for each subcollection, and which vocabularies they refer to). On the other hand, for a project description paper I would agree with you that such detail could be left out.
For a dataset description paper, again, the Linked Data API section should do much more to expose how to actually access the dataset programmatically (e.g. by documenting and giving examples of how to use the Solr API, and by providing direct links to where the dataset dumps are available), and should also explain in greater depth how to parse and use the dataset/statement revisions (examples would be good here too). On the other hand, the details on data ingestion tools, export to Europeana, or Pundit integration do not actually concern a dataset user.
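To illustrate the level of detail I am asking for: even a minimal sketch along the following lines would help a prospective user get started with programmatic access. (The endpoint URL and field name here are hypothetical placeholders, not taken from the paper; the paper should supply the real ones.)

```python
from urllib.parse import urlencode

# Hypothetical Solr select endpoint -- the paper should state the actual URL.
SOLR_BASE = "http://example.org/solr/collection1/select"

def build_solr_query(text, rows=10):
    """Build a Solr select URL searching a (hypothetical) 'title' field,
    requesting JSON output and at most `rows` results."""
    params = {"q": f"title:{text}", "wt": "json", "rows": rows}
    return SOLR_BASE + "?" + urlencode(params)

# A user could then fetch this URL with any HTTP client.
print(build_solr_query("manuscript"))
```

A worked example like this, with the real endpoint, real field names, and a sample JSON response, is exactly what a dataset user needs from this section.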
For a project description paper, this would again be reversed. In such a paper I would, however, be interested in additional reflection: why did you end up with two different tools for ingestion? How did the dataset providers relate to these? Is there actually a need for statement-level provenance (or dataset-level, for that matter)? What are the use cases? Have they been realized? How did that go? Are there any lessons to be learned from how the project approached the Europeana ingestion? How does it compare to the methods of other projects? Explain why you needed to extend Pubby, and so on.
On a separate matter, I must also note that both the search site and the LD API were again down for at least two days at the time of this re-review, so I have not been able to validate all responses relating to e.g. provenance information encoding.
Finally, I found one typo: in "Furthermore, for datasets contains links to annotatable digital objects", you should have "containing links" instead of "contains links".