Review Comment:
This manuscript was submitted as 'full paper' and should be reviewed along the usual dimensions for research contributions which include (1) originality, (2) significance of the results, and (3) quality of writing. Please also assess the data file provided by the authors under “Long-term stable URL for resources”. In particular, assess (A) whether the data file is well organized and in particular contains a README file which makes it easy for you to assess the data, (B) whether the provided resources appear to be complete for replication of experiments, and if not, why, (C) whether the chosen repository, if it is not GitHub, Figshare or Zenodo, is appropriate for long-term repository discoverability, and (4) whether the provided data artifacts are complete. Please refer to the reviewer instructions and the FAQ for further information.
The authors present a compact, focused experiment on applying ShEx validation to libraries' datasets to foster data re-use, with four exemplifying use cases on datasets provided by three individual libraries (and one non-library data). The presented methodology is quite straightforward application of ShEx. From purely technological perspective, the originality and significance of the contribution is not particularly high, but especially for researchers and practitioners working with (linked) data in GLAM institutions the paper would be relevant.
Compared to the previous version of the paper: the authors have made improvements on the paper, extending it sufficiently on sections that needed further discussion. The comments I made in my previous review have been addressed sufficiently. The quality of the writing is good.
The data file provided by the authors under “Long-term stable URL for resources” (A) is well organized and contains a README file, (B) appears to be complete for replication of experiments (based on the README file, file listing, and looking at couple of individual data files), (C) is stored on Zenodo, and (4) appears to provide complete data artifacts (based on the README file, file listing, and looking at couple of individual data files).
I have one comment:
- Page 14 "Regarding the NLF dataset, a common problem is related with the property rdf:langString used for language-tagged string values that are validated against xsd:string" - In such cases, why did you constrain the property value's datatype to "xsd:string" - instead of "LITERAL" (https://shex.io/shex-semantics/index.html#shexc) in the ShEx definition? For example, concerning NLF dataset's class Person, you have constrained the property schema:name's value to xsd:string (https://github.com/hibernator11/ShEx-DLs/blob/1.1/nlf/nlf-person.shex#L12). In the NLF data model the range of schema:name is defined as "Literal" (https://www.kiwi.fi/display/Datacatalog/Fennica+RDF+data+model), and in the schema.org vocabulary the range of schema:name is defined loosely: "schema:name schema:rangeIncludes schema:Text" (https://schema.org/version/latest/schemaorg-current-https.ttl). I would suggest loosening the constraint.
Minor remarks:
- Page 14: "Table 6 provides an overview of the data quality evaluation. All the assessed repositories obtained a high score, notably the BNB and the BnF." - Based on Table 6, NLF obtained as high score (mconRelat) as BnF. Mention NLF as well?
- Page 14: "Regarding the NLF dataset, a common problem is related with the property rdf:langString used for language-tagged string values that are validated against xsd:string" - rdf:langString is not a property, but a datatype.
|