Submission in response to http://www.semantic-web-journal.net/blog/semantic-web-journal-special-ca...
Revised manuscript after an accept pending major revisions, now accepted for publication. The reviews of the original submission appear below the second-round reviews.
Second-round reviews:
Solicited review by Aba-Sah Dadzie:
The authors have done a good job of addressing the review comments. I'd recommend acceptance for the special issue, with a few minor corrections/additions. With regard to the specific requirements for this call:
* Quality of the dataset - this is well described and pointers to sample queries allow the reader to directly access the linked data.
* Usefulness (or potential usefulness) of the dataset - clearly contributes to the arts and cultural heritage. Further work planned by the authors indicates potential for further use and added value enabled by the conversion to LD.
* Clarity and completeness of the descriptions - the revised paper addresses concerns expressed in the 1st review. The use of existing standards and extensions to these are well described and referenced. The process followed in generating the dataset and the overall aims are also more clearly described.
* Name, URL, version date and number, licensing, availability, etc.
Version and licensing information is missing.
* Topic coverage, source for the data, purpose and method of creation and maintenance, reported usage etc.
* Metrics and statistics on external and internal connectivity, use of established vocabularies (e.g., RDF, OWL, SKOS, FOAF), language expressivity, growth.
* Examples and critical discussion of typical knowledge modeling patterns used.
- Addressed in sufficient detail
* Known shortcomings of the dataset. - fairly well addressed, but see point below.
_____________________
Additional points to address
p.2 - "Although this approach ensures a level of consistency and interoperability between the datasets from different institutions it creates a disconnect between the cultural heritage institute original metadata model and the Linked Data version."
This raises the question of why. Also, this appears to be contradicted by the final paragraph of this section.
p.5 - " Finally 34 persons were linked to persons in DBpedia. This is a relatively low number as 1) most of the Amsterdam Museum people are not notorious enough to appear in DBPedia" - do you need to be "notorious" to appear in DBpedia? "Famous" or "noted", maybe, but notoriety normally has negative connotations or is at best not complimentary.
Language & Presentation
Generally well written, but it needs a spelling and grammar check and a proofread for minor errors. Among others:
p.4 - "proxies-aggregation" -> "proxy-aggregation"
Formatting of URLs
Some of the URLs break because the formatting splits them and/or inserts whitespace at delimiters. The reader has to copy the full URL and delete the inserted whitespace to reach the intended address.
Solicited review by Philippe Cudre-Mauroux:
This second iteration corrects the minor flaws of the first version of the paper (which I already liked actually). From my perspective, this paper is ready for publication.
Solicited review by Fabien Gandon:
My previous comments were addressed.
First-round reviews:
Solicited review by Fabien Gandon:
This paper presents the Amsterdam Museum Linked Open Data set.
The access point, content, metrics, statistics, modeling rationale, etc. are provided by the paper, which is in my opinion a very good contribution to this call for papers (CFP).
"with suffix 'proxy-', 'aggregation-', 't-' or 'p-' for proxies, aggregations, concepts and persons respectively (eg. am:proxy-22476."
Don't you mean prefix?
"There are also 34 links to DBpedia."
Any reasons for such a low number?
Solicited review by Aba-Sah Dadzie:
The paper describes the generation of the Amsterdam Museum Linked Dataset, as part of the Europeana project, to make information about the collection, the people related to the various objects and the museum as a whole more accessible. The Linked Data was created by editing a "crude" RDF dump in order, among other things, to ensure interoperability with other cultural heritage data and the Europeana Data Model.
A few examples of use of the source data are given, and the authors discuss the benefits that the conversion to Linked Data is expected to bring. A specific example is the creation of a mobile tour guide. An overview of the data structure and work to promote interoperability with domain-specific models and more general standards are detailed. The work reported includes web services for querying and URLs from which to browse the data. The authors note the need to periodically regenerate the dataset to capture changes in the source data.
I have a few reservations about this paper. While the authors provide a good amount of information about the dataset and the technology used to create it, it reads more like a project report that ticks off a list of deliverables than what it should be: a description of the linked dataset and of the design process followed. I would suggest that, where Europeana is first mentioned, the authors give a brief description of the project as an introduction to its relation to the generation of the linked dataset (that the authors are involved in the project is not obvious until the end of the paper). This should make it easier to understand the impact on Europeana of the lessons learnt from the process followed. These lessons are to a large extent a pointer to potential reuse, further enrichment and maintenance of the generated Linked Data, and at present that opportunity is wasted. Also, the authors conclude with future work on Europeana, not on the Amsterdam Museum dataset.
A critical discussion of design choices and of the knowledge modeling is missing. There is no comparison with related work: the four references are all self-citations pointing to more detail on specific aspects of the work reported. While I don't expect a detailed literature review in a short paper, obvious areas where a review could be carried out are the usage section and the description of the model used, including how it relates to, or improves on, other similar domain-specific datasets (Linked Data or otherwise). I acknowledge that other models and schemas are mentioned, but these mentions simply point to URLs of project pages or similar rather than indicating why they constitute a good or balanced choice.
DETAILED REVIEW
p.1
"While larger cultural heritage institutions such as the German National Library or British National Library have the resources to produce their own Linked Data, metadata from smaller institutions is currently only being added through large-scale aggregators such as Europeana."
This statement is open to debate; I would amend it to something like "smaller institutions often depend on large-scale aggregators such as Europeana", AND back the claim with an appropriate citation.
"published it as "five-star" Linked Data" - "five-star" should be cited, using Berners-Lee's "Linked Data - Design Issues" article (http://www.w3.org/DesignIssues/LinkedData.html)
p.2
"The Amsterdam Museum Linked Data set implements best practices that -, the together with its methodology and tools- Europeana is keen on adopting for its future workflow."
This claim needs supporting evidence. Also, is there something missing after the hyphen?
I don't understand the resource URI derivation. A suffix terminates a word; however, the example given has "proxy" (the suffix) followed by a numerical code. A complete example might be useful here.
"We used purl.org URIs since for this conversion we were not in the position to use the Amsterdam Museum namespace for our Linked Data server." - why not?
It is not clear until later in the paper what the RDF relations and the conversion of the language information at the end of Section 2.1 refer to; forward-referencing (annotated sections of) Fig. 2 would be useful here.
p.3
"In total the object metadata consists of 5,700,371 RDF triples of which many have a thesaurus concept or person resource as object." How many is "many"? The word is too vague to be meaningful.
"Two Amsterdam Museum classes am:Exhibition and am:Locat were defined as rdfs:subClassOf of the EDM class edm:Event."
Are these two classes particularly meaningful or are they simply meant as examples?
"Most term-based thesauri, including the AM thesaurus, have a more or less uniform structure (ISO 25964) making" - what does the ISO standard mean or refer to here?
p.5
"Linked Culture Data web" - is this referring to a particular initiative?
==============================
Figures & Tables
The caption of Fig. 1 is too long (especially compared to the figure content) - I would suggest working the description into the main text and providing a more concise caption.
Figure 2 is referenced in the text before Figure 1 - Figure 2 should be brought forward.
Figure 2 caption - "... with their super-properties and -classes in italics." Does "-classes" mean "SUPER-classes"? If so, this must be written explicitly; a "-" works as a shortcut for suffixes, not prefixes.
Citations & Bibliography
Footnote 3 (URL) goes to an administrator login
Footnote 4 (URL) displays a tiny XML file (to do with diagnostics)
While http://purl.org/collections/nl/am IS redirected to http://semanticweb.cs.vu.nl/europeana, the resource itself is not found.
The 'Object Re-use and Exchange (ORE) model' should be cited properly, using whichever is most appropriate of the papers by Lagoze & Van de Sompel et al.
Language & Presentation
A small number of typos (not listed here) will be caught by an auto spell check.
All acronyms must be expanded at first use. This is especially important for those not in wide use. E.g.,
- OAI-PMH interface (p.2)
- RDA Group 2 metadata standard (p.2)
English uses "," as the thousands separator, unlike some other languages, which may use ".". Because this paper is in English, it should use the "," convention. More importantly, usage should be consistent, and this paper uses both (p.2, 4). This becomes even more confusing on p.4, where "." is used in one sentence both as a decimal point and as a thousands separator.
p.3
"These properties are mapped to RDA Group 2 elements using 20 rdfs:subProperty relations were defined." -> "These properties are mapped to RDA Group 2 elements using 20 rdfs:subProperty relations."
p.5
"Where the current Linked Data pilot of Europeana (data.europeana.eu) focuses on producing a Linked Data set based on the already-ingested metadata consisting of a minimal set of Dublin Core properties."
This is not a sentence.
Solicited review by Philippe Cudre-Mauroux:
This short paper describes the Amsterdam Museum LOD. The paper starts by describing the modeling and conversion methodologies (basically, the metadata and vocabulary were exported from OAI-PMH as XML, converted to RDF, curated using rewriting rules, and finally interlinked). Mapping onto the Europeana Data Model (EDM) was carried out using subclasses and subproperties on the one hand, and a proxy-aggregation pattern (supporting EDM's multiplicity of providers by separating object metadata and provenance metadata) on the other hand. The resulting LOD is available online over HTTP (serving HTML/RDF-XML/Turtle by content negotiation) and as a Git repository.
Overall, I liked the paper and found the description of the modeling/export process, as well as the description of the data itself, interesting and technically sound. The methodology used to produce the LOD is rather standard, but nevertheless compelling, and gives a down-to-earth, pragmatic account of how to export local cultural data and map it to EDM. I inspected a number of entities online using the HTML interface, and all the metadata I looked at appeared to be correctly and precisely exported, hence supporting the authors' claim (i.e., that the original richness of the data being LODified is preserved). The paper would in my opinion be stronger if it included i) more information about the linking process (e.g., a high-level description of the Amalgame alignment platform, precision/recall results for the automatically created links) and ii) additional detail on the update procedure (How efficient/scalable is the overall export procedure? Is there any bottleneck? Would incremental updates be possible? etc.).