Reviewed

This category lists all reviewed submissions; for papers under review please visit the <a href='http://www.semantic-web-journal.net/category/tags/underreview'>under review papers section</a>.

Europeana Linked Open Data – data.europeana.eu

Paper Title: 
Europeana Linked Open Data – data.europeana.eu
Authors: 
Antoine Isaac, Bernhard Haslhofer
Abstract: 
Europeana is a single access point to millions of books, paintings, films, museum objects and archival records that have been digitized throughout Europe. The data.europeana.eu Linked Open Data pilot dataset contains open metadata on approximately 2.4 million texts, images, videos and sounds gathered by Europeana. All metadata are released under Creative Commons CC0 and therefore dedicated to the public domain. The metadata follow the Europeana Data Model and clients can access data either by dereferencing URIs, downloading data dumps, or executing SPARQL queries against the dataset. They can also follow the links to external linked data sources, such as the Swedish cultural heritage aggregator (SOCH), GeoNames, the GEMET thesaurus, or DBPedia. The latest dataset release has been published in February 2012.
Submission type: 

5

Responsible editor: 
Decision/Status: 
Accept
Reviews: 

Submission in response to http://www.semantic-web-journal.net/blog/semantic-web-journal-special-ca...

Revised manuscript after "accept with minor revisions" - now accepted. The reviews from the first round are below.

Solicited review by Francois Scharffe:

The paper presents the pilot Europeana dataset. The dataset is important, rich and complex. It is a pilot, as lessons learnt will enable the authors to revise the publication. The paper is well written and gives a good overview of the dataset structure.

Two minor remarks:
- In Section 2 it is said that semantic markup is available on Web pages. It would be good to cite the technology used for the markup (RDFa? schema.org?).
- dereferencable -> dereferenceable

Solicited review by Dave Kolas:

This paper describes a prototype Linked Data version of the Europeana dataset.

* Quality of the dataset

The Europeana data on the museum / library resources is aggregated from a number of holders of the physical resources, thus the original providers have motivation to make the data accurate. It is possible that the aggregation of many sources means that some sources produce different subsets of data for the schema. The schema addresses the problem of multiple potentially conflicting records about a resource with proxies. It is not clear whether this is a better or worse approach than reification or named graphs for this purpose, but it appears sufficient. The other schema modeling is reasonable, though light on the interlinking (as noted in the paper). The authors do a good job of linking to other datasets, though it would be interesting to see percentages as well as raw links.

* Usefulness (or potential usefulness) of the dataset

This dataset could be potentially useful to a large number of people involved in or interested in the arts in Europe. It could also be combined with travel applications to know where to see particular works of interest. The prototype nature of the dataset leaves out much of the content currently in the non-linked-data Europeana dataset, somewhat mitigating its utility for the moment.

* Clarity and completeness of the descriptions

The paper is written clearly and concisely. The main classes in the data model are described well, and there is a good diagram of how these classes interact. An example record with properties might have been nice however.

Solicited review by Amit Joshi:

The paper is about the Europeana linked open data, which contains open metadata on more than 2.4 million texts, images, videos and sounds related to books, paintings, films, museum objects and archival objects throughout Europe. Data is gathered by Europeana from multiple data providers. Metadata is obtained from data providers, formatted according to the ESE XML Schema and then converted to EDM to generate the linked data version. The dataset is live and can be accessed either by downloading data dumps or executing SPARQL queries against the dataset. The significance of such a unique dataset being open is, without any doubt, high. However, the paper has the following weaknesses:

1. Use of provider proxy and Europeana proxy is not clear. Is it even required?
2. It would be good to provide examples of the items/resources in a dataset that uses existing ontologies and connects to other LOD datasets.
3. The number of references is very low (only two). Please revisit earlier sections and add additional references (e.g., linked data principles).

Description of the FAST (Faceted Application of Subject Terminology) Dataset

Paper Title: 
Description of the FAST (Faceted Application of Subject Terminology) Dataset
Authors: 
Edward T. O'Neill, Eric Childress
Abstract: 
FAST is a controlled vocabulary for names, topics, events, chronology, places and form/genre. It is derived from the Library of Congress Subject Headings (LCSH), a subject heading system widely deployed in metadata used by libraries in the English-speaking world to describe and facilitate the retrieval of library materials. FAST re-expresses the rich universe of LCSH headings as eight facets—each a standalone but complementary vocabulary—well-suited for supporting faceted navigation in information retrieval systems. With the exception of the Chronological facet, all of the headings in FAST are enumerated. FAST is published as Linked Open Data (LOD) and has been specifically designed to be compatible with Dublin Core and other widely-used metadata schemas.
Submission type: 

5

Responsible editor: 
Decision/Status: 
Reject and Resubmit
Reviews: 

Submission in response to http://www.semantic-web-journal.net/blog/semantic-web-journal-special-ca...

Solicited review by Michiel Hildebrand:

This paper describes a controlled vocabulary that is composed out of the LCSH terms that are used in the WorldCat collection. This collection is interesting as LCSH is itself not only a set of concepts, but also prescribes a mechanism to create such terms. This collection, thus, contains all the terms created in the libraries part of the WorldCat collection. The dataset is maintained by a renowned institute and used at large scale.

The paper gives information about most of the aspects mentioned in the call for papers (e.g. license, availability, topic coverage). However, it does so in a very minimalistic way. The paper is difficult to read and requires significant background knowledge of the reader and additional research to understand the dataset and its value. For example, the description of the original sources LCSH and NACO is very limited. Therefore, it is not easy to understand what this dataset adds to the already available SKOS version of LCSH (http://id.loc.gov/authorities/subjects.html). Please explain the relation to LCSH more clearly and provide a comparison with this SKOS version of LCSH.

Furthermore, the paper lacks motivation of the design decisions. For example, the faceted classification appears to be a prime addition, but it is not motivated where the eight facets come from. Why are these useful and why only eight? The examples of the applications also do not illustrate the added value of the facets.

Solicited review by Francois Scharffe:

The dataset is a controlled vocabulary derived from the Library of Congress Subject Headings. As such, the dataset contains a large number of resources associated with the library domain. The paper describes various aspects showing the dataset is regularly updated with management of deprecated URIs. Less information is given on the vocabulary usage (only SKOS?). It is regrettable that few links are created. Links to e.g. DBpedia and other datasets could easily be created.

Also, some additional motivation for creating FAST while LCSH datasets exist would be welcome (the motivation is clearer on the FAST web site).

Solicited review by Dave Kolas:

This paper describes FAST, a data set for faceted categorization of library resources.

* Quality of the dataset

Part of the dataset is derived from the Library of Congress, an authoritative data source on library categorization. It is then augmented and tuned by the paper writers. Given that this work is done in partnership with researchers at the Library of Congress, it is likely to be used correctly in this dataset.

Overall, the quality of this dataset in terms of provenance should be high. However, the paper does not describe the ontological terms used within the dataset, and thus it cannot be determined (from the paper, anyway) how well the data is modeled. From looking at the example in the paper, it appears that there is no subcategory scheme? This could be very useful.

* Usefulness (or potential usefulness) of the dataset

The dataset could be extremely useful to library science practitioners, as well as being used by online booksellers, etc. It is potentially linkable to indexes of scholarly articles as well.

* Clarity and completeness of the descriptions

The descriptions of the dataset lack details in a few key areas:

- The paper does not describe what RDF vocabularies are used. The URIs referenced section lists URLs relevant to the dataset, but there's no focused list of where vocabulary terms come from.
- The paper does not adequately describe the additional structure/utility of the facets. The facets are listed as added value over the LCSH dataset, but little description is used for this feature.
- The paper lacks an example or diagram of the key vocabulary terms.

On the positive side, the descriptions of the provenance and usage are quite good.

This paper is also not formatted in the correct format.

Overall this paper presents an interesting and potentially very useful dataset, but more could be done to clearly represent key points in the paper.


IDSWrapper: a Linked Data interface to the Institute for Development Studies’ data

Paper Title: 
IDSWrapper: a Linked Data interface to the Institute for Development Studies’ data
Authors: 
Christophe Guéret, Victor de Boer, Duncan Edwards, Timothy G. Davies
Abstract: 
This short paper provides a description of the IDS Wrapper used to expose the data from the Institute for Development Studies’ Knowledge Services as Linked Open Data. The IDS Wrapper provides Linked Data access to 35,000 research documents on development research as well as its medata. The IDS Wrapper links this metadata to a number of external sources: DBpedia, GeoNames, Lexvo and the IATI Linked Data set. We expect that the IDS data will play a central role in the larger web of Linked Data for global development.
Submission type: 

5

Responsible editor: 
Decision/Status: 
Major Revision
Reviews: 

Submission in response to http://www.semantic-web-journal.net/blog/semantic-web-journal-special-ca...

Solicited review by Axel Polleres:

This paper presents a preliminary description of a dataset wrapping information from the Institute for Development Studies.

The dataset is potentially interesting, but the authors make no clear attempt to showcase possible uses.

The quality of the dataset is not yet clear. In particular, there is no comment on the sustainability of the project beyond the exercise of creating a wrapper; even the URI (http://idswrapper.appspot.com) is marked as preliminary.

As for usefulness (or potential usefulness) of the dataset, it would be nice if the authors gave some concrete examples, e.g. of possible applications and queries that they envision with this dataset.

As for clarity and completeness, the examples and figures of Section 3.3 all seem to be cut off in the PDF. There is no query, browse, or search functionality as of yet, it seems, and when I go to the web page I have no guidance on how to navigate this dataset and find out how it could be useful for my purposes.

In summary, this work is a potentially useful project in a preliminary stage which is welcome, but probably too immature for the purposes of this special issue.

I recommend that the authors deploy it further and find adopters, seek feedback in workshops, etc. first, and target a journal again when the dataset has proven its usefulness in some sense.

Solicited review by Philippe Cudre-Mauroux:

This short paper describes a wrapper used to convert and interlink metadata from the Institute of Development Studies (IDS) into Linked Data. Overall, I found the paper interesting and well written. More specifically, the architecture of the wrapper is compelling: it dynamically converts identifiers to linked data, calling the IDS REST API and creating links to further Linked Data resources on the fly, taking advantage of the Java restlet package deployed on Google's AppEngine. Also, the authors give a nice overview of the state of the art in international development APIs and data dumps. A few interesting points are mentioned in the paper but should in my opinion be discussed in more detail, namely: i) to what extent would it be possible to mine information directly from the text of the 53'000 research documents? ii) since the URI scheme is human-friendly (it includes the literals corresponding to the "collections"), why not add the label of the resource itself as well? (having the label in the URI is imho useful in many situations) iii) it would be really interesting to have some information on the efficiency and effectiveness (e.g., precision and if possible recall) of the various linkage services and finally iv) why only develop proprietary client applications based on this data? Wouldn't a SPARQL endpoint / RDF dump be possible? Please explain in the context of your project.

Solicited review by Norman Heino:

This paper describes a dataset about research results from development studies as made available by IDS (Institute for Development Studies).
The software component used to convert IDS data to RDF (called 'IDSWrapper') is also described in brief.
Like the original IDS data the converted set is about documents, organizations, categories, countries and regions of research focus.
The data is enriched on the fly with links from Lexvo, DBpedia, IATI, and Geonames, which helps in understanding the data once found.

I found the data to be of medium to high quality.
Some values are obviously wrong or missing, but most of the data seems plausible.
What I particularly like is that some properties have been replaced by or linked to more common ones.
Replacement has been done for rdf:type and dcterms:language, while others like ids:date_created or ids:cat_parent have been linked via rdfs:subPropertyOf relations to Dublin Core, FOAF, or SKOS vocabularies.
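
The effect of such rdfs:subPropertyOf links can be illustrated with a minimal sketch in plain Python. It is not the IDSWrapper implementation: the concrete mappings (ids:date_created to dcterms:created, ids:cat_parent to skos:broader) and the sample triples are assumptions chosen for illustration; only the RDFS subproperty rule itself is standard.

```python
# Triples are (subject, predicate, object) tuples; all identifiers are illustrative.
# Assumed mappings, not taken from the paper:
SUB_PROPERTY_OF = {
    "ids:date_created": "dcterms:created",
    "ids:cat_parent": "skos:broader",
}


def with_super_properties(triples, sub_property_of):
    """Return the triples plus those entailed by rdfs:subPropertyOf.

    If (s, p, o) holds and p is a subproperty of q, then (s, q, o) also holds.
    """
    inferred = set(triples)
    for s, p, o in triples:
        seen = set()  # guards against cycles in the property hierarchy
        while p in sub_property_of and p not in seen:
            seen.add(p)
            p = sub_property_of[p]
            inferred.add((s, p, o))
    return inferred


# Hypothetical source data using only the dataset-specific properties.
source = [
    ("ex:doc1", "ids:date_created", "2011-05-01"),
    ("ex:cat7", "ids:cat_parent", "ex:cat2"),
]
enriched = with_super_properties(source, SUB_PROPERTY_OF)
```

A client that only understands Dublin Core or SKOS can then query `enriched` for dcterms:created or skos:broader without knowing anything about the ids: vocabulary.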

Usefulness of the dataset is a bit hard to evaluate since it consists mainly of metadata about research articles.
The value obviously lies in the articles themselves and the dataset's raison d'être is making it easier to find those articles.
As the authors note their implementation lacks a search feature but the technical reason given is a bit unconvincing.
Why are search queries not just forwarded to the original search API?
The paper quickly mentions a client application that could be used to 'browse through the IDS documents'.
I would like the authors to elaborate a bit more on the potential that is gained through RDF here.
For instance since the categorization properties are derived from SKOS this would enable generic SKOS browsers to be able to browse articles by category hierarchy: something which is not possible through the IDS interface.
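
What browsing by category hierarchy amounts to can be sketched as follows, assuming the hierarchy is available as skos:broader pairs and each document lists its categories. The function name, the identifiers and the sample data are hypothetical and do not come from the IDS dataset.

```python
from collections import defaultdict


def documents_under(category, broader_pairs, doc_categories):
    """Return documents filed under `category` or any category below it."""
    # Invert skos:broader (child -> parent) into a parent -> children index.
    narrower = defaultdict(set)
    for child, parent in broader_pairs:
        narrower[parent].add(child)

    # Depth-first walk down the hierarchy; `reachable` also guards against cycles.
    reachable, stack = set(), [category]
    while stack:
        current = stack.pop()
        if current not in reachable:
            reachable.add(current)
            stack.extend(narrower[current])

    return sorted(
        doc for doc, cats in doc_categories.items()
        if reachable.intersection(cats)
    )


# Hypothetical category tree and document assignments.
broader_pairs = [
    ("cat:malaria", "cat:health"),
    ("cat:health", "cat:development"),
]
doc_categories = {
    "doc:1": ["cat:malaria"],
    "doc:2": ["cat:health"],
    "doc:3": ["cat:trade"],
}
health_docs = documents_under("cat:health", broader_pairs, doc_categories)
```

Because the traversal relies only on skos:broader, the same code would work for any dataset that exposes its categories in SKOS, which is the reuse argument made above.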

The paper is clearly written and provides a usage example as well as the bigger picture on supporting development practitioners.
Other than a few typographical errors and minor issues no editorial revision is needed.

Typos
-----
* Abstract: "as well as its medata" => "as well as its metadata"
* page 2, section 3, par. 1: "different type of entities" => "different types of entities"
* page 2, section 3, par. 3: "and based on those to establish links" => "and based on those establish links"
* page 3, section 3.3.3, par. 1: "example of such link" => "example of such a link"
* page 4, par. 1: "by using applying their URI scheme": remove one of using, applying

Other issues
------------
* Why are Fig. 1--5 presented as screenshots? To improve presentational quality I suggest replacing them with real tables.
* For standard namespaces you should use the standard prefixes as per recommendation (you do so for rdf but not for owl)
* The vocabulary still uses the "http://example.org#" namespace; an obvious oversight


Description of the VIAF (Virtual International Authority File) Dataset

Paper Title: 
Description of the VIAF (Virtual International Authority File) Dataset
Authors: 
Thomas B. Hickey, Jeffrey A. Young
Abstract: 
VIAF virtually combines multiple library authority files into a single name authority service. The system mines and clusters variant names for a given entity (chiefly, persons and organizations), links the corresponding source records, and assigns a URI to each cluster. The dataset is built using advanced algorithms developed by OCLC Research, a global leader in applied research related to library information, and is the product of an ongoing collaboration of OCLC and a group of national libraries, other leading libraries and other cultural heritage organizations. Available as Linked Open Data (LOD), VIAF is leveraged by freebase.com and an expanding array of other agencies and services.
Submission type: 

5

Responsible editor: 
Decision/Status: 
Reject and Resubmit
Reviews: 

Submission in response to http://www.semantic-web-journal.net/blog/semantic-web-journal-special-ca...

Solicited review by Prateek Jain:

The work "Description of the VIAF Dataset" explains a dataset related to combining multiple library authority files into a single name authority service.

The dataset was generated by using the information sent or harvested by OCLC over a period of time. The metadata is extracted from the bib records and is merged into the authority records. The matching process resolves ambiguities and builds clusters from these records.

The dataset contains information about authors, works and biographical information. Considering there are numerous datasets in this field in similar spirit available as part of the LOD, the usefulness does not require any justification. The key distinction of this dataset is the creation from library authority files and the process used.

Based on the guidelines provided on the CFP for this issue, my review is the following

* Description of the dataset

The work gives sufficient information about the name, URL, licensing and availability. It also talks about the topics covered. However, except for details about page visits, other statistics about interlinks and types of relationships defined are missing. The authors have given a good description of the known shortcomings of the dataset. However, I find it very strange not to see even a single RDF/OWL snippet of any key entities modeled in the paper. I might be missing something, but I personally do not understand what a library authority file and a name authority service are. It would help if the paper described these fundamental concepts in some detail. The details about the matching process are also missing. I wonder if other matching tools like SILK and LIMES can help with the process.

* Quality and usefulness of the dataset
The dataset is definitely useful as it deals with general information about the authors and other literary works which are part of LOD.

I have a few questions though and would hope the authors address them subsequently.

Is it possible to provide more details about matching process?

What motivated the authors to create this dataset and use this source?

What practical problems are the authors planning on solving using this dataset?

What kind of additional relationships beyond owl:sameAs can be useful for this dataset? Can OWL based modeling help with the dataset?

What kind of modeling challenges were faced which were unprecedented?

Have the authors investigated using a tool like SILK for doing the linkage?

Does the dataset overlap with other existing datasets? If yes, which ones, and by what percentage? What is the unique aspect of this dataset?

* Clarity and completeness of the descriptions.

I am not sure, but I think the writing style and template used is more aligned with a white paper style. There is no reference to even a single research paper related to this area. While I won't count it against the work, it is difficult to judge how it is related to other works due to the lack of it. I would strongly recommend shaping the work like a research paper and doing a survey to identify related work and cite it.

Overall, I would say that while the dataset can be useful, there are a lot of questions which have been left unanswered. I hope the authors can answer these questions, which will make it a high-quality submission.

Solicited review by Christophe Gueret:

This paper presents an interesting data set, the VIAF published by OCLC.

* Quality of the dataset: good
The resources are accessible, the choice of vocabulary is sound and motivated in the paper.

* Usefulness (or potential usefulness) of the dataset: good
There is already a large base of users and also a number of harvesters. The data set is well connected to other sources and considering its size can play the role of a central naming authority for libraries.

* Clarity and completeness of the descriptions: average
The most surprising aspect of the paper is its layout. It is not in accordance with the publishing guidelines of the journal and is sometimes too factual. The authors could add more text to better explain what the role of the clusters is, for instance. I was also wondering what happens to the previous versions of the RDF when a new version is made available (after a change in data modeling or an update). It seems that no versioning is done and that the new content just replaces the old one, but it would be best if that were clarified in the text.

Solicited review by Emanuele Della Valle:

The authors present, in a schematic form, a dataset that combines multiple library authority files. The data set is of significant size, and it is externally linked to DBpedia using owl:sameAs and skos:exactMatch. The Web APIs are intensively used. The Web browser interface has a large number of users. RDF views are available. The presentation only lacks the description of the vocabulary and an example of how to use the links with DBpedia.

The authors, in order to have their dataset description accepted, have to turn the current schematic description into a narrative. In doing so, I recommend them to add a description of the vocabulary used in the dataset and an example of usage of VIAF together with DBpedia.
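
One way such an example of combined usage could look is sketched below: identity links are used to group URIs from different datasets into clusters describing the same entity. This is a sketch under stated assumptions, not VIAF's clustering algorithm; the URIs are placeholders, and treating skos:exactMatch like owl:sameAs for grouping is a simplification.

```python
def same_as_clusters(links):
    """Group URIs connected by identity links into equivalence classes (union-find)."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    # Merge the two sides of every link into one class.
    for a, b in links:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_a] = root_b

    # Collect the members of each class under its representative.
    clusters = {}
    for uri in list(parent):
        clusters.setdefault(find(uri), set()).add(uri)
    return list(clusters.values())


# Placeholder identity links (owl:sameAs or skos:exactMatch) between datasets.
links = [
    ("viaf:1", "dbpedia:A"),
    ("dbpedia:A", "loc:x"),
    ("viaf:2", "dbpedia:B"),
]
clusters = same_as_clusters(links)
```

Once URIs are grouped this way, statements made about any member of a cluster (for instance a DBpedia abstract) can be shown alongside the corresponding authority record.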


Amsterdam Museum Linked Open Data

Paper Title: 
Amsterdam Museum Linked Open Data
Authors: 
Victor de Boer, Jan Wielemaker, Judith van Gent, Marijke Oosterbroek, Michiel Hildebrand, Antoine Isaac, Jacco van Ossenbruggen, Guus Schreiber
Abstract: 
In this document we describe the Amsterdam Museum Linked Open Data set. The dataset is a five-star Linked Data representation and comprises the entire collection of the Amsterdam Museum consisting of more than 70.000 object descriptions. Furthermore, the institution’s thesaurus and person authority files used in the object metadata are included in the Linked Data set. The data is mapped to the Europeana Data Model, utilizing Dublin Core, SKOS, RDA-group2 elements and the OAI-ORE model to represent the museum data. Vocabulary concepts are mapped to Geonames and DBpedia. The two main contributions of this dataset are the inclusion of internal vocabularies and the fact that the complexity of the original dataset is retained.
Submission type: 

5

Responsible editor: 
Decision/Status: 
Accept
Reviews: 

Submission in response to http://www.semantic-web-journal.net/blog/semantic-web-journal-special-ca...

Revised manuscript after an accept pending major revisions, now accepted for publication. The reviews of the original submission are beneath the second round reviews.

Second round reviews:

Solicited review by Aba-Sah Dadzie:

The authors have done a good job of addressing the review comments. I'd recommend acceptance for the special issue, with a few minor corrections/additions. With regard to the specific requirements for this call:

* Quality of the dataset - this is well described and pointers to sample queries allow the reader to directly access the linked data.

* Usefulness (or potential usefulness) of the dataset - clearly contributes to the arts and cultural heritage. Further work planned by the authors indicates potential for further use and added value enabled by the conversion to LD.

* Clarity and completeness of the descriptions - the revised paper addresses concerns expressed in the 1st review. The use of existing standards and extensions to these are well described and referenced. The process followed in generating the dataset and the overall aims are also more clearly described.

* Name, URL, version date and number, licensing, availability, etc.
Version information and licensing information missing

* Topic coverage, source for the data, purpose and method of creation and maintenance, reported usage etc.
* Metrics and statistics on external and internal connectivity, use of established vocabularies (e.g., RDF, OWL, SKOS, FOAF), language expressivity, growth.
* Examples and critical discussion of typical knowledge modeling patterns used.
- Addressed in sufficient detail

* Known shortcomings of the dataset. - fairly well addressed, but see point below.

_____________________

Additional points to address

p.2 - "Although this approach ensures a level of consistency and interoperability between the datasets from different institutions it creates a disconnect between the cultural heritage institute original metadata model and the Linked Data version."
This begs the question "why"? Also, this appears to be contradicted in the final paragraph in this section.

p.5 - " Finally 34 persons were linked to persons in DBpedia. This is a relatively low number as 1) most of the Amsterdam Museum people are not notorious enough to appear in DBPedia" - do you need to be "notorious" to appear in DBPedia? "Famous" or "noted", maybe, but notoriety is normally considered to be negative or at best not complimentary.

Language & Presentation

Generally well written but needs a spelling & grammar check and proofread for minor errors. Among others,

p.4 - "proxies-aggregation" -> "proxy-aggregation"

Formatting of URLs

Some of the URLs break because the formatting is splitting them and/or inserting whitespace at delimiters. This requires the reader to copy the full URL and delete the inserted whitespace to reach the intended address.

Solicited review by Philippe Cudre-Mauroux:

This second iteration corrects the minor flaws of the first version of the paper (which I already liked actually). From my perspective, this paper is ready for publication.

Solicited review by Fabien Gandon:

My previous comments were addressed.

First round reviews:

Solicited review by Fabien Gandon:

This paper presents the Amsterdam Museum Linked Open Data set.
The access point, content, metrics, statistics, modeling rationale, etc. are provided by the paper, which is in my opinion a very good contribution to this CFP.

"with suffix 'proxy-', 'aggregation-', 't-' or 'p-' for proxies, aggregations, concepts and persons respectively (eg. am:proxy-22476."
Don't you mean prefix?

"There are also 34 links to DBpedia."
Any reasons for such a low number?

Solicited review by Aba-Sah Dadzie:

The paper describes the generation of the Amsterdam Museum Linked Dataset, as part of the Europeana project, to make more accessible information about the collection and people related to the various objects and the museum as a whole. The Linked Data was created by editing a "crude" RDF dump, to, among others, ensure interoperability with other cultural heritage data and the Europeana Data Model.

A few examples of use of the source data are given, and the authors discuss the benefits that the conversion to Linked Data is expected to bring. A specific example is the creation of a mobile tour guide. An overview of the data structure and work to promote interoperability with domain-specific models and more general standards are detailed. The work reported includes web services for querying and URLs from which to browse the data. The authors note the need to periodically regenerate the dataset to capture changes in the source data.

I have a few reservations about this paper. While the authors provide a good amount of information about the dataset and the technology used to create it, it reads more like a project report that ticks off a list of deliverables than a description of the linked dataset and the design process followed - what it should be. I would suggest, where Europeana is first mentioned, that the authors give a brief description of the project as an introduction to its relation to the generation of the linked dataset (that it is a project the authors are involved with is not obvious till the end of the paper). This should make it easier to understand the impact on Europeana, based on lessons learnt from the process followed - this is to a large extent a pointer to potential reuse, further enrichment and maintenance of the Linked Data generated, that is wasted. Also, the authors conclude with future work on Europeana, not the Amsterdam Museum dataset.

A critical discussion of design choices and the knowledge modeling is missing. There is no comparison with related work - the four references are all self-citations pointing to more detail on specific aspects of the work reported. While I don't expect a detailed literature review in a short paper, obvious areas where a review could be carried out are the usage section, the description of the model used and how this relates to or is an improvement on other similar, domain-specific datasets (Linked Data or otherwise). I acknowledge other models and schemas are mentioned - but these simply refer to URLs to a project page or other rather than indicate why they constitute a good or balanced decision.

DETAILED REVIEW

p.1
"While larger cultural heritage institutions such as the German National Library or British National Library have the resources to produce their own Linked Data, metadata from smaller institutions is currently only being added through large-scale aggregators such as Europeana."
This statement is open to debate - I would amend it to something like " smaller institutions often depend on large-scale aggregators such as Europeana." AND back the claim with an appropriate citation.

"published it as "five-star" Linked Data" - "five-star" should be cited, using Berners-Lee's "Linked Data - Design Issues" article (http://www.w3.org/DesignIssues/LinkedData.html)

p.2
"The Amsterdam Museum Linked Data set implements best practices that -, the together with its methodology and tools- Europeana is keen on adopting for its future workflow."
Need evidence to back this claim. Also, is there something missing after the hyphen?

I don't understand the resource URI derivation. A suffix terminates a word; however the example given has "proxy" (the suffix) followed by a numerical code. A complete example might be useful here.

"We used purl.org URIs since for this conversion we were not in the position to use the Amsterdam Museum namespace for our Linked Data server." - why not?

It is not completely clear what the RDF relations and the conversion of the language information at the end of S2.1 are till later in the paper - forward-referencing (annotated sections of) Fig.2 would be useful here.

p.3
"In total the object metadata consists of 5,700,371 RDF triples of which many have a thesaurus concept or person resource as object." - How many is "many"? - the word is too vague to be meaningful.

"Two Amsterdam Museum classes am:Exhibition and am:Locat were defined as rdfs:subClassOf of the EDM class edm:Event."
Are these two classes particularly meaningful or are they simply meant as examples?

"Most term-based thesauri, including the AM thesaurus, have a more or less uniform structure (ISO 25964) making" - what does the ISO standard mean or refer to here?

p.5
"Linked Culture Data web" - is this referring to a particular initiative?

==============================

Figures & Tables

The caption of Fig. 1 is too long (especially compared to the figure content) - I would suggest working the description into the main text and providing a more concise caption.

Figure 2 is referenced in the text before Figure 1 - Figure 2 should be brought forward.

Figure 2 caption - "... with their super-properties and -classes in italics." Does "-classes" imply "SUPER-classes"? If so, this must be written explicitly; a "-" works as a shortcut for suffixes, not prefixes.

Citations & Bibliography

Footnote 3 (URL) goes to an administrator login
Footnote 4 (URL) displays a tiny XML file (to do with diagnostics)

While http://purl.org/collections/nl/am IS redirected to http://semanticweb.cs.vu.nl/europeana, the resource itself is not found.

The 'Object Re-use and Exchange (ORE) model' should be cited properly, using whichever is most appropriate of the papers by Lagoze & Van de Sompel et al.

Language & Presentation

A small number of typos (not listed here) will be caught by an auto spell check.

All acronyms must be expanded at first use. This is especially important for those not in wide use. E.g.,
- OAI-PMH interface (p.2)
- RDA Group 2 metadata standard (p.2)

English uses "," as the thousands separator, unlike some other languages, which may use ".". Because this paper is in English, it should use the "," convention. More importantly, usage should be consistent - this paper uses both (p.2, 4). This gets even more confusing on p.4, where "." is used in a sentence as a decimal point and then also as a thousands separator.

p.3
"These properties are mapped to RDA Group 2 elements using 20 rdfs:subProperty relations were defined." -> "These properties are mapped to RDA Group 2 elements using 20 rdfs:subProperty relations."

p.5
"Where the current Linked Data pilot of Europeana (data.europeana.eu) focuses on producing a Linked Data set based on the already-ingested metadata consisting of a minimal set of Dublin Core properties."
This is not a sentence.

Solicited review by Philippe Cudre-Mauroux:

This short paper describes the Amsterdam Museum LOD. The paper starts by describing the modeling and conversion methodologies (basically, the metadata and vocabulary were exported from OAI-PMH as XML, converted to RDF, curated using rewriting rules, and finally interlinked). Mapping onto the Europeana Data Model (EDM) was carried out using subclasses and subproperties on the one hand, and a proxy-aggregation pattern (supporting EDM's multiplicity of providers by separating object metadata and provenance metadata) on the other hand. The resulting LOD is available online over HTTP (serving HTML/RDF-XML/Turtle by content negotiation) and as a Git repository.
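
To make the content negotiation mentioned above concrete, a client might ask for a particular serialization roughly as follows. This is only a minimal sketch, not the authors' setup: the URI is the PURL namespace quoted in an earlier review, the helper name `build_rdf_request` is hypothetical, and the media types are the standard ones for HTML, RDF/XML and Turtle. The request is only constructed, not sent, since the reviews do not confirm which resources currently resolve.

```python
from urllib.request import Request

# Standard media types for the three serializations named in the review.
MEDIA_TYPES = {
    "html": "text/html",
    "rdfxml": "application/rdf+xml",
    "turtle": "text/turtle",
}


def build_rdf_request(uri: str, fmt: str = "turtle") -> Request:
    """Build (but do not send) a GET request asking for one serialization.

    Hypothetical helper for illustration; the server decides what it
    actually returns based on the Accept header.
    """
    return Request(uri, headers={"Accept": MEDIA_TYPES[fmt]})


# Assumed example URI, taken from the namespace mentioned in the reviews.
req = build_rdf_request("http://purl.org/collections/nl/am/", "turtle")
print(req.get_method(), req.full_url, req.get_header("Accept"))
```

Passing the resulting object to `urllib.request.urlopen` would perform the actual lookup.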

Overall, I liked the paper and found the description of the modeling/export process as well as the description of the data itself interesting and technically sound. The methodology used to produce the LOD is rather standard, but nevertheless compelling and gives a down-to-earth, pragmatic account of how to export local cultural data and map it to EDM. I inspected a number of entities online using the HTML interface and all the metadata I looked at appeared to me as correctly and precisely exported, hence supporting the claim of the authors (i.e., preserving the original richness of the data being LODified). The paper would in my opinion be stronger if it included i) more information about the linking process (e.g., high-level description of the Amalgame alignment platform, precision/recall results for the automatically created links) and ii) additional detail on the update procedure (how efficient/scalable is the overall export procedure? Is there any bottleneck? Would incremental updates be possible? etc.)
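
For reference, the precision/recall figures requested in point i) could be computed along the following lines. This is a minimal sketch under stated assumptions: the link pairs are invented purely for illustration (they are not taken from the dataset), and a manually verified gold standard of correct links is assumed to exist.

```python
def precision_recall(predicted: set, gold: set) -> tuple:
    """Return (precision, recall) of predicted links against a gold standard."""
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


# Hypothetical (source, target) link pairs, invented for this example.
predicted_links = {
    ("am:person-1", "ext:A"),
    ("am:person-2", "ext:B"),
    ("am:person-3", "ext:C"),
    ("am:person-4", "ext:D"),
}
gold_links = {
    ("am:person-1", "ext:A"),
    ("am:person-2", "ext:B"),
    ("am:person-5", "ext:E"),
}

precision, recall = precision_recall(predicted_links, gold_links)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

Here two of the four predicted links are correct (precision 0.50) and two of the three gold links are found (recall about 0.67).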

