A Systemic Approach for Effective Semantic Access to Cultural Content
Manuscript revision (now accepted) after an accept with minor revisions. Previous reviews below.
Review 1 by anonymous reviewer
The paper is really improved.
Here are some remarks for the new version:
Although this paper explores semantic access, the related work mostly deals with metadata schemas.
Still, there are tools that use semantic information, like Powerset, CatScan, and Shallow Semantic Query.
Even with limited data, the proposed approach needs a lot of execution time.
Are there any plans or thoughts on how to make this more scalable, for Europeana-sized collections?
The evaluation deals mostly with performance / response time, and not with the evaluation of the output results:
A recall / precision measure should be estimated, by examining the results, to express how well this approach works.
Or the results should be compared with results from another approach.
Some related tools (see above) often use pre-computed semantic information, which makes them faster and more scalable.
But how can we compare the results of the alternative approaches?
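As an illustration of the recall / precision measure requested above, a minimal sketch follows. It assumes that, for each test query, a manually judged set of relevant records is available; the function name and the record identifiers are hypothetical and are not taken from the paper.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for a single query.

    retrieved: identifiers returned by the system under evaluation
    relevant:  identifiers judged relevant by a human assessor (assumed to exist)
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant  # relevant records that were actually returned
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical record identifiers, for illustration only.
retrieved = ["rec1", "rec2", "rec3", "rec4"]
relevant = ["rec2", "rec3", "rec5"]

p, r = precision_recall(retrieved, relevant)
print(f"precision={p:.2f} recall={r:.2f}")
```

Applying the same function to the output of an alternative approach on the same queries and the same relevance judgments would give one way to compare the results of the two.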
Review 2 by anonymous reviewer
I have now checked the paper and responses to reviewers, and my assessment is that the paper has improved much, especially the evaluation part that was my main concern. So from my side there is no objection to proceed with accepting the paper.
Meta review by anonymous reviewer
I was asked to provide a meta review of this paper. For this I read the paper, all the reviewer comments, and also the document from the authors where they explain how they have addressed the reviewer comments.
To me it seems that the most crucial issue that the reviewers raised was related to the novelty of the query rewriting and, further, to the lack of evaluation of the approach. Although the authors provided some sort of evaluation in the new version, it looks like an artificial one, without real involvement of any human test subjects and without, e.g., precision-recall analysis. Comparing running times of the queries is perhaps not the kind of evaluation that the reviewers asked for. Given this, I would vote for reject and resubmit after a major revision.
However, I had a look at the other papers of this special issue, and it seems to me that many of them also lack any proper evaluation. For example, the paper by Mäkelä and Hyvönen does not have any. So if the editors have accepted papers that do not include any evaluation, then they might consider including the one by Kollia et al. as well. But this is something I leave to the editors to decide.
This is a revised submission after a "reject and encourage resubmission." The reviews below are for the original submission.
Review 1 by Sarantos Kapidakis
The paper briefly presents a tool that helps the user define metadata mappings for harvesting in Europeana, and proposes a semantic framework for its use.
The tool presentation is short, and it is not clear how it handles n:m field mappings, obligatory Europeana elements and input validation.
The description of the proposed framework is lengthy, although the framework is not implemented; in general it explains Europeana's current plans and efforts and gives some examples for the implementation of some parts of it. The query evaluation examples use known optimizations, and the other examples cover only small, simple cases, with a small dataset and classification hierarchies.
A useful proposed approach for Europeana should be able to handle larger collections: from one library alone we could get millions of records, thematically mapped to DDC or LC. Many of the descriptions serve more like a wish-list, as it is not clear how they can be implemented in the real environment.
The ideas behind the paper and proposal are good, but we need to see them implemented, to see how they work on real data and even to measure precision and recall (or other metrics) on its usage.
Review 2 by Werner Kuhn
"This paper presents a system..." is a phrase that raises concerns in an abstract: can we learn something from such a presentation and is the system innovative? I am afraid I cannot answer either of these questions positively. While it is obviously very important to solve semantic interoperability problems in the context of cultural heritage information, the paper fails to identify specific unsolved problems and does not make it clear what its contributions are in terms of semantic web methods (rather than possibly useful tools for cultural heritage communities).
If a system is the main focus of a paper, one needs to show at least what parts of it are completed and what results they have produced. The paper also fails on this account and actually raises more questions about the state of implementation than it answers. There is no evaluation at all, just a rather superficial section describing an "experimental study" that is largely a thought experiment on how the system might operate and be useful in the future. No data, no evaluation, no user studies.
The only contributions to methodology could be the query expansion and optimization techniques, but these are to the best of my knowledge rather standard and not innovative.
The paper also has some language problems, but these are irrelevant given the above remarks. The images in the paper are nice, but really have nothing at all to do with its contents, only with the content of the information treated.
Review 3 by Rainer Simon
The paper presents a software system for mapping institutional metadata schemas to a unifying schema - the Europeana Data Model - in a user-guided process. The paper furthermore discusses how query answering can help in the process of metadata enrichment.
Overall, the paper is well written. The mapping tool is likely a valuable achievement w.r.t. establishing semantic interoperability in the cultural heritage field. Furthermore, it appears the tool is already being used on a large scale, which is impressive.
My one point of criticism, however, is that the paper slightly lacks a clear focus and "storyline", which makes it difficult to read. On the one hand, the paper describes the tool (which would certainly deserve a dedicated paper in its own right), but only relatively briefly. On the other hand, the paper talks about the "metadata enrichment by query answering" approach. It is suggested that this is somehow part of the overall mapping workflow, but I fail to see how exactly this is integrated, or whether it is integrated in the tool at all. An additional screenshot might help.
Likewise, the idea of resource linking done by (amateur) users as part of creating "stories" for personal use is fascinating. But it is not clear how this relates to the rest of the paper. Sure, semantic interoperability, search and query answering are all, apparently, "technological underpinnings" for this. But considering the paper's initial focus on the metadata mapping software, this seems like a large step from one topic to the other...
Also, the paper lacks a dedicated related work section. This might be useful in helping the reader to better put things into perspective.
Review 4 by anonymous reviewer
The paper is describing a metadata mapping tool plus some cases of semantic search applied to the domain of cultural content. The paper is dealing with semantics and it nicely fits both the journal scope and the contents of the call.
In general terms, the paper is reporting on some pressing practical issues related to metadata integration, and then focusing on the possibilities of ontology-based search exploiting formal semantics. In consequence, the topics of the paper are highly relevant to the field. However, I have major concerns with the contributions provided and their degree of maturity and innovativeness. In what follows I detail my major concerns:
(1) The first contribution is contained in the first part of Section 2. It first describes the existing ESE/EDM for Europeana. Then it reports on the integration system, succinctly describing technology (XML/XSL), then visualization (Figure 1), and then workflow issues (Fig. 2 and 3). A review of the innovative issues is the following:
* The technical issues and visualization are mainstream and provide no real innovation over existing metadata editors and schema mapping tools, or the authors have failed to highlight which features of the tool cannot be found in other tools. The authors claim user-friendliness of the tool, but this is not evaluated or analyzed from a scientific viewpoint in the paper.
* Workflow issues are interesting as some kind of "best practice", but they do not constitute substantial progress over the state of the practice. Also, they are not subject to further analysis or evaluation later, so they can be considered mostly background information on the usage of the solution.
(2) The main research content comes in the last paragraphs of Section 2 and Sections 3, 4 and 5. Section 2 raises interesting problems associated with query performance when using formal ontologies and OWL. Section 4 describes the query answering implementation reused (from Oxford) and deals with some high-level aspects in 4.1 that in my opinion can be removed, as they are not directly relevant to the query problems analyzed. Then, Section 5 provides some example queries and responses and some brief discussion. The problem with this contribution is that the status is still preliminary, and there is no real experiment, but a report on some examples. More work is needed, with a substantial set of representative sample queries and detailed measurements, to come up with credible conclusions. Also, the authors are not clearly analyzing whether performance is dependent on the implementation and content base structure or only on the ways queries are resolved. Both aspects need to be considered when assessing running time for queries.
It seems that contributions (1) and (2) are disconnected, as the mapping tool is for standard XML metadata and not highly expressive OWL.
In conclusion, the paper is addressing relevant formal query issues but:
- It mixes these with a presentation of a mapping tool that can be removed or integrated in the introduction.
- It does not report substantial empirical data that constitutes a real advance. The experimental method is limited to some example queries and comments on anecdotal evidence on query results on a particular content database.
My overall suggestion is reorienting the paper to the query problems and rewriting it to expose that as the main contribution, and doing additional work in the experimental methodology and extensive analysis of empirical results for the query problems posed. I encourage the authors to do so, as there is very limited literature on these topics.
Some additional comments:
- Fig. 4 and the discussion on linked data can be removed, as they are not relevant to the main research issues.
- An explanation of the ontologies used in the queries is required, together with detail on how LIDO metadata in the database is translated to that richer OWL representation, to understand if it is 100% automatic or requires enrichment.
- The relation of Europeana EDM to the OWL based querying system must be clarified, as the implementation of the experiments is done with other existing systems that are not operating on the representations used by Europeana.