Instance-based Semantic Interoperability in the Cultural Heritage

Shenghui Wang, Antoine Isaac, Stefan Schlobach, Lourens van der Meij, Balthasar Schopman
This paper gives a comprehensive overview of the problem of Semantic Interoperability in the Cultural Heritage domain, with a particular focus on solutions centered around extensional, i.e., instance-based, ontology matching methods. It presents three typical scenarios requiring interoperability: one with homogeneous collections, one with heterogeneous collections, and one with multilingual collections. It discusses two different ways to evaluate potential alignments, one based on the application of re-indexing and one using a reference alignment. To these scenarios we apply extensional matching with different similarity measures, which gives interesting insights. Finally, we firmly position our work in the Cultural Heritage context through an extensive discussion of the relevance for, and issues related to, this specific field. The findings are as unspectacular as expected but nevertheless important: the provided methods can really improve interoperability in a number of important cases, but they are not universal solutions to all related problems. This paper will provide a solid foundation for any future work on Semantic Interoperability in the Cultural Heritage domain, in particular for anybody intending to apply extensional methods.
Submission type: 
Full Paper

Review 1 by Giorgos Stoilos
After the changes that the authors have made in the resubmission, I am happy with the paper; hence I propose that it be accepted.

Review 2 by Christophe Dupriez

Thank you for this very interesting paper!

This is a revised resubmission after an accept with major revisions (reviews below).

Review 1 by Christophe Dupriez

I like this article very much; it brings practical insights to difficult "real life" matters.
* Article keywords can be "free" but some should come from controlled vocabularies to be normalized by the SW Journal (JITA and/or ACM CS 1998 for instance)
* Introduction:
** Introduction could emphasize more how lexical and extensional methods complement each other.
** It would help to understand the different cases a manual "thesauri aligner" encounters: which can be handled lexically, which can be handled by analyzing co-occurrences, and which cannot be handled by either method.
* Scenarios:
** A table, figure or schema could better present each of the 3 scenarios.
** Typology of vocabularies (classifications vs indexing), pre-coordination and concepts (name of "instance", name of groups, artificial groups) is not sufficiently explained or taken into account in the development.
* Page 4, first column: commas missing in sentences of the two last paragraphs.
* Page 7, first column, last paragraph: acronym JSD is used before it is defined
* Page 8, first column, last paragraph: PMI should be more clearly presented as the acronym of Pointwise Mutual Information.
* IBOMbIE: What are the alternatives to Google Translation and why this one and not another? For keywords, IATE, DBPedia linguistic equivalences or other online dictionaries?
* An interesting development would be to see how this approach could help MACS (complement it, not compete with it).
General remark: starting from a typology of thesauri semantic matching problems, this article could be developed into a seminal paper in which the different problems are paired with different kinds of potential automated helpers for manual operations.

Review 2 by Achille Felicetti

The technical analysis of semantic interoperability presented in this paper is outstanding! Every single part of the problem is analyzed and described in great detail, and the solutions proposed are also of very high quality, even if they could prove quite difficult to understand, especially for non-technical (i.e. cultural heritage) scholars.
Anyway, the proposed methodology is of course valid and the scenarios are very realistic. Some more practical examples would have made the paper even more interesting than it already is.

Review 3 by Giorgos Stoilos

The paper presents techniques for, and experiences with, aligning Cultural Heritage collections. The techniques are based on instance-based methods for ontology alignment. Several issues for aligning CH collections are illustrated, as well as an experimental evaluation.

I found the paper overall interesting and significant for the community of semantic interoperability over cultural data. Nevertheless, I have several issues regarding the presentation of the paper, especially wrt the meaning and English. In several places it was very difficult to parse sentences and understand the meaning, as various articles were missing, tenses were mixed and phrases were cluttered by many sub-phrases that were interleaved with commas. (Several examples follow below.) I believe that the paper should first be improved significantly wrt this aspect.

Moreover, I feel that the relations and differences of this submission wrt previous work published by the authors ([33,20,21]) should also be highlighted better, since it is not very clear what the advance is wrt what has already been published. You should probably explain very briefly what the previous papers presented, e.g., in [33] we showed bla, ... .

More detailed comments:

- The "Extensional Methods paragraph": In page 2 col 2 paragraph of "Extensional methods for Semantic Interoperability..." you explain briefly what extensional based methods are. You repeat a similar brief explanation in at least 3 other places in the paper (page 3 col 1, page 4 col 2, page 8 col 1). You should avoid repeating things over and over again throughout the paper.

- Page 2 col 2: "This method has a number". Since before you refer to methodS for semantic interoperability you should use plural again, i.e., "These methods have" or better "Such methods have"

- What does ICT, which you mention below, stand for?

- I was confused about the content of the paragraph starting with "STITCH". It starts by presenting some project related application, but then it changes into a kind of related work on instance-based methods for ontology alignment.

- Page 3 col 2. "extensional methods in three above" -> "extensional methods in THE three above".

- Then you do not need to say again which are the three above-mentioned scenarios since you have mentioned them already.

- Same column in the contribution paragraph: You say "- described". You should not use past tense. You should say "it describes..."

- First paragraph of section 2: Again you repeat many things that have been stated before, as e.g., what this paper is about. It would be better to re-factor this paragraph

- "for example, in the MACS" -> "see for example, the MACS, ... ,... , projects"

- "that both provided targets for mapping". It is not clear what you mean by "targets"

- "Within the TELplus project some of the ideas where". Which ideas? The above?

- next paragraph "the specific problems ... has". -> "the specific problems ... HAVE"

- "The demonstrator of the MultimediaN"; who is the "demonstrator"?

- It was quite difficult to parse the sentence that starts with "the demonstrator". Did you mean to say "...WAS not just an excellent illustration of the potentialS of THE Semantic Interoperability, BUT has also triggered some..." .

- Next column there is a double "in", i.e., "mapping in in"

- Close to the end of col 2 in page 4 you say "many modern systems, such as [25]". This is quite strange as it implies that [25] is a system, but it rather is a citation to a paper. I noticed that you commonly use citations as nouns, which sometimes looks quite strange. For example, in page 3 col 1 you say "[33] extend", or "[49,48] investigated". It would be better to say AuthorX et al. [33] extend.

- last sentence in col 2 page 4: you claim there are not good preliminary reports for the general case of ontology matching, whereas before you mention that there are excellent overviews.

- Beginning of Section 3: "institutions opening up" -> "institutions ARE opening up"

- Next column: "Here, we classify three", better "In the following we present three"

- Section 3.1. I feel it would be quite good to give a couple of short specific examples regarding your three matching scenarios. For example, "book_1 is annotated as bla1 and book_2 as bla2 and we want to match book1 and book2"

- Last sentence of section 3.1.1. was also very hard to parse. I am still not sure I understand the meaning. Did you mean "in order to improve the interop. between these two collections and allow users to use both GTT and Birk to access the two collections, these thesauri need to be matched first"?

- First sentence section 3.2.1. You say "different CH institutions, even within the same CH institution". There is some inconsistency here, is it the same or different? Did you mean to say groups vs. institutions or institutions vs organisations?

- Last sentence of section 3.2.1. Again problem with meaning. Did you mean "Differently from the previous scenario, the differences of the metadata schemas should be taken into consideration in this matching scenario."

- Last sentence col 1 page 6: "one would be interested to search for broadcasts IN the BG, that are about the author of a book he is reading FROM the KB"

- Page 7 col 1: "This give the idea how" -> "This giveS AN idea how"

- Next column: "While investigated mapping problem" -> "While investigated THE mapping problem"

- Page 8 col 1: You say "we deal with *this* problem by", but before you mention several problems.

- In the whole Section 5.1 it is not clear whether you present previous work or you present your work. In the latter case you should mention exactly how you used these measures. For example, later on you say "we can rank the pairs of concepts based on such measure". Which one? all of them? If there are more than one then how do you combine them to get one score?

- Section 5.2: It would be good to provide more details about your techniques, perhaps with a small example as well.

- Page 10 col 1: "Among 1m books whose subjects are annotation... 307K books are annotation ..." This is again repeated from earlier.

I would like to urge the authors to re-read and revise several parts of their paper to make them simpler and more understandable. The above are only the most notable and least comprehensible ones, and this should not be considered a complete list.