Editorial Board

Editors-in-Chief
Krzysztof Janowicz

Managing Editors
Cogan Shimizu
Eva Blomqvist

Editorial Board
Mehwish Alam
Claudia d’Amato
Stefano Borgo
Boyan Brodaric
Philipp Cimiano
Oscar Corcho
Bernardo Cuenca-Grau
Elena Demidova
Jerome Euzenat
Mark Gahegan
Aldo Gangemi
Anna Lisa Gentile
Rafael Goncalves
Dagmar Gromann
Armin Haller
Aidan Hogan
Katja Hose
Eero Hyvönen
Sabrina Kirrane
Agnieszka Lawrynowicz
Freddy Lecue
Maria Maleshkova
Raghava Mutharaju
Axel Polleres
Guilin Qi
Marta Sabou
Harald Sack
Christoph Schlieder
Stefan Schlobach
Oshani Seneviratne
Cogan Shimizu
Ruben Verborgh
GQ Zhang

Former Editors-in-Chief
Pascal Hitzler

Editorial Assistants
Sanaz Saki Norouzi

Syndicate

FRBR-ML: A FRBR-based Framework for Semantic Interoperability

Submitted by Krzysztof Janowicz on 02/19/2011 - 16:00

Paper Title:

Authors:

Naimdjon Takhirov, Trond Aalberg, Fabien Duchateau, and Maja Žumer

Abstract:

Metadata related to cultural items such as literature, music and movies is a valuable resource that is currently exploited in many applications and services based on semantic web technologies. A vast amount of such information has been created by memory institutions in the last decades using different standard or ad hoc schemas, and a main challenge is to make this legacy data accessible as reusable semantic data. On one hand, this is a syntactic problem that can be solved by transforming to formats that are compatible with the tools and services used for semantic aware services. On the other hand, this is a semantic problem. Simply transforming from one format to another does not automatically enable semantic interoperability and legacy data often needs to be reinterpreted as well as transformed. The conceptual model in the Functional Requirements for Bibliographic Records, initially developed as a conceptual framework for library standards and systems, is a major step towards a shared semantic model of the products of artistic and intellectual endeavor of mankind. The model is generally accepted as sufficiently generic to serve as a conceptual framework for a broad range of cultural heritage metadata. Unfortunately, the existing large body of legacy data makes a transition to this model difficult. For instance, most bibliographic data is still only available in various MARC-based formats which is hard to render into reusable and meaningful semantic data. Making legacy bibliographic data accessible as semantic data is a complex problem that includes interpreting and transforming the information. In this article we present our work on transforming and enhancing legacy bibliographic information into a representation where the structure and semantics of the FRBR model is explicit.

Full PDF Version:

swj161.pdf

Submission type:

Full Paper

Responsible editor:

Eero Hyvonen

Decision/Status:

Reviews:

Review 1 by Kate Byrne

In my opinion the paper has been greatly improved and I congratulate the authors. It now seems entirely suitable for publication.

This is a revised resubmission, after an "accept with major revisions".

Review 1 by Kate Byrne

This is a very detailed paper with some interesting results, but I felt that the content could have been expressed more succinctly, in fewer than the 24 pages used. More tailoring to the likely interests of readers would be welcome, and some restructuring would make the ideas easier to follow.

The research concerns bibliographic data, in MARC format, and the experimental results deal with a use case in which MARC records are "round-tripped", ie converted to a format based on FRBR, cleaned and enhanced with new information, and then converted back to MARC. My concerns with the paper are that the target audience is a little unclear and that the quite complex structure of the paper is in many places back to front: unfamiliar ideas are mentioned in early sections and not explained until further on. This makes the paper quite hard going to read, until one reaches the final two sections when things become much clearer. I think the authors could do themselves a service by restructuring the content.

Regarding the audience for the paper: as it stands, this work is clearly most likely to be of interest to librarians managing bibliographic catalogues where enhancing data quality is an important issue. But the emphasis on formal, set-theoretic notation may be unappealing to this target readership. Also, it's not clear that a technique that seems heavily based on straight sting "string equivalence" matching will advance the present state of the art. If a wider semantic web research audience is intended, it is a pity that the "semantic interoperability" promised in the title becomes rather lost amongst the details of the methodology used.

There are a number of places where terminology and concepts that will only be accessible to specialists could usefully be explained further - or rather, where the position of the explanation could be brought forward to earlier in the paper. In several places practical examples would help to illustrate meaning. The MARC format is introduced in section 2.2 and a sample record would help readers unfamiliar with its layout; perhaps just a reference forwards to Figure 3. The term "added entries" is used several times from section 2.3 onwards, but not explained until section 4 (page 9). Section 3.1 is particularly involved: a dense comparison of MARC with FRBR that would be much easier to follow through a worked example. The abbreviation "LOD" (not spelled out) is introduced ahead of its explanatory gloss further on in the text (section 6.2). The intricate descriptions in sections 4, 5 and 6 become much clearer when one reaches section 8, so some re-ordering would help.

Section 2 gives useful background about bibliographic formats, though I'd have expected some reference to W3C initiatives and the OMG (Open Metadata Group) work on ISBD XML schemas, and IFLA work on FRBR definition using SKOS. (I'm not a specialist in the bibliographic field so my grasp of which developments are significant may be flawed.) The diagram in section 2 (Fig 1) is very helpful, though I wondered why the arrows were double-headed rather than directed, eg "is created by" is not a symmetric relation (nor of course is its inverse). The entity terms introduced here - Work, Expression, Manifestation etc - should be italicised or given a special font throughout the text to avoid confusion.

I felt the information given in sections 2 and 3 could be abbreviated (likewise section 5), especially if the paper is intended to appeal beyond the library world. It was a bit dispiriting to reach page 8 before getting to a heading (section 4) of "Preliminaries". It's a small point, but one or two sections could do with more expressive headings, to help navigation; I found myself referring back and forth a lot, and smiling wryly to find what I needed at one point in the rather complicated section entitled "Simplicity and Understandability".

Personally I didn't find the formal notation introduced in section 4 helpful - it slowed me down rather than the opposite. At best it just restates what can be expressed more simply in natural language, and in some places it adds confusion. For example, I was unclear whether Cdiamond and Ddiamond (I only have ASCII characters available here) are genuinely subsets of C and D as stated on page 10, or whether the sub/super-set arrangement can be either way round as stated earlier. In each case (MARC, FRBRizer, FRBR-ML) the relationships between R, C, D, S etc are identical, so it didn't seem to me that useful information was being summarised, which is the value of formal notation. The interesting mappings, that are the core of the paper, are *between* these separate formats, ie r to rdiamond, C to Cdiamond etc, but these mappings are not given. The "map_d()" function (as opposed to "map_s()") is defined as taking only a MARC datafield tag label (eg "100" or "240") as argument, and then shown taking a tag-value pair (page 12). It is shown as evaluating to "mu", which is not defined in Table 1. To take just one example amongst many that seemed obfuscatory rather than helpful, the statement "In FRBR-ML, a FRBR entity f* in F* is related to another one with a relationship l* in L* such that l*: f* x f*." could be expressed as "FRBR entities can be linked by relationships". The link at footnote 8 on page 12 - to the FRBR-ML schema - seems to be broken.

The distinction between the "hierarchical" and "reference" methods in section 5 does not seem important enough to justify the effort around it. The algorithm for deciding which format to use is nicely set out but is this step really necessary? Why not simply use the "referencing" method throughout and drop the complication of "hybrid representation"?

The most important part of the paper is introduced in section 6, namely the attempt to enhance or correct input records by matching entity mentions against other records in the input dataset or other library catalogues, or against external sources such as DBpedia and Freebase. This final step is the contribution of this work, to my mind. The steps are to find entities in the input data, categorise them (as "person", say) and then attempt to deduce a relation to another entity in the input. The use case describes transforming an unspecified relation between "Hans-Joachim Maass" and a bibliographic record into a more specific relation: an "is realised by" connection from the Expression entity. In fact Hans-Joachim is the translator, but this relation cannot be explicitly expressed in the original MARC, nor in FRBR-ML. However FRBR-ML is able to insert the missing relation with Expression, and to specify which translator goes with which instance of Expression.

A set of metrics is given in section 7. The "completeness" measure calculates how different the input and output fields are - so one could trivially get a perfect score by a null transformation of the input. Section 7.1 uses MARC control fields as an example but it's not clear why these would be changed, and the paper does not describe control field mapping. Once again the formal notation fails to convey additional information; for example, the comp_f() function defined on page 18 shows that input and output lines of a MARC record (tags, subcodes and values) are going to be compared, but doesn't tell us how - I assume a string comparison, perhaps implemented through the hashing mentioned earlier in the paper. The "redundancy" measure considers the amount of duplication present (exact string equivalence again) - but surely from a practical point of view such duplication should simply be eliminated. Why bother with an intermediate step of measuring it? The third metric is "extension" and, as specified in the expressions in section 7.3, it is simply a measure of *how much* has been added - so one could trivially get an arbitrarily high score by adding characters in the output. Surely the only important evaluation measure is on the *semantic* content - whether what has been added is correct.

Section 8 is the best part of the paper, where a good deal of the earlier material falls into place, and where a genuinely useful evaluation is described. It's only on page 20 that we reach commentary on whether the enrichment process has produced better bibliographic records or not. Since this evaluation, by 8 human judges, seems the core result I would have welcomed more detail on the process. Measuring precision is feasible using experts (ie is this data correct?) but measuring recall (is any information missing?) is notoriously difficult in open-ended tasks of this nature, and the precise method used would be of interest. The remark "...presented the top three candidate matches...including a manual search on the knowledge bases for the entry value when needed..." is particularly intriguing. I would have preferred numbers in Figure 9 instead of a bar chart, but the scores seem remarkably high for what is a difficult knowledge enrichment problem. I would encourage the authors to restructure the paper around these results and drop some of the methodological details (that I've spent far too long on myself, in this review). Right at the end of section 9 there is a hint that the authors plan to move on to a wider interpretation of co-reference resolution and grounding against external authorities, using pattern matching that goes beyond exact equality, which indeed seems the obvious next step.

I noted a number of minor typographical and similar errors, and can supply a list if required.

Review 2 by Sarantos Kapidakis

This is a solid paper, that presents how to handle FRBR and MARC, in order to add semantic information to the FRBRized data. The FRBRized data, with the extended information, can also be converted back to MARC (for the legacy systems), or be used in other XML based formats (for more modern applications).
The described system can communicate with other sources of additional information, and is using well trimmed heuristics to extract the appropriate information when it is needed.
The authors provide a formal model,they explain their data layout,
they make an experimental evaluation on collections of the Norwegian National Library, and they also estimate evaluation metrics.

Review 3 by Ray Larson

Bibliographic Records specifications. Overall, this is a very well-written and informative paper about the issues and potential for transforming conventional MARC bibliographic data to FRBR-compliance XML using the
FRBR-ML framework. I found a few minor errors in grammar or spelling,
but otherwise most of my comments are concerning the structure of
the FRBR-ML records themselves.

First the grammatical issues - In section 2.1 "that constitutes a …"
should be "that constitute a …"
later in the same section "use of punctuations and separators characters"
should be "use of punctuation and separator characters"
and "that are users need to search" should be something like "that are needed by users for searching"

My major issue with the paper is more of a question about whether the
"hierarchical Method" described sort of turns the whole notion of FRBR
upside down. One of the characteristics of FRBR that was most revolutionary was the adoption of the Work instead of the manifestation as the primary unit of organization. But in the hierarchical method you have made
the manifestation the top-level element with the work nested within. This is naturally a better fit to the philosophy of the source MARC data, but
I would argue that it sacrifices the benefits of the FRBR stucture
(such as grouping all expressions and manifestations of a work under
that work, instead of vice-versa). Although the paper points out that
there are drawbacks to this method (and to the referencing method as
well) I think that could be made a little more forcefully.

But, overall this is an interesting and useful paper about FRBR and its instantiation in FRBR-ML.

Tags:

Reviewed

Log in or register to post comments
20504 reads

Comments

RE: review comments

Permalink Submitted by Naimdjon Takhirov on 05/20/2011 - 04:45.

Reviewer: Kate Byrne
------------------
Comment:
I felt that the content could have been expressed more succinctly, in fewer than the 24 pages used. More tailoring to the likely interests of readers would be welcome, and some restructuring would make the ideas easier to follow. My concerns with the paper are that the target audience is a little unclear and that the quite complex structure of the paper is in many places back to front: unfamiliar ideas are mentioned in early sections and not explained until further on. This makes the paper quite hard going to read, until one reaches the final two sections when things become much clearer. I think the authors could do themselves a service by restructuring the content.

Response:
We have reduced sections “Background” and “Related Work”. Additionally, we have simplified the formal model and reduced the number of pages. But incorporating improvements based on reviewers' comments resulted in a 20 pages long article.
We have made changes to tailor the content to a general audience including libraries and the semantic web community for this Special Issue on “Semantic Web and Reasoning for Cultural Heritage and Digital Libraries”.
--------

Comment:
Regarding the audience for the paper: as it stands, this work is clearly most likely to be of interest to librarians managing bibliographic catalogues where enhancing data quality is an important issue.

Response:
We have chosen to apply our framework to library records: exchange format, enrichment and correction and quality evaluation. However, a similar solution can be applied to other domains. For example, biologists also need to add semantics to their data while still maintaining high quality data.
--------

Comment:
If a wider semantic web research audience is intended, it is a pity that the "semantic interoperability" promised in the title becomes rather lost amongst the details of the methodology used.

Response:
The whole paper is about enabling semantic interoperability:

an exchange format in FRBR-ML
the semantic enrichment ensures the linking of entities to LOD, thus making it available on the LOD cloud

We have changed the title of Section 4.3 from “Interoperability” to “ Exchange of Records” to avoid misunderstanding.
--------

Comment:
But the emphasis on formal, set-theoretic notation may be unappealing to this target readership.

Response:
We have kept the formal model but it has been greatly simplified. Instead of presenting the notation for each step of our transformation process, we have now a generic property-based model (Section 4.2). Thus, the table of notation is no longer needed. The small number of notations we use in the formalization is now easier to remember throughout the paper. As a consequence of this rewriting, the subsequent formulas have also been simplified (Sections 5, 6 and 7).
--------

Comment:
Also, it's not clear that a technique that seems heavily based on straight sting "string equivalence" matching will advance the present state of the art.

Response:
We only use “string equivalence” during the correction process (Section 6.2), namely inter and intra collection search. Indeed, this is not a big issue since authority files are standard in memory institutions, which means a given author will always be identically spelled.
For the semantic enrichment, we use techniques that rely on “string similarity” rather than “string equivalence” (Section 6.1.2):

We create a set of different queries for a given property value to maximize the probability to discover the correct LOD entity in the result sets.
These queries are then sent to services that allow flexible queries, such as support for wild-card strings in the query.

--------

Comment:
There are a number of places where terminology and concepts that will only be accessible to specialists could usefully be explained further - or rather, where the position of the explanation could be brought forward to earlier in the paper. In several places practical examples would help to illustrate meaning. The MARC format is introduced in section 2.2 and a sample record would help readers unfamiliar with its layout; perhaps just a reference forwards to Figure 3.

Response:
We have added an example of a MARC record in Section 2.2 and used this example to explain the concepts related to MARC record. Furthermore, we improved the explanation of the use case in section 4.1. We kept the Figure 3 (now Figure 4) at its place since it deals with a specific use case.
--------

Comment:
The term "added entries" is used several times from section 2.3 onwards, but not explained until section 4 (page 9).

Response:
We have added the definition in Section 2.1 (line 9) where it is first used. An example with detailed explanation is kept in Section “Overview of the framework”.
--------

Comment:
Section 3.1 is particularly involved: a dense comparison of MARC with FRBR that would be much easier to follow through a worked example.

Response:
Each project is presented in a separate paragraph and the name of the project is bolded. Having added an example for each of the projects would have taken too much space. We have simplified the Related Work section and present the related projects. Since we discuss an example in the Background, this section should now be easier to understand.
--------

Comment:
The abbreviation "LOD" (not spelled out) is introduced ahead of its explanatory gloss further on in the text (section 6.2).

Response:
We have moved the abbreviation of LOD to Section 2.3 where it is first introduced.
--------

Comment:
The intricate descriptions in sections 4, 5 and 6 become much clearer when one reaches section 8, so some re-ordering would help.

Response:
Sections 4, 5, 6 are the core contributions of the paper and thus we have not reordered them. However, we simplified the formalization and the text so it is easier to follow both for library and semantic web communities.
--------

Comment:
Section 2 gives useful background about bibliographic formats, though I'd have expected some reference to W3C initiatives and the OMG (Open Metadata Group) work on ISBD XML schemas, and IFLA work on FRBR definition using SKOS. (I'm not a specialist in the bibliographic field so my grasp of which developments are significant may be flawed.)

Response:
We are aware that there are many ongoing activities to create schemas and type vocabularies for library metadata, The IFLA work on FRBR using SKOS is already mentioned since this is the vocabulary we are referring to as under development by IFLA. From the other comments we have identified a requirement to reduce the introductory part and thus we find it naturally not to include too much about this.
--------

Comment:
The diagram in section 2 (Fig 1) is very helpful, though I wondered why the arrows were double-headed rather than directed, eg "is created by" is not a symmetric relation (nor of course is its inverse).

Response:
We fixed this Figure by adding the corresponding relations as defined by the FRBR model.
--------

Comment:
The entity terms introduced here - Work, Expression, Manifestation etc - should be italicised or given a special font throughout the text to avoid confusion.

Response:
We have italicised these terms throughout the paper.
--------

Comment:
I felt the information given in sections 2 and 3 could be abbreviated (likewise section 5), especially if the paper is intended to appeal beyond the library world.

Response:
We have reduced the sections “Background” and “Related Work”, but Section “FRBR-ML:Representation” is one of the core contributions and we preferred to keep it in that level of detail, especially for the library community.
--------

Comment:
It was a bit dispiriting to reach page 8 before getting to a heading (section 4) of "Preliminaries". It's a small point, but one or two sections could do with more expressive headings, to help navigation; I found myself referring back and forth a lot, and smiling wryly to find what I needed at one point in the rather complicated section entitled "Simplicity and Understandability".

Response:
We changed the title of the section from “Preliminaries” to “Overview of Our Framework” and the latter reflects the content in that section. Due to reductions of text, this section now appears on page 6. For example, we have removed the text formally describing the set of FRBR entities. Consequently, the title has been changed from “Simplicity and Understandability” to “Entity-oriented Representation” to better fit the content. As a result of simplification of formalization, we believe the text is easier to understand and there is no need to go back and forth.
--------

Comment:
Personally I didn't find the formal notation introduced in section 4 helpful - it slowed me down rather than the opposite. At best it just restates what can be expressed more simply in natural language, and in some places it adds confusion. For example, I was unclear whether Cdiamond and Ddiamond (I only have ASCII characters available here) are genuinely subsets of C and D as stated on page 10, or whether the sub/super-set arrangement can be either way round as stated earlier. In each case (MARC, FRBRizer, FRBR-ML) the relationships between R, C, D, S etc are identical, so it didn't seem to me that useful information was being summarised, which is the value of formal notation.

Response:
The formalization (Section 4.2) has been simplified and there are no more different notations for each step.
--------

Comment:
The interesting mappings, that are the core of the paper, are *between* these separate formats, ie r to rdiamond, C to Cdiamond etc, but these mappings are not given. The "map_d()" function (as opposed to "map_s()") is defined as taking only a MARC datafield tag label (eg "100" or "240") as argument, and then shown taking a tag-value pair (page 12). It is shown as evaluating to "mu", which is not defined in Table 1.

Response:
There is only one map function now that takes a property as an argument. “mu” is defined as a semantic element (Section 5.1). When it comes to mappings between a MARC record and the FRBRizer output, they are defined using a set of specific rules. We refer to the paragraph describing the “FRBRizer approach” in Section 3 for more information and to the reference [2].
--------

Comment:
To take just one example amongst many that seemed obfuscatory rather than helpful, the statement "In FRBR-ML, a FRBR entity f* in F* is related to another one with a relationship l* in L* such that l*: f* x f*." could be expressed as "FRBR entities can be linked by relationships".

Response:
These changes have been applied to the paper.
--------

Comment:
The link at footnote 8 on page 12 - to the FRBR-ML schema - seems to be broken.

Response:
The broken URL has been fixed.
--------

Comment:
The distinction between the "hierarchical" and "reference" methods in section 5 does not seem important enough to justify the effort around it. The algorithm for deciding which format to use is nicely set out but is this step really necessary? Why not simply use the "referencing" method throughout and drop the complication of "hybrid representation"?

Response:
The users who manage bibliographic records in MARC format favor readability and in many cases hierarchical approach would be beneficiary for this purpose as it is much closer to the traditional notion of metadata records. Although referencing is the ideal representation seen from the processing point of view, it is not necessarily a view on the information that is easy for humans to deal with when creating and inspecting the information. The output of our conversion (and the kind of transformations that others do as well) is a set of entity-records with references between the relationships. Since there will be many persons, works and expressions that are related to more than one manifestation, and each manifestation-rooted tree potentially can be very deep because of work-to-work, work-to-person relationships etc. we need an algorithm that can be used to find the best way of distributing these entities as “values” and when to limit the depth of trees by using references as leaf nodes. The problem we solve is rather pragmatic of nature and related to the representation/view on reference-based information as manageable units of records that can be represented as hierarchies.
We have added this discussion in the Section 5.2.1.
--------

Comment:
The use case describes transforming an unspecified relation between "Hans-Joachim Maass" and a bibliographic record into a more specific relation: an "is realised by" connection from the Expression entity. In fact Hans-Joachim is the translator, but this relation cannot be explicitly expressed in the original MARC, nor in FRBR-ML. However FRBR-ML is able to insert the missing relation with Expression, and to specify which translator goes with which instance of Expression.

Response:
We agree that in some cases the information about the correct relationship might be “undiscoverable” which we aim at tackling as part of our future work. However, if the relationship is discovered using the techniques discussed in Section 6.2 (intra / inter-collection search), then we are able to insert the correct relationship (translator using the relator code “4 trl”). The use-case in Section 8.4 illustrates this example. However, the reason for naming the relationship “is realised by” between translator and the expression is because there is no subtyping of relationships in the FRBR model.
--------

Comment:
The "completeness" measure calculates how different the input and output fields are - so one could trivially get a perfect score by a null transformation of the input.

Response:
It is challenging to obtain a perfect completeness score, because during interpretation using FRBR model, datafields will be stored under different entities. Additionally, completeness is affected because we sometimes change the datafield tags during the correction process. As an example, 700 fields were used as added entries for both persons and titles. Yet, the MARC standard specifies that an added entry for titles should be in field 740.

We have split this metric into quantitative and qualitative completeness. Quantitative completeness computes the total number of properties (same as completeness defined in the first version of the paper). Based on the reviews, we have decided to add qualitative completeness which measures how many entities we have been able to interpret. This is important since we need to ensure that the entities found in the original records are correctly transformed back.
--------

Comment:
Section 7.1 uses MARC control fields as an example but it's not clear why these would be changed, and the paper does not describe control field mapping.

Response:
Since some control fields are only pertinent to specific FRBR entities, there is a risk of losing these control fields during the transformation back to MARC. e.g.: control field 008 includes language information which is an attribute of FRBR expression. In the formula, we now use properties instead of control and datafields.
--------

Comment:
Once again the formal notation fails to convey additional information; for example, the comp_f() function defined on page 18 shows that input and output lines of a MARC record (tags, subcodes and values) are going to be compared, but doesn't tell us how - I assume a string comparison, perhaps implemented through the hashing mentioned earlier in the paper.

Response:
We use string comparison with hashing described in Section 5.1. We made it clear in Section 7 too.
--------

Comment:
The "redundancy" measure considers the amount of duplication present (exact string equivalence again) - but surely from a practical point of view such duplication should simply be eliminated. Why bother with an intermediate step of measuring it?

Response:
Indeed, this duplication can be eliminated by a post-processing step. The effectiveness of this post-processing step can be a subject of another discussion as it is a significant effort. We think it is important to measure it since the decision of the hybrid method and the merging process may still lead to duplication with no possibility to track the source record.
--------

Comment:
The third metric is "extension" and, as specified in the expressions in section 7.3, it is simply a measure of *how much* has been added - so one could trivially get an arbitrarily high score by adding characters in the output. Surely the only important evaluation measure is on the *semantic* content - whether what has been added is correct.

Response:
Yes, the extension metric defined in the first version of the paper is quantitative. Thus we have renamed the metric to “quant_extension”. However, to address the quality of the extension, we explain in Section 7.3 why it is a difficult issue. Since a human judgment is needed to assess the quality of the extension, we refer to the experiment section (8.3.2) in which we evaluate a subset of the collection. For the evaluation of correction which also requires a human validation, we presented the solution to a complex use case (Section 8.4).
--------

Comment:
It's only on page 20 that we reach commentary on whether the enrichment process has produced better bibliographic records or not. Since this evaluation, by 8 human judges, seems the core result I would have welcomed more detail on the process.

Response:
More details on this are now given in Section 8.3.2. Specifically, we detail how the ground truth is established by human experts who had to manually search the correct entity in the LOD cloud. For this experiment, we have used a subset of the collection that mainly include works with English translations (popular works) and creators of those works. Therefore, it was easier to discover them on the LOD cloud.
--------

Comment:
Measuring precision is feasible using experts (ie is this data correct?) but measuring recall (is any information missing?) is notoriously difficult in open-ended tasks of this nature, and the precise method used would be of interest. The remark "...presented the top three candidate matches...including a manual search on the knowledge bases for the entry value when needed..." is particularly intriguing.

Response:
To compute the recall we need the total amount of FRBR entities that have a correct LOD entity. This value has been computed thanks to human judges who have manually searched the LOD using different techniques (Dbpedia Lookup, Freebase Search API). We have developed a GUI to help experts with this task (“FRBRPedia: a Tool for FRBRizing Web products and Linking FRBR Entities to Dbpedia”, a demo to appear in JCDL 2011). Based on this ground truth and the amount of correct LOD entities we are able to compute the recall. We also assume that if the (popular) work is not found by our experts, then it does not exist on the LOD cloud.
--------

Comment:
I would have preferred numbers in Figure 9 instead of a bar chart, but the scores seem remarkably high for what is a difficult knowledge enrichment problem.

Response:
We have added numbers in the text when describing the plot. The results are high because we have taken 800 randomly selected works with English expressions from the NORBOK collection. This set of works contains works that have several translations including English making them quite popular works and with higher probability of discovering them on the LOD cloud. However, the more data is available on the LOD cloud, the more works from our collection will be discoverable on the LOD cloud.
--------

Comment:
I would encourage the authors to restructure the paper around these results and drop some of the methodological details (that I've spent far too long on myself, in this review).

Response:
We have reorganized the experiment section to reflect both quantitative and qualitative results of the evaluation.
--------

Comment:
I noted a number of minor typographical and similar errors, and can supply a list if required.

Response:
The reviewer has been contacted and the errors that she sent us have been corrected in the paper.
--------

Reviewer: Ray Larson

Comment:
First the grammatical issues - In section 2.1 "that constitutes a …" should be "that constitute a …" later in the same section "use of punctuations and separators characters" should be "use of punctuation and separator characters" and "that are users need to search" should be something like "that are needed by users for searching"

Response:
These typos have been corrected.
--------

Comment:
My major issue with the paper is more of a question about whether the "hierarchical Method" described sort of turns the whole notion of FRBR upside down. One of the characteristics of FRBR that was most revolutionary was the adoption of the Work instead of the manifestation as the primary unit of organization. But in the hierarchical method you have made the manifestation the top-level element with the work nested within. This is naturally a better fit to the philosophy of the source MARC data, but I would argue that it sacrifices the benefits of the FRBR stucture (such as grouping all expressions and manifestations of a work under that work, instead of vice-versa). Although the paper points out that there are drawbacks to this method (and to the referencing method as well) I think that could be made a little more forcefully.

Response:
Having manifestation as the root element in the hierarchical method indeed turns the notion of FRBR for adopting Work as primary unit of organization upside down. However, we argue that if a work has many expressions and manifestations (e.g., Tolkien's Lord of the Rings has 1071 editions), representing these entities with the work as root would lead to a very large tree. As one of our requirements is to have an exchange format, this would be near to impossible to achieve because libraries are used to managing records, i.e. a self-contained piece of information describing a specific publication. On the other hand, having manifestation as a root element also means that we have to parse manifestation-> expression to obtain a reference to work which is a drawback of the hierarchical method.
--------

Main menu

Editorial Board

Syndicate

FRBR-ML: A FRBR-based Framework for Semantic Interoperability

Comments

RE: review comments

Reviewed Articles

Authors & Reviewers

Links

Recent blog posts

Accepted Articles

Search form

Main menu

Login

Editorial Board

Syndicate

FRBR-ML: A FRBR-based Framework for Semantic Interoperability

Comments

RE: review comments

Reviewed Articles

Authors & Reviewers

Links

Recent blog posts

Accepted Articles