Supporting Multilingual Bibliographic Resource Discovery with Functional Requirements for Bibliographic Records
Review 1 by anonymous reviewer
The authors have done a great job of revising the article for publication. It is vastly improved in terms of readability (both the rearrangement of sections and especially the proofreading), and in the addition and clarification of information (e.g., at the end of section 3). There are a few more errors and clarifications that require attention, which I will communicate to the editors so they can address them in the editorial process.
This is a revised resubmission after an accept with major revisions. The reviews of the original submissions are below.
Review 1 by anonymous reviewer
(Second) Review of paper 'Supporting Multilingual Bibliographic Resource Discovery with Functional Requirements for Bibliographic Records'
The paper, which is an interesting one, especially for the library sector, has been revised following the reviewers' comments. Some issues have been improved, for example through the inclusion of the user answers to the 'open questions' given in Appendix A. Nevertheless, two issues still remain, according to my evaluation.
The first is that the paper presentation has not been changed (I had mentioned this in the original review) as far as its first part (sections 2-4) is concerned. Too much historical and general 'existing' model description is given, while details of what has been implemented are missing. It would not be expected, for the type of journal that the paper aims at: a) to provide a general description in section 3 and only at its end to have a single paragraph referring to the existing FRBR ontologies; b) at the end, to describe what has been done by the authors in a single sentence: 'we extended the ontology… to include all attributes and relationships… we used only a subset of them...since they were not available in the source data'. What do 'all' and 'subset' refer to, and is the 'data' dependency known a priori? Giving a table with 'a sample' of the extensions and of what was implemented would be necessary here from a technical point of view.
The second has to do with the evaluation. Referring to the comments given in the Appendix to the 'open questions', one clearly sees the problems that really exist in the usage of systems by the users; this is shown especially in the replies to the last two questions. It is there that the authors should focus the evaluation, commenting on the issues that still need to be targeted in order to handle users' searches successfully and satisfy them.
Review 2 by anonymous reviewer
The issues of content brought up by the reviewers have largely been resolved and improvements have been made in writing and clarity.
The article is still a rough-enough read in some places (e.g., places in the abstract, introduction, second-to-last paragraph of section 2), however, that I would recommend a quick round of editorial feedback from a native speaker, preferably with a background in cataloging.
A few small comments on the "evaluation of results" section that I hope will be helpful:
* The clarifications added and new appendix really help!
* Mention appendix A the first time the open-ended questions are referenced in para 3.
* Who the volunteers/respondents were was still not entirely clear in the first and third paras (Were library users included? Do "library professionals" include both staff and librarians?). I got the answer to the first question in the following section but think it would be useful to make this information explicit in the first para of section 6.
* The second-to-last para makes it seem as if the open-ended responses were only useful for the negative feedback that was reported, but my guess is that these responses were also used to amplify the positive conclusions? If that is the case, I would make that clear in introducing this para.
This is a revised resubmission after an accept with major revisions. The reviews of the original submission are below.
Review 1 by Kate Byrne
This is a clear and concise paper describing an innovative use of FRBR to improve the presentation of search results over a bibliographic catalogue, constructed as part of the TEL project (The European Library).
The software is designed to be integrated with a standard library OPAC (for online queries of the library catalogue by users). In the evaluation described in the paper, by 31 library staff volunteers, each user was presented with alternative interfaces: the standard OPAC results, and a version where the same results had been post-processed using the data structures available in FRBR. The FRBR-based version allows the results to be clustered by "manifestations" of a given "work" (two of the inter-related FRBR entities), then expanded with information linked to those entities (that wouldn't have been returned by the standard OPAC query) and finally re-ranked using an algorithm based on the original OPAC ranking but incorporating extra information. The reported results show a clear preference for the "semantically enriched" system amongst users, and the screen shots in the paper present it as well-structured and attractive.
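The cluster-then-re-rank step summarised above can be illustrated with a minimal sketch. This is not the authors' algorithm: the tuple layout, the function name `cluster_and_rerank`, and the scoring formula (best OPAC score in the cluster plus a logarithmic bonus for cluster size) are assumptions made purely for illustration.

```python
import math
from collections import defaultdict


def cluster_and_rerank(results):
    """Group OPAC hits by work and order the clusters by a combined score.

    `results` is a list of (manifestation_id, work_id, opac_score) tuples.
    The scoring formula below is a hypothetical stand-in, not the paper's.
    """
    clusters = defaultdict(list)
    for manifestation_id, work_id, opac_score in results:
        clusters[work_id].append((manifestation_id, opac_score))

    ranked = []
    for work_id, members in clusters.items():
        # Preserve the original OPAC ordering inside each cluster.
        members.sort(key=lambda m: m[1], reverse=True)
        best_score = members[0][1]
        # Assumed bonus: grows logarithmically with the number of manifestations.
        score = best_score + math.log(1 + len(members))
        ranked.append((work_id, score, [m[0] for m in members]))

    # Highest combined score first.
    ranked.sort(key=lambda c: c[1], reverse=True)
    return ranked


hits = [
    ("m1", "w1", 0.9),
    ("m2", "w2", 0.8),
    ("m3", "w2", 0.7),
    ("m4", "w2", 0.6),
]
ranking = cluster_and_rerank(hits)
for work_id, score, manifestations in ranking:
    print(work_id, round(score, 3), manifestations)
```

In this toy input the work with three manifestations moves ahead of the single top-scoring record, which is the kind of effect a re-ranking that "incorporates extra information" could have; the actual weighting used in the paper may differ.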
The paper starts with a clear exposition of the issue being dealt with. The subject matter chosen was works by Literature Nobel Laureates, because the high volume of Manifestations of such Works shows the system to good effect. It seems no bad thing that the system performs best over material that is likely to be amongst the most popularly searched for.
A succinct history of the development of FRBR is given, with helpful web links to further information and an informed commentary on the way user attitudes to catalogue searching have evolved with universal adoption of web search methods. A very clear diagram (Fig 1) explains the FRBR model, as a collection of entity classes connected by directed relations. The word "ontology" is used differently by different authors, and the usage here to refer to "a concrete specification of the model" seems a little unusual; but the meaning is clear enough.
An attractive point about the system is that it is designed to be "bolted on" to an existing library management system. A possible drawback is that the entire catalogue has to be pre-processed offline, to convert the MARC records to FRBR and build "cluster indexes". This is clearly a maintenance overhead, as the catalogue is presumably subject to frequent updates. However, if the attractions of the system are sufficient, one could see it as a way of drawing libraries in to adopting FRBR, in a practical and relatively simple way.
The evaluation process involved users answering a questionnaire after performing the same tasks using the standard OPAC and the enhanced system. It is not stated whether the tasks were pre-defined, nor what they were if so. Only 5 questions were asked about the usefulness of the semantic clustering of results, and apparently "yes", "no" or "sometimes" were the only acceptable answers. With a fairly small group of judges like this, it seems a pity that user comments were not also reported, as these will often provide considerable insight even though they are obviously not readily tabulated as formal results. The questions are quite restrictively worded, and one feels there may have been other factors involved. For example, when users are asked if the clustering was helpful for discovering additional resources (question 2) one wonders if the preponderance of "sometimes" replies is because the system does not provide enough additional resources or because it provides too many. Similarly, the question that had the most negative responses (question 4) asks whether clustering is helpful for finding the first publication of a resource - but it's not clear whether this is actually an objective for the user, or whether perhaps the answers reflect the fact that a standard OPAC will probably allow sorting of results by publication date anyway.
The description of the "FRBRizing" process (section 6) is less assured than the rest of the paper. One is left with the impression that there are a number of knotty problems here, without being clear what they are. However, there are references to other publications that possibly explain more fully.
This paper makes a useful contribution, describing what seems valuable progress in semantic enhancement of bibliographic searching. A couple of questions come to mind that are not considered in the paper. Most of the information used in the clustering and expanding processes must be available in the original MARC records (which are the source of the FRBR data), so why is the same presentation of results not possible over the unaltered catalogue records? Standard library OPACs do not present results in the way shown here, but could they not? Possibly the reason why not is because of the processing complexity, in working out the equivalent of FRBR "work" and "manifestation" entities at run-time, in order to group records together and find related items. My other query is almost the converse: could the user's query be run directly against the FRBR cluster indexes, instead of starting with the query results produced by the OPAC? If FRBR is destined for widespread adoption then presumably its data should be queried directly rather than through intermediaries. Again, I'm speculating that it may be a performance issue, as library OPACs are optimised to produce specific kinds of results very fast from often enormous catalogues; it may be that a two-stage process (fast filtering to get a result set, then clever enhancement work) is inevitable. Discussion of these issues by the authors would be much more informed than my guesses.
I noticed a number of minor errors or infelicities of style which I list here as there weren't too many:
p2, 3rd paragraph: "has been de difficulty" should be "...the difficulty".
p4, 2nd para: "Their work consisted on designing" should be "...of designing".
p4, 3rd para: "this kind of features" should be "...of feature".
p6, top of 2nd column: I suggest "logarithmically" instead of "in a logarithm way".
p6, third last para: I suggest "be transformed" instead of "be subject of transformation". Also, substitute the section number for the name, "see Section 6".
p6, second last para: "are shown in Figure 2" should be "is shown...".
p7, 1st para: "worth to notice" should be "worth noting"; "manifestations associated to each work" should be "...with each work".
p7, top of 2nd col: "authorities with more endeavors associated with it" should be "an authority..."; substitute "in preference to" for "in detriment of".
p7, next para: omit comma from "Figure 5, shows".
p11, beginning of section 6: "resulted from" should be "resulting from".
p12, first line: I suggest "deal with this" instead of "face this".
p12, 3rd para: "problem is even higher" should be "...even greater".
p13, 3rd para: the first sentence is incomplete: "The FRBRization of aggregated works and serial works was also not completely." ("not completed"? "incomplete"?).
p13, penultimate para: I suggest "not given detailed attention" instead of "not subject of a detailed...".
Review 2 by anonymous reviewer
This paper presents an architecture that uses the FRBR model in order to improve the discovery of bibliographic resources from Online Public Access Catalogues. It is based on searching by propagation of a single query and on ranking results, and it refers to relevance feedback (although this is not specifically treated in the paper), all of which are inspired by web search engines.
A large part of the paper (sections 2 and 3) is introductory in nature, providing historical and general FRBR descriptions. By contrast, some issues, such as the extension of the "FRBR in RDF" ontology to include the attributes and relations defined in FRBR as "class properties", are not described in the required detail.
In the next section, which describes the architecture, a service called "semantic cluster" is used, including FRBRization, clustering, expansion and reordering. Once again this is given in a descriptive, not detailed, manner. This may be due to the fact that the work has already been presented in former publications of the authors (e.g., ). Issues such as what is meant by indirect relation, which manifestations are clustered together, and how the reduction of the weight of manifestations is interpreted in practice should be clarified.
Providing some examples could help illustrate the methodologies.
Moreover, section 6, which presents previous work of the authors on the FRBRization process of MARC entities, should be reduced and moved before the experimental section, for readability reasons.
In summary, the paper is interesting, referring to the work done in the framework of one of the Europeana-related projects for the library sector. However, a great part of it either presents models that are already well described in the related literature, or duplicates former publications of the authors. The paper should, thus, be rewritten, reducing such information and focusing on the extensions that give further insight into the methodologies followed.
Review 3 by Ray Larson
This is a well-written and interesting paper on using the FRBR principles to structure the results display of an OPAC, and also about the application of ranking to OPAC results. Overall, the paper is quite readable and makes a good argument for the use of FRBR principles. My only criticisms are focused on some limitations of the literature review, and a few minor typos or phrasing issues.
The main concern is that the literature review appears to be largely restricted to work from the late 90's and 2000's and ignores much of the earlier work on OPACs that also dealt with some similar issues. For example, the RLG - RLIN system in the 1980's used a very similar display of grouped records for the same works (even calling it their "clustered display"), and similar displays were adopted by some early OPACs. In addition, work on ranked retrieval in OPACs dates back to the late 80's and early 90's, long before the Web, including such systems as CITE at the US National Library of Medicine, the OKAPI system at the Polytechnic of Central London, the HyperCatalog project in Sweden, and the Cheshire system at Berkeley.
As to phrasing issues: "… bibliographic data being the OCLC FRBR Work-Set Algorithm the most important reference." probably should be "the OCLC FRBR Work-Set Algorithm being the most important reference."
Review 4 by anonymous reviewer
Overall, I found the background information and research presented in this article to be interesting and significant given the wealth of information libraries have in traditional catalogs that could be presented in a semantically richer way. The conclusion was to the point, and the section on future directions was a nice addition. Below are some suggestions that could be used to improve the piece.
Briefly addressing what new-generation ILSs do and do not do, and why a library would prefer to FRBRize its own bibliographic data, would help explain the paper's significance.
The literature review gave good background information and a scan of the literature. A little more focus on techniques to FRBRize legacy data and other studies that evaluate the utility of new services built on it, if they exist, would be beneficial. Reducing the number of quotes and referring to authors would be a great improvement.
The methodology is addressed for the FRBRization of the data, but not for the user survey. Some sentences about the participant population, how they were recruited, and why their feedback is relevant (do they represent library users, or library staff?) would be desirable. From the abstract and introduction, I expected a comparison of the traditional OPAC and the new system (semantic cluster) being tested, but the survey questions included do not reflect this. Also, a stronger tie could be made between the key features of web search engines and new features introduced by the semantic cluster approach, e.g. the feedback function is mentioned a few times and does not seem to make it into the discussion of the new system.
Data analysis: This section would really benefit from more detail, such as discussion of the response scale, and more discussion of the survey results. I wondered particularly how the authors knew what the negative feedback they received on the semantic cluster was related to? Including the entire survey as an appendix would help. The figures and tables were useful throughout, but placement closer to the relevant narrative in some cases would be an improvement. The survey questions in the text could be combined with table 1, and the percentages should probably be eliminated since they are based on such small numbers.
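The point about percentages over small samples can be made concrete with a short calculation. The sketch below assumes the 31 respondents mentioned in the first review of the original submission; the response counts are hypothetical and serve only to show how coarse the percentage scale becomes.

```python
def percentage_step(n_respondents):
    """Percentage points contributed by a single respondent."""
    return 100.0 / n_respondents


# 31 is the number of volunteers reported for the evaluation (assumed here).
n = 31
step = percentage_step(n)

# Hypothetical counts: a difference of one respondent between two answers.
count_a, count_b = 10, 11
pct_a = 100.0 * count_a / n
pct_b = 100.0 * count_b / n

print(f"one respondent = {step:.1f} percentage points")
print(f"{count_a}/{n} = {pct_a:.1f}%  vs  {count_b}/{n} = {pct_b:.1f}%")
```

With a group of this size, every single answer shifts a percentage by more than three points, so reporting raw counts (for example "10 of 31") conveys the same information without suggesting more precision than the data supports.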
Writing style and clarity: The writing overall is good, though there are awkward areas that could be improved with editorial feedback from a native English speaker (e.g., issues with word order and choice, especially prepositions, and sentence fragments). Care should be taken that quotations flow grammatically (see p. 3). Section 6 could be moved after section 4.