An information model for managing resources and their metadata
Solicited review by Sven Schade:
While the last revision could resolve all critiques in terms of content, this second round of modification largely succeeds in presenting the valuable findings. The authors again made a major effort to address all reviewers' comments. For me, only three minor revisions prevent this article from being published:
1) The first sentence of the abstract is not precise enough ('...where the same problems tend to occur during the development phase'). In fact, the first three sentences describe the issue at hand in an overly complicated fashion. They should be revised to clarify (i) the context - information management in and between web applications, (ii) the problem - development of applications that are capable of exchanging data and metadata, and (iii) the desired solution - a reusable and sufficiently generic information model.
2) The heading of section 1.1 might be simplified, e.g. to 'Open Issues' or 'Problem Statement'.
3) The last paragraph on the left column of the second page ('Parts of the presented solution in this article rely on the use of RDF and the concept of named graphs...') breaks the flow of the text, because it mentions the proposed solution too early. I would keep the statement on a more generic level, e.g. 'The concept of named graphs, and particularly the use of the Resource Description Framework (RDF) have already been suggested as a partial solution'.
Once the comments above have been addressed, I would recommend to finally accept this article for publication.
Solicited review by Tudor Groza:
The authors have addressed all my comments.
Revised manuscript after "accept with minor revisions". Reviews of the initial submission are below.
Solicited review by Sven Schade:
The initial comments of the three reviewers were sufficiently addressed in most respects. As a direct result, the newly submitted version provides the required improvements to consider this article for publication. In fact, I highly appreciate the authors' hard work on the topic in general and also in improving the presentation in a scientific article. The work provides valuable results for the Semantic Web community and I hope to see it continued and also taken up by others.
However, although I appreciate the intense revision as such, I see several needs for improvement, mainly related to the presentation style. It has already been noted on the previous manuscript that the authors tend to write in a reporting style. Unfortunately, this has in parts still carried over into the revision. More detailed comments for improving this aspect and several minor items are given below.
Major comments:
- The abstract already begins in a reporting style and as a whole is way too detailed. Many of the statements would fit better into the introduction, so that that part of the text becomes self-contained. I suggest completely rewriting the abstract in order to present a brief but complete overview of the contribution, including conclusions and future work.
- As already indicated above, the introduction should be self-contained, i.e. especially contain the brief context, motivation and problem statement, before indicating the intended solution and the structure of the remaining article. Parts of the current abstract should be used to further elaborate on these aspects in section 1. Special care should be taken with respect to the use of English language, for example the second and third sentence in the new version show an eminent decline compared to the initial manuscript.
- Section 1.1 should be moved to the background section in order to increase readability of the overall article. The style might be changed to a more narrative description, as the current text reads almost like a glossary.
- Section 1.2 should become more prominent. It describes the real problems as requested above. Accordingly, this might be moved up and converted to bullet points.
- The last paragraph and listing of the current section 1.2 reflect partially the previously requested indication of the overall approach. This should be better integrated into the overall new revised section 1.
- Section 3 should begin with a few 'bridging' sentences, which take the reader from the given background to the following introduction of ReM3.
Minor comments:
- The term 'publication' might in general be replaced with 'article' or 'paper'.
- Section 1, just after reference 7, 'for example' should be deleted.
- Section 1 should complement the goals of Organic.Edunet with those of Ariadne.
- Section 1, text about article organization should be revised.
- The first mention of 'Semantic Web' should be equipped with a reference.
- The above also holds for Dublin Core.
- The formatting is 'broken' in the paragraphs just below Figure 5.
- Section 6.3 again begins in a report kind of style. 'The authors participated in…' should be changed to something like 'Within the scope of the "Hack4europe!"… we …'.
- Section 9 should be called 'Future Work and Next Steps'.
Solicited review by Tudor Groza:
The authors have definitely improved the manuscript and have addressed well all the weak points mentioned in the previous review. While from a technical perspective the manuscript is on a good track, unfortunately, the presentation requires some additional work (as per the comments below). The major issues with respect to it are: a fairly incoherent introduction, the lack of examples in the model description and the inclusion of a series of background descriptions, which could have been easily left out as they represent common knowledge for the readers of this journal. Finally, a couple of aspects could be improved in the presentation and discussion of the evaluation.
Comments:
* The introduction has been re-written, however, it is now more confusing than previously. It starts directly by discussing the contribution of the paper without any sort of background, context or motivation. The fact that it has been split into corresponding subsections that deal with some of these aspects (e.g., the motivation and goals of the framework) is perfectly fine, however, these need to follow a coherent argumentation thread, which is, in this case, absent. The very first sentence of the introduction raises confusion: "The focus of this paper […] to solve these kind of problems." - what kind of problems? It's true that the abstract describes these problems, nevertheless, the introduction should be coherent on its own, without reading the abstract.
* Other aspects that could be improved in the introduction are: (i) a justification of why is this approach better than other existing approaches; (ii) moving the terminology subsection at the end of the introduction; and (iii) moving the description of the structure of the paper after the terminology, at the very end of the section.
* The state of the art section still reads more like a background, and some parts could be left out, e.g., not sure if 2.4 and 2.5 are really needed. Also, access control and RDF is now mentioned, but just as 'note' and without analysing to what extent the current approach complements this state of the art. Finally, access control is also discussed (later in the paper) in the context of SPARQL. Hence, an overview and discussion on this aspect should have been included in the related work, especially since it's a fairly hot topic.
* The inclusion of the RDF-based formalisation is a great addition to the paper, however, the entire section 3 could be probably improved (w.r.t. readability) by adding some concrete examples, especially in 3.1.
* In 4.3: the authors have clearly adopted the easiest solution w.r.t. access control and SPARQL. However, this does raise the question of whether this solution is enough, and what happens if one does have the necessary credentials to access certain data, but cannot, because this scenario is not supported.
* The authors have done a good job with the evaluation, especially w.r.t. the structured interviews and the discussion about the framework in the light of these interviews. The survey presentation (Sec. 7.2.1) could be improved by adding a bit of structure to it and including some concrete questions asked as part of the survey (if not the entire list - in an appendix). The scalability evaluation, on the other hand, raises a series of questions. For example, in a real-world deployment (which has happened according to the paper) are those 20 concurrent connections enough? Testing the system using synthetic 'data' is great as it allows one to push the boundaries of the system's scalability. However, to complement this, it would also be interesting to analyse the real usage data. Finally, how is this scalability affected by the backend repository? The authors do mention that the evaluation has been performed only with one backend, however, it could have been interesting to already delimit the latency induced by the actual system in the context of the entire response time.
Revised manuscript (as general full paper submission) after a "reject and resubmit". Previously submitted under the title "An Information Model for the Annotation of Resources with Heterogeneous Metadata" in response to http://www.semantic-web-journal.net/blog/special-issue-linked-data-scien...
Solicited review by Sven Schade:
The paper submission to this special issue on Linked Data for Science and Education presents an approach for annotating web resources with different forms of metadata and outlines a few show cases from the education domain. It focuses on the used information model and a related application for resource and metadata management.
In general, the contribution does not match well with the topics of the special issue. 'Science and Education' covers parts of the example application, but is not emphasized in the problem statement, state of the art and conclusions sections. Illustrative examples are introduced late in the paper, which makes the first parts hard to read and understand. However, the presented work has high potential and it would be valuable to include an improved (and more focused) version in the special issue. I therefore suggest a re-submission after major improvements. The corresponding detailed comments are listed below.
Major comments:
- The problem statement is not illustrative enough, e.g. why is the replacement of harvesting problems with linked data an issue at all? This whole section would clearly benefit from an example (from the science and education domain). Here, the example should show the issue. Later, the same example should be re-visited showing how the ReM3 approach addresses the central issues/problems. Some of the required information is available late in the paper, in the 'showcases' section.
- A more detailed problem description would improve the overall value of the contribution. This should include a paragraph about the particular relevance in the context of this special issue.
- The role of web service technology for solving the particular problem of the required information model remains unclear.
- Figure 1 is a central element of the paper. In its current form, it is confusing. Cardinalities might help. The difference between Entity Information, Local and External Metadata is unclear. The same holds for the difference between Entity and Resource. It might be useful to reduce the overall number of expert terminology.
- Below Figure 1, it remains unclear why all the types of types (kind of types, i.e. meta-types) are required.
- The following text about named graphs etc is very theoretic and would benefit from an illustrative example.
- The 'Representation type' is introduced too late.
- From 'ReM3 - An Information Model…' onwards, the paper becomes easier to read. Yet, the desire of using Web Technologies should be clarified early in the paper (see also comment above).
- Under 'additional interfaces' it is mentioned that SCAM supports harvesting, while the problem section suggests replacing harvesting. These two statements seem to contradict each other and should be clarified.
- The 'scalability' section addresses some fields which are relevant for this special issue. These might be exploited.
- The Confolio application and the related section are certainly interesting, but much of the presented information does not add value to the paper; it rather shifts the focus. This section might be shortened.
- The relation between the section 'presenting and editing metadata' and ReM3 is unclear, especially because the types in the table do not seem to match any ReM3 element.
- The conclusions should directly refer to the problems mentioned at the beginning. It should be outlined how each of these problems has been addressed by the work presented in the paper.
Minor comments:
- The paragraph about the structure of the paper should be moved up (from the problem section to the introduction).
- All headings should be followed by a text block. This should for example include 'State of the Art' and 'ReM3 - An Information Model…'.
- A definition of the 'resource' concept would be helpful.
- Why are "things" mentioned on page 4 and not resources?
- The statement that Linked Data extends the Semantic Web can be questioned. In fact Linked Data operates 'below' the semantic layer.
- The State of Art should at least include some sentences and pointers to the most common, frequently used, metadata models, including DC, LOM etc., which are mentioned later in the paper.
- SCAM version 4 should be briefly introduced.
- A clear reference to Confolio is missing.
- The sentence starting with 'The web API of SCAM' does not need an extra paragraph. It can directly follow the text above.
- Acronyms are not used consistently throughout the text.
Solicited review by Tudor Groza:
The paper presents an information model for capturing a comprehensive set of provenance metadata for Web resources, with an accent on the integration of heterogeneous metadata exposed by different sources. It also discusses a reference implementation and several application use cases.
Strong points:
* the supporting service has a reference implementation (although the model's reference implementation is not described)
* the showcases presented in the paper, and also published online, are impressive
Weak points:
* the presentation of the paper is quite weak. For example, the introduction lacks a clear description of the context and of the problem that the authors try to solve, although they do depict clearly the problem in Sect. 2. Also, there are several places where the presentation could be improved by providing concrete examples.
* the paper presents the information model only at a high level and leaves the formalisation unspecified. There are no details (or links to external resources) about the actual grounding of the model into, for example, an ontology or vocabulary. Concrete examples of how the grounded model can be used, or of what possible SPARQL queries look like, are also missing.
* the related work is slightly out of context and tends to be a background description rather than an actual related work analysis. For example, what other approaches try to model access control using RDF?
* the scalability supported by the model has not been evaluated. It would have been interesting to see at least what is the amount of triples generated by the model to represent certain aspects, such as ACL, and how does it scale with the number of users / groups and resources.
Detailed comments:
* in general, the use of the term "annotation" is highly unclear in the context of this paper. The authors should provide a clear definition of what they mean by "annotation" at the very beginning of the paper.
* as already mentioned, the introduction is not setting properly the context and the problem addressed in the paper. What are the "learning repositories" or the "traditional repositories" you refer to?
* the intrinsic entailment from the first to the second paragraph (talking about bringing existing metadata into the Linked Data Cloud) is highly unclear.
* the use of past tense in the first paragraph is confusing, especially due to the lack of concrete examples.
* "Using Semantic Web technologies to annotate resources with metadata […]" -> since the claim is generic (with no grounding in a particular context or domain problem), a reference to some more foundational work in (semantic) annotation would be more appropriate, instead of the two self-citations.
* the short discussion on using triples to describe resources and the missing provenance information requires some examples and / or a proper reference.
* the problem statement does shed some light onto the issues addressed by the paper, yet it could profit, again, from some concrete examples. What kind of diverse information are you referring to? What is educational metadata? Please provide clear examples for these.
* "Metadata is copied …" - this phrase is unclear. What are the metadata instances you are referring to?
* next phrase: The need to provide links between related resources is probably rooted in some requirements, which should be specified. Otherwise this claim has no support.
* the formulation of the shortcomings of Named Graphs discussed at the end of Sect. 2 is out of context. Named Graphs are a generic representation mechanism and the way in which developers make use of them is application / domain dependent. Hence, there is no need for generic guidelines on the provenance or relations between Named Graphs as such. For example, if one models Persons via NGs, s/he would need to specify that Persons are related at a conceptual level (i.e., Person_URI knows Person_URI or Person_URI sameAs Person_URI), and not that the underlying representation as NGs of the Persons are related.
* the related work section should probably be renamed to Background (or Foundational work), because it discusses the building blocks that support the solution provided by the paper and not approaches that try to solve the same problem. The section also contains some statements that are not supported by any evidence or references (e.g., "There are situations where the conceptual model cannot be cleanly mapped […]"). Finally, the authors keep hinting towards the direction of technology-enhanced learning and the associated models, but without putting this information into a proper context or discussing the relation to their problem and solution.
* Sect. 4 presents a good overview of the conceptual aspects of the model but lacks details about a possible formalisation. For example, would an Entry Information be a class? Would Location Type be a class? What kind of relation would exist between the two? Would any ontology design pattern be useful to implement such a relation? What external / widely adopted vocabularies would you recommend for modelling some of the provenance aspects? How can the ACL elements be formalised? All these things should have been described, to give a better picture not only of the conceptual model but also of the implications of following different implementation routes.
* The scalability discussion in Sect. 5.6 doesn't really make sense without proper experiments and numbers (especially the shallow comparisons between the performances of the different instances - e.g., "hardly noticeable" or "very low").
* In the same section, as part of the discussion about the free-text querying and the use of SOLR, it would be interesting to see how the ACL aspects were implemented.
Solicited review by Paul Groth:
This paper primarily summarizes the design and implementation of an information model, ReM3, designed for the management of information in learning repositories. The paper describes some experience with its implementation in the context of two projects and then describes an application, Confolio, for managing personal and organizational profiles that makes use of ReM3.
The paper reads more as a report of what the authors have done rather than a scientific article. To become a scientific article the authors would need to provide significant added detail and explanation in 4 areas: contextualizing the work, related work, identifying contributions, and evaluation. I now describe, in more detail, the concerns I have in each of these areas.
1) Contextualizing the work
The paper fails to orient the reader within the overall domain and scope of the work. It begins by stating "Several projects with focus on exchanging metadata between learning repositories has the same problem: how would it be possible to bridge the gap between "traditional" repositories and triple stories, taking advantage of the features that Semantic Web has to offer". The paper then goes on to describe general notions around Linked Data and Semantic Web technologies. What projects do the authors refer to? What are "learning repositories"? What do they contain? Why are they useful? Why do they need to exchange metadata? What features do the authors refer to? Without answers to questions such as these it's impossible to orient the work.
In Section 2, the problem statement consists of 5 statements that read like the general information integration problem: integrating heterogeneous information, exposing this information, reduplication. I don't believe the authors are aiming to solve this problem in general; thus, it would be better to have a much more focused problem statement.
2) Related Work
The section on the state of the art essentially describes current practice in developing Semantic Web applications. It does not look at what current information models are for learning repositories and why those are insufficient. It does not discuss metadata exchange standards, it fails to look in any way at the related work in provenance. Indeed, of the 19 references only 2 are to non-generic references about the area that are not self citations. For a journal paper, I would expect much more.
I would suggest the authors look at the following survey as an entry point into the provenance literature:
- Luc Moreau. The foundations for provenance on the web. Foundations and Trends in Web Science, 2(2-3):99-241, November 2010.
For more recent Semantic Web provenance literature consult the recent special issue on Provenance in the Semantic Web in the Journal of Web Semantics (Volume 9, Issue 2, Pages 83-244 (July 2011)).
Some entry points for work on learning repositories and information models are:
- Thanassis Tiropanis, Hugh Davis, David Millard, Mark Weal, "Semantic Technologies for Learning and Teaching in the Web 2.0 era - A survey," in Proceedings of the WebSci'09: Society On-Line, 2009
- Permanand Mohan, Christopher Brooks, "Learning Objects on the Semantic Web," Advanced Learning Technologies, IEEE International Conference on, p. 195, Third IEEE International Conference on Advanced Learning Technologies (ICALT'03), 2003
- S. Ternier, and E. Duval, "Interoperability of Repositories: The Simple Query Interface in Ariadne," Int'l J. E-Learning, vol. 5, no. 1, 2006, pp. 161–166.
3) Identifying Contributions
The key contributions of the paper are not identified. It is hard to determine what the particular information model adds to the discussion around how to appropriately model systems. In general, I have the feeling that it looks just like the database layout of an implementation. Indeed, this feeling is buttressed by the discussion of caching metadata on pg. 4. Also, I think the authors mix information model and implementation when they discuss Named Graphs in section 4.2. The discussion of provenance metadata is rather lightweight, essentially, listing properties already found in Dublin Core.
I also wonder how this paper differs from the following paper (not cited) about the Ariadne system discussed as a major implementation of the system.
- Stefaan Ternier, Katrien Verbert, Gonzalo Parra, Bram Vandeputte, Joris Klerkx, Erik Duval, Vicente Ordonez, Xavier Ochoa, "The Ariadne Infrastructure for Managing and Storing Metadata," IEEE Internet Computing, pp. 18-25, July/August, 2009
The authors need to clearly identify the contributions of the paper above the state of the art.
4) Evaluation
The authors provide no evaluation of the information model or its systems. They discuss scalability in section 5.6 but only provide anecdotal experiences with no hard numbers or comparisons. They discuss various projects that used the system but offer no way to judge whether their information model made a difference. That is, there is no feedback at a user experience level or at a developer performance level.
Given these four areas of concern, the paper is currently not in a position to be considered as a journal paper.
Minor Notes:
- I thought the URL design in section 5.2 was interesting; maybe some lessons learned could be drawn from it.
- The acronym SCAM has poor connotations in English. I would suggest finding a better one.
Comments
General comments on resubmission
Response to review by Sven Schade
The paper submission to this special issue on Linked Data for Science and Education presents an approach for annotating web resources with different forms of metadata and outlines a few show cases from the education domain. It focuses on the used information model and a related application for resource and metadata management.
In general, the contribution does not match well with the topics of the special issue. "Science and Education" covers parts of the example application, but is not emphasized in the problem statement, state of the art and conclusions sections. Illustrative examples are introduced late in the paper, which makes the first parts hard to read and understand. However, the presented work has high potential and it would be valuable to include an improved (and more focused) version in the special issue. I therefore suggest a re-submission after major improvements. The corresponding detailed comments are listed below.
Major comments
The problem statement is not illustrative enough, e.g. why is the replacement of harvesting problems with linked data an issue at all? This whole section would clearly benefit from an example (from the science and education domain). Here, the example should show the issue. Later, the same example should be re-visited showing how the ReM3 approach addresses the central issues/problems. Some of the required information is available late in the paper, in the "showcases" section.
A more detailed problem description would improve the overall value of the contribution. This should include a paragraph about the particular relevance in the context of this special issue.
This has been addressed through substantial changes to the problem statement, which is now much more focused.
The article was resubmitted to the journal’s main call, therefore no changes were made to explain the relevance for the special issue to which the paper originally was submitted. However, the clarified problem statement now also includes an improved explanation of the relevance.
The role of web service technology for solving the particular problem of the required information model remains unclear.
The information model is not dependent on web services of any kind. However, in the section on Web technologies it is argued that the information model can easily be exposed using RESTful web services, which in turn can operate on the Web of Data.
Figure 1 is a central element of the paper. In its current form, it is confusing. Cardinalities might help. The difference between Entity Information, Local and External Metadata is unclear. The same holds for the difference between Entity and Resource. It might be useful to reduce the overall number of expert terminology.
This has been clarified. Additional explanations have been added and a new figure was introduced.
Below Figure 1, it remains unclear why all the types of types (kind of types, i.e. meta-types) are required.
This has been clarified in the text before the enumeration.
The following text about named graphs etc is very theoretic and would benefit from an illustrative example.
Added a more verbose description and contextualized it for the case of ReM3.
The "Representation type" is introduced too late.
The representation type is introduced in the second paragraph of the section which introduces the model, right after figure 1 which depicts the ReM3 entry.
From "ReM3 An Information Model" onwards, the paper becomes easier to read. Yet, the desire of using Web Technologies should be clarified early in the paper (see also comment above).
This has been clarified in the re-written introduction.
Under "additional interfaces" it is mentioned that SCAM supports harvesting, while the problem section suggests replacing harvesting. These two statements seem to contradict each other and should be clarified.
The goal is to provide an alternative to legacy harvesting protocols and to support a transition from copying metadata between systems through harvesting to using Linked Data. This has been made more clear in the text now.
The "scalability" section addresses some fields which are relevant for this special issue. These might be exploited.
A new section with a preliminary scalability analysis, including numbers and graphs, has been added. The scope of this analysis is explained.
The Confolio application and the related section are certainly interesting, but much of the presented information does not add value to the paper; it rather shifts the focus. This section might be shortened.
The relation between the section "presenting and editing metadata" and ReM3 is unclear, especially because the types in the table do not seem to match any ReM3 element.
Response to both of the preceding comments (11 and 12 in the reviewer's list): the section about Confolio has been improved by clarifying the relevant parts and removing the irrelevant parts. It should be more comprehensible now.
The conclusions should directly refer to the problems mentioned at the beginning. It should be outlined how each of these problems has been addressed by the work presented in the paper.
This has been made more clear now. Each of the stated problems is addressed explicitly.
Minor comments
The paragraph about the structure of the paper should be moved up (from the problem section to the introduction).
Done
All headings should be followed by a text block. This should for example include "State of the Art" and "ReM3 - An Information Model".
Done
A definition of the "resource" concept would be helpful.
A definition of “resource” (along with other term definitions which are of relevance for the article) has been added to the introduction.
Why are "things" mentioned on page 4 and not resources?
The term “thing” is used in the section about Linked Data because it was also used in the original definition of the Linked Data rules as stated by Tim Berners-Lee. I added an explanation of how things and resources are related to this section.
The statement that Linked Data extends the Semantic Web can be questioned. In fact Linked Data operates "below" the semantic layer.
The definition of Linked Data has been changed and clarified.
The State of Art should at least include some sentences and pointers to the most common, frequently used, metadata models, including DC, LOM etc., which are mentioned later in the paper.
Done
SCAM version 4 should be briefly introduced.
The whole section and its subsections about the reference implementation focus on SCAM (now called EntryStore); this was clarified in the text.
A clear reference to Confolio is missing.
Links have been added to all mentioned software projects.
The sentence starting with "The web API of SCAM" does not need an extra paragraph. It can directly follow the text above.
Done
Response to review by Tudor Groza
The paper presents an information model for capturing a comprehensive set of provenance metadata for Web resources, with an accent on the integration of heterogeneous metadata exposed by different sources. It also discusses a reference implementation and several application use cases.
Strong points
the supporting service has a reference implementation (although the model's reference implementation is not described)
the showcases presented in the paper, and also published online, are impressive
Weak points
the presentation of the paper is quite weak. For example, the introduction lacks a clear description of the context and of the problem that the authors try to solve, although they do depict clearly the problem in Sect. 2. Also, there are several places where the presentation could be improved by providing concrete examples.
The introduction has been rewritten and should provide a clear picture now.
the paper presents the information model only at a high level and leaves the formalisation unspecified. There are no details (or links to external resources) about the actual grounding of the model into, for example, an ontology or vocabulary. Concrete examples of how the grounded model can be used, or of what possible SPARQL queries look like, are also missing.
A reference to an RDFS file describing the model has been added, along with a detailed figure showing the relationships between all classes.
the related work is slightly out of context and tends to be a background description rather than an actual related work analysis. For example, what other approaches try to model access control using RDF?
The section with state of the art has been clarified and some details (such as access control) were added.
the scalability supported by the model has not been evaluated. It would have been interesting to see at least what is the amount of triples generated by the model to represent certain aspects, such as ACL, and how does it scale with the number of users / groups and resources.
The scope and content of the scalability section have been clarified. A scalability analysis with numbers and graphs has been included in a new evaluation section.
Detailed comments
in general, the use of the term "annotation" is highly unclear in the context of this paper. The authors should provide a clear definition of what they mean by "annotation" at the very beginning of the paper.
A whole section has been added to clarify the definitions of the most important concepts in the context of this paper. The term “annotation” is defined there.
as already mentioned, the introduction is not setting properly the context and the problem addressed in the paper. What are the "learning repositories" or the "traditional repositories" you refer to?
the intrinsic entailment from the first to the second paragraph (talking about bringing existing metadata into the Linked Data Cloud) is highly unclear.
The introduction has been completely rewritten to properly describe the background of the presented research.
the use of past tense in the first paragraph is confusing, especially due to the lack of concrete examples.
"Using Semantic Web technologies to annotate resources with metadata […]" -> since the claim is generic (with no grounding in a particular context or domain problem), a reference to some more foundational work in (semantic) annotation would be more appropriate, instead of the two self-citations.
the short discussion on using triples to describe resources and the missing provenance information requires some examples and / or a proper reference.
This was addressed by a rewrite of the introduction.
the problem statement does shed some light onto the issues addressed by the paper, yet it could profit, again, from some concrete examples. What kind of diverse information are you referring to? What is educational metadata? Please provide clear examples for these.
"Metadata is copied …" - this phrase is unclear. What are the metadata instances you are referring to?
next phrase: The need to provide links between related resources is probably rooted in some requirements, which should be specified. Otherwise this claim has no support.
Clarified.
the formulation of the shortcomings of Named Graphs discussed at the end of Sect. 2 is out of context. Named Graphs are a generic representation mechanism and the way in which developers make use of them is application / domain dependent. Hence, there is no need for generic guidelines on the provenance or relations between Named Graphs as such. For example, if one models Persons via NGs, s/he would need to specify that Persons are related at a conceptual level (i.e., Person_URI knows Person_URI or Person_URI sameAs Person_URI), and not that the underlying representation as NGs of the Persons are related.
The relation between a resource and its metadata is important, as e.g. the httpRange-14 discussion shows. NGs are fundamental to the ReM3 model, and what is modeled there is the relation between NGs; this is why we consider this relevant for the discussion.
the related work section should probably be renamed to Background (or Foundational work), because it discusses the building blocks that support the solution provided by the paper and not approaches that try to solve the same problem. The section also contains some statements that are not supported by any evidence or references (e.g., "There are situations where the conceptual model cannot be cleanly mapped […]"). Finally, the authors keep hinting towards the direction of technology-enhanced learning and the associated models, but without putting this information into a proper context or discussing the relation to their problem and solution.
See also reply to “Weak point 3”. TEL was mentioned because most of the projects in whose context the model was developed were carried out within the field of Technology Enhanced Learning. The described model can be used in any context, not only TEL.
Sect. 4 presents a good overview of the conceptual aspects of the model but lacks details about a possible formalisation. For example, would an Entry Information be a class? Would Location Type be a class? What kind of relation would exist between the two? Would any ontology design pattern be useful to implement such a relation? What external / widely adopted vocabularies would you recommend for modelling some of the provenance aspects? How can the ACL elements be formalised? All these things should have been described, to give a better picture not only on the conceptual model but also on the implications of following different implementation routes.
See answer to “weak point 2”. A detailed figure containing all ReM3 classes and their relationships has been added. A separate RDFS file holds information about the whole model.
The scalability discussion in Sect. 5.6 doesn't really make sense without proper experiments and numbers (especially the shallow comparisons between the performances of the different instances - e.g., "hardly noticeable" or "very low").
See also answer to “weak point 4” above. A scalability analysis has been carried out and included in a new evaluation section.
In the same section, as part of the discussion about the free-text querying and the use of SOLR, it would be interesting to see how the ACL aspects were implemented.
The text on free-text search was moved to a new section, and a description of how ACL is used together with Solr was added.
Response to review by Paul Groth
This paper primarily summarizes the design and implementation of an information model, ReM3, designed for the management of information in learning repositories. The paper describes some experience with its implementation in the context of two projects and then describes an application, Confolio, for managing personal and organizational profiles that makes use of ReM3.
The paper reads more as a report of what the authors have done rather than a scientific article. To become a scientific article the authors would need to provide significant added detail and explanation in 4 areas: contextualizing the work, related work, identifying contributions, and evaluation. I now describe, in more detail, the concerns I have in each of these areas.
Contextualizing the work
The paper fails to orient the reader within the overall domain and scope of the work. It begins by stating "Several projects with focus on exchanging metadata between learning repositories has the same problem: how would it be possible to bridge the gap between "traditional" repositories and triple stories, taking advantage of the features that Semantic Web has to offer". The paper then goes on to describe general notions around Linked Data and Semantic Web technologies. What projects do the authors refer to? What are "learning repositories"? What do they contain? Why are they useful? Why do they need to exchange metadata? What features do the authors refer to? Without answers to questions such as these it's impossible to orient the work.
In Section 2, the problem statement consists of 5 statements that read as the information integration problem in general: integrating heterogeneous information, exposing this information, reduplication. I don't believe the authors are aiming to solve this problem in general, thus, it would be better to have a much more focused problem statement.
Larger parts of the article, including the introduction and the problem definition, have been rewritten and should give a better account of the research context now.
Related Work
The section on the state of the art essentially describes current practice in developing Semantic Web applications. It does not look at what current information models are for learning repositories and why those are insufficient. It does not discuss metadata exchange standards, and it fails to look in any way at the related work in provenance. Indeed, of the 19 references only 2 are to non-generic references about the area that are not self citations. For a journal paper, I would expect much more.
The section on related work has been extended and now includes references to additional relevant related work.
I would suggest the authors look at the following survey as an entry point into the provenance literature:
Luc Moreau. The foundations for provenance on the web. Foundations and Trends in Web Science, 2(2-3):99-241, November 2010.
For more recent Semantic Web provenance literature consult the recent special issue on Provenance in the Semantic Web in the Journal of Web Semantics (Volume 9, Issue 2, Pages 83-244 (July 2011)).
A clarification of the scope, the context of provenance in ReM3 and additional references have been added to the state of the art and the description of the information model itself.
Some entry points for work on learning repositories and information models are:
Tiropanis, Thanassis and Davis, Hugh and Millard, David and Weal, Mark (2009) Semantic Technologies for Learning and Teaching in the Web 2.0 era - A survey. In: Proceedings of the WebSci'09: Society On-Line
Permanand Mohan, Christopher Brooks, "Learning Objects on the Semantic Web," Advanced Learning Technologies, IEEE International Conference on, p. 195, Third IEEE International Conference on Advanced Learning Technologies (ICALT'03), 2003
S. Ternier, and E. Duval, "Interoperability of Repositories: The Simple Query Interface in Ariadne," Int'l J. E-Learning, vol. 5, no. 1, 2006, pp. 161–166.
An implementation of a learning repository is one possible application of the described information model. The focus lies on managing resources together with their metadata, independently from the application domain. The scope of the publication has been made more clear in the introduction and the problem definition.
Identifying Contributions
The key contributions of the paper are not identified. It is hard to determine what the particular information model adds to the discussion around how to appropriately model systems. In general, I have the feeling that it looks just like the database layout of an implementation. Indeed, this feeling is buttressed by the discussion of caching metadata on pg. 4. Also, I think the authors mix information model and implementation when they discuss Named Graphs in section 4.2. The discussion of provenance metadata is rather lightweight, essentially, listing properties already found in Dublin Core.
Performance issues are a relevant problem in SW and LD applications, so the discussion of how metadata can be cached among systems is relevant and we think that this should be mentioned.
RDF is a good foundation for the described information model and so are Named Graphs; this is why they occur in both the model itself and the implementation.
Regarding provenance, the goal was not to develop a provenance model which solves all possible situations. Instead the goal was to provide some light-weight provenance for the named graphs that are used within the information model. This has been clarified in the relevant sections and other highly relevant work such as the PROV model has been referenced.
I also wonder how this paper differs from the following paper (not cited) about the Ariadne system discussed as a major implementation of the system.
Stefaan Ternier, Katrien Verbert, Gonzalo Parra, Bram Vandeputte, Joris Klerkx, Erik Duval, Vicente Ordonez, Xavier Ochoa, "The Ariadne Infrastructure for Managing and Storing Metadata," IEEE Internet Computing, pp. 18-25, July/August, 2009
The authors need to clearly identify the contributions of the paper above the state of the art.
The paper by Ternier et al. describes the ARIADNE infrastructure but does not describe an information model. This is a completely different level; however, a reference to that paper has been added to the section which describes the ARIADNE showcase.
Evaluation
The authors provide no evaluation of the information model or its systems. They discuss scalability in section 5.6 but only provide anecdotal experiences with no hard numbers or comparisons. They discuss various projects that used the system, but there is no way to judge whether their information model made a difference. That is, there is no feedback at a user experience level or at a developer performance level.
A whole new evaluation section was added, consisting of three different parts, namely (1) a scalability analysis, (2) the results of a survey/structured interview with experts to analyze the suitability for metadata annotation processes, and (3) the adoption in real-world projects.
Given these four areas of concern, the paper is currently not in a position to be considered as a journal paper.
Minor Notes
I thought the URL design in section 5.2 was interesting; maybe some lessons learned could be drawn from there.
The acronym SCAM has poor connotations in English. I would suggest finding a better one.
Because of similar concerns from the authors the name of the implementation has been changed from SCAM to EntryStore. The paper has been updated accordingly.
Response to second review by Sven Schade
Major comments
The abstract already begins in a reporting style and as a whole is way too detailed. Many of the statements would fit better into the introduction, such that that part of the text becomes self-contained. I suggest completely rewriting the abstract in order to present a brief but complete overview of the contribution, including conclusions and future work.
The abstract has been rewritten and has hopefully improved a lot.
As already indicated above, the introduction should be self-contained, i.e. especially contain the brief context, motivation and problem statement, before indicating the intended solution and the structure of the remaining article. Parts of the current abstract should be used to further elaborate on these aspects in section 1. Special care should be taken in respect to the use of English language, for example the second and third sentence in the new version show an eminent decline compared to the initial manuscript.
Section 1.1 should be moved to the background section in order to increase readability of the overall article. The style might be changed to a more narrative description, as the current text reads almost like a glossary.
Section 1.2 should become more prominent. It describes the real problems as requested above. Accordingly, this might be moved up and converted to bullet points.
The last paragraph and listing of the current section 1.2 reflect partially the previously requested indication of the overall approach. This should be better integrated into the overall new revised section 1.
The introduction has been rewritten, is now self-contained and contains details that previously were in the abstract. It should now flow better and provide a better entry point to the whole article.
Section 3 should begin with a few “bridging” sentences, which take the reader from the given background to the following introduction of ReM3.
The introduction to section 3 has been slightly changed to provide a better flow.
Minor comments
The term “publication” might in general be replaced with “article” or “paper”.
Done
Section 1, just after reference 7, “for example” should be deleted.
Done
Section 1 should complement the goals of Organic.Edunet with those of Ariadne.
A clarification was added to the introduction.
Section 1, text about article organization should be revised.
Smaller changes to the text have been made.
The first mentioning of “Semantic Web” should be equipped with a reference.
Done
The above also holds for Dublin Core.
Done
The formatting is “broken” in the paragraphs just below Figure 5.
The line breaks should now be at more appropriate places.
Section 6.3 again begins in a report kind of style. “The authors participated in” should be changed to something like “Within the scope of the Hack4europe!” we “”.
The section has been rephrased.
Section 9 should be called “Future Work and Next Steps”.
Done
Response to second review by Tudor Groza
The introduction has been re-written, however, it is now more confusing than previously. It starts directly by discussing the contribution of the paper without any sort of background, context or motivation. The fact that it has been split into corresponding subsections that deal with some of these aspects (e.g., the motivation and goals of the framework) is perfectly fine, however, these need to follow a coherent argumentation thread, which is, in this case, absent. The very first sentence of the introduction raises confusion: "The focus of this paper […] to solve these kind of problems." - what kind of problems? It's true that the abstract describes these problems, nevertheless, the introduction should be coherent on its own, without reading the abstract.
Both the abstract and the introduction have been rewritten and are better structured and easier to understand.
Other aspects that could be improved in the introduction are:
a justification of why this approach is better than other existing approaches;
Why the described approach is better in the mentioned cases is implicitly mentioned through the posed questions (the “problems to be solved”) and explicitly by the discussion in the conclusion section where the posed questions are revisited.
moving the terminology subsection at the end of the introduction; and
Done
moving the description of the structure of the paper after the terminology, at the very end of the section.
Done
The state of the art section still reads more like a background, and some parts could be left out, e.g., not sure if 2.4 and 2.5 are really needed. Also, access control and RDF is now mentioned, but just as a 'note' and without analysing to what extent the current approach complements this state of the art. Finally, access control is also discussed (later in the paper) in the context of SPARQL. Hence, an overview and discussion on this aspect should have been included in the related work, especially since it's a fairly hot topic.
The subsections 2.4 and 2.5 are of relevance for the section describing the reference implementation. A backwards reference has been added.
The description of the information model contains an explanation for why WAC was not implemented (yet). To improve clarity, a reference between the corresponding sections has been added.
The inclusion of the RDF-based formalisation is a great addition to the paper, however, the entire section 3 could be probably improved (w.r.t. readability) by adding some concrete examples, especially in 3.1.
Section 3.1 refers to the ReM3 specification in the source code repository, to which examples have now been added.
In 4.3: the authors have clearly adopted the easiest solution w.r.t. access control and SPARQL. However, this does raise the question of whether this solution is enough, and what happens if one does have the necessary credentials to access certain data, but cannot, because this scenario is not supported.
The authors have done a good job with the evaluation, especially w.r.t. the structured interviews and the discussion about the framework in the light of these interviews. The survey presentation (Sec. 7.2.1) could be improved by adding a bit of structure to it and including some concrete questions asked as part of the survey (if not the entire list - in an appendix).
The questions have been added to the article as an appendix.
The scalability evaluation, on the other hand, raises a series of questions. For example, in a real-world deployment (which has happened according to the paper) are those 20 concurrent connections enough? Testing the system using synthetic 'data' is great as it allows one to push the boundaries of the system's scalability. However, to complement this, it would also be interesting to analyse the real usage data. Finally, how is this scalability affected by the backend repository? The authors do mention that the evaluation has been performed only with one backend, however, it could have been interesting to already delimit the latency induced by the actual system in the context of the entire response time.
Regarding whether 20 concurrent connections are enough: an analysis of the Organic.Edunet log files showed that only a few requests per minute are sent per user. This means that 20 concurrent connections (sending requests continuously) to the backend correspond to several thousands of active users in the system. This will be more thoroughly analyzed in future projects where continuous monitoring of highly active users can be performed on a larger scale (it cannot be done now because the intense metadata annotation phase ended together with the project). Tests with other backends are planned, especially in clustered settings. As mentioned in the future work section, this will be the topic of additional research and will probably result in a new article.