Linked Open Corpus Models, Leveraging the Semantic Web for Adaptive Hypermedia
Solicited review by Daniel Krause:
The paper has mainly two contributions: first, a generic personalization framework that utilizes Linked Open Data, and second, a performance evaluation of real-world LOD repositories. My main criticism is that the paper does not clearly outline the advantage of the proposed framework over other relevant, but not cited, related work. It is therefore hard to evaluate the contribution of this paper.
Detailed issues:
- The related work section should be rewritten: I disagree with the authors on the point that RDF browsers are a current research issue. Well-known browsers like Longwell, Piggybank and Brown Sauce have been around for some years now. Furthermore, I do not consider RDF browsers as related work, as they focus on visualizing existing data but do not offer personalization.
- The related work should also consider a broader range of generic personalization frameworks, e.g. Fabian Abel, Ig Ibert Bittencourt, Evandro Costa, Nicola Henze, Daniel Krause, Julita Vassileva: Recommendations in Online Discussion Forums for E-Learning Systems.
- I miss a comparison to GALE (GALE: A Highly Extensible Adaptive Hypermedia Engine, by Paul De Bra, ACM Hypertext, Eindhoven), which is the outcome of the EU project GRAPPLE and also contains a rule-based adaptation engine. Please also discuss what benefit the proposed solution offers compared to GALE plus LOD integration.
- You mention storing user profile information as an XML file. For me, it would be more straightforward to store such information as RDF triples (see the sketch after this list).
- The demonstrator is not described in enough detail. I miss examples of the mentioned general AS rules and a link to the demo.
- Regarding the performance evaluation, I would expect a much larger study, using different times for the measurement and different client network connections to distinguish between network-related delays and processing delays. It is also important to discuss cache hit and miss ratios in real-world trials. While your caching reduces access time significantly, I would assume that the sheer amount of LOD data normally complicates efficient caching in real-world settings. In Figure 2, the cached request times are still 2 seconds in 4 of 5 cases. This is still a lot - do you have any explanation for that? Is a delay of 2 seconds still acceptable for a web application? And more importantly: can users work with the application when the cache is not yet filled - or: is it feasible for a user to wait 10 seconds per result page?
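To make the RDF suggestion above concrete, here is a minimal sketch of what a triple-based user profile could look like with Apache Jena. The namespace, property names and interest values are invented for illustration; the paper's actual profile fields are not known from the text.

    import org.apache.jena.rdf.model.Model;
    import org.apache.jena.rdf.model.ModelFactory;
    import org.apache.jena.rdf.model.Resource;

    public class UserProfileAsRdf {
        // Hypothetical namespace; the paper does not define one.
        static final String EX = "http://example.org/ahprofile#";

        public static void main(String[] args) {
            Model model = ModelFactory.createDefaultModel();
            model.setNsPrefix("ex", EX);

            // One resource per user; interests become plain triples instead of XML elements.
            Resource user = model.createResource(EX + "user42");
            user.addProperty(model.createProperty(EX, "hasInterest"),
                    model.createResource("http://data.linkedmdb.org/resource/film_genre/4"));
            user.addLiteral(model.createProperty(EX, "interestWeight"), 0.8);

            // The profile can then be queried with SPARQL like the rest of the framework's data.
            model.write(System.out, "TURTLE");
        }
    }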
Minor comments:
- It is not common to mention the academic titles of the authors.
- p2: labelled -> labeled
- p4: domian -> domain
- p5: linkedMD -> LinkedMDB
- p6: reformat the SPARQL query. Every line should contain only one triple and should end with a dot
- p6: leveragted -> leveraged
- p6: fig.5 -> Fig. 5
- p8: This was implemented IN order...
Solicited review by anonymous reviewer:
The paper describes a framework for adaptive hypermedia on the (open) Web model. The framework leverages Linked Open Data (LOD) to provide the input data and its associated metadata. It employs SPARQL queries to get the data and its schema, on which the adaptive hypermedia behaviour is implemented. The adaptive behaviour is realized by means of rules (that follow a certain strategy) and a user model that stores the user's dynamic interests. In the second part of the paper, the authors discuss performance challenges when employing LOD: the need for caching to speed up the execution of queries when response times differ across LOD sources.
The idea of opening adaptive hypermedia systems to previously unseen content is not new. What could have been new is a detailed description of your Web-based framework (the specifications of your components and languages), but unfortunately this is not achieved. It remains unclear how the adaptivity rules are designed if one does not know the metadata (as this is discovered at run-time); examples would help the reader understand the proposed rule language. The same can be said about the user model: how do you know what is relevant for the user if you do not have design-time access to the data schema? Again, user model examples would help the reader grasp the proposed methodology. What information represents the adaptation strategy, and how is this used in the adaptive engine? A possible remedy to the above problems is to use the personalized movie case as a running example when you present the framework.
Making use of LOD to bootstrap adaptive hypermedia can be an interesting contribution, but without presenting the adaptivity and user models, the paper remains at too abstract a level to convince the reader of the general applicability of the proposed solution. The second part of the paper, instead of detailing the framework internals, jumps to a new, previously announced topic: the performance challenges that one needs to overcome while using LOD data. While this is definitely useful, it should be considered only after the framework details have been carefully presented and the specific components to which these performance bottlenecks apply have been identified.
It remains unclear why SPARQL query optimizations are not discussed. These also have the potential to reduce the execution time of data-intensive queries. Regarding existing approaches related to open adaptive hypermedia, the authors should discuss Web design methodologies such as SHDM, Hera, OntoWeaver, etc., which also make use of external semantically annotated data to provide hypermedia adaptivity on the Web.
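To illustrate the kind of SPARQL-level optimization meant here, below is a hedged sketch using Jena's remote query API: select only the variables that are needed, keep the most selective triple pattern first, bound the result size with LIMIT, and set a timeout so a slow endpoint does not block the adaptive engine. The endpoint, property URIs and LIMIT value are illustrative assumptions, not taken from the paper.

    import org.apache.jena.query.QueryExecution;
    import org.apache.jena.query.QueryExecutionFactory;
    import org.apache.jena.query.QuerySolution;
    import org.apache.jena.query.ResultSet;

    public class OptimizedLodQuery {
        public static void main(String[] args) {
            // Hypothetical query against LinkedMDB; the paper's actual queries are not shown here.
            // Selecting only needed variables, putting the most selective pattern first, and
            // bounding the result size all reduce work on the remote endpoint.
            String query =
                "PREFIX movie: <http://data.linkedmdb.org/resource/movie/> " +
                "SELECT ?film ?title WHERE { " +
                "  ?film movie:genre <http://data.linkedmdb.org/resource/film_genre/4> . " +
                "  ?film <http://purl.org/dc/terms/title> ?title . " +
                "} LIMIT 50";

            try (QueryExecution qexec = QueryExecutionFactory.sparqlService(
                    "http://data.linkedmdb.org/sparql", query)) {
                qexec.setTimeout(5000); // fail fast instead of blocking the adaptive engine
                ResultSet results = qexec.execSelect();
                while (results.hasNext()) {
                    QuerySolution row = results.next();
                    System.out.println(row.get("film") + " " + row.get("title"));
                }
            }
        }
    }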
Other comments:
-throughout the paper "Semantic Web" instead of "semantic web"
-page 1: delete "Dr." and "Prof." as prefix for authors names
-page 1: "an Adaptive Hypermedia" instead of "a Adaptive Hypermedia"
-page 1: "identify" instead of "identifying"
-page 1: "an architectural description" instead of "a architectural description"
-page 2: "LOD resources" instead of "LOD Resources"
-page 2: delete ", below"
-page 2: "AH systems" instead of "AH Systems"
-page 2: "etc. [13]" instead of "etc [13]"
-page 3: sentence "Longwell [43], and RDF power highly-configurable browser" misses predicate
-page 3: "manipulating" instead of "dealing with"
-page 3: please note that the Semantic Web can be used for personalization, but not only, other applications are for example integration and knowledge discovery
-page 3: "," after such a degree"
-page 4: "." after "[26]"
-page 4: why is the user mode in XML and not RDF as your framework already uses many Semantic Web technologies?
-page 4: "domain" instead of "domian"
-page 5: "its very nature" instead of "it's very nature"
-pages 5 and 6: Figures 2 and 4 are hardly readable (please use the width of the two columns to show these pictures)
-page 5: sentence "As all of the computation …" misses predicate
-page 5: "its source" instead of "it's source"
-page 6: align one triple per line in Fig. 3 to increase readability
-page 6: "two evaluations" instead of "two evaulations"
-page 6: I think it should be "quantitative level" instead of "qualitative level"
-page 7: "." after "JSON"
-page 7: seems that you have a contradiction between "ten times" and "three times"
-page 7: not clear what you mean with "modeling differences can make equivalence only approximate"
-page 7: "difference between" instead of "different between"
-page 8: sentence "A caching mechanism for the results of SPARQL queries" misses predicate
-page 8: "in order to improve" instead of "order to improve"
-page 8: not clear what you mean with "However caching of external additional data helps build a tolerance of external faults and thus improves the robustness of the system"; what are these external faults?
-page 9: not clear what you mean with "It is important for future architectures for Open Model AH, and LOD in general, to be resistant to variable responses and which are able to avoid excessive load on LOD repositories", what are these responses?
-page 9: pages and publisher missing for reference [2] (if it is only published online available please give url)
-page 9: what are the pages and the book title for reference [9]?
-page 9: pages and publisher missing for reference [11]?
-page 10: why is "THE SEMANTIC WEB" in capitals for reference [24]?
-page 10: pages and publisher missing for reference [31]
-page 10: delete "[Publication]" for reference [36]
-references are poorly formatted and missing information
Solicited review by Julien Subercaze:
Motivated by the fact that importing external resources into Adaptive Hypermedia Systems is highly costly in terms of metadata generation, the authors propose to use Linked Open Data in an automatic way, to reduce the human intervention in the process.
This approach is natural, and the authors identify key issues. However, the technical quality of the paper is too low to recommend it for acceptance. The paper mixes different issues without addressing any of them in a satisfying manner. I would recommend that the authors focus on one precise issue and then detail their approach.
----
The architecture is very straightforward, and I could not clearly see where the contribution lies in this part. The most interesting parts, the adaptation strategy and the user model, are not described in this paper. These parts are clearly the added value of the approach compared to the tons of applications that already do mashups with Linked Open Data, web services and so on.
Is the user model generic? The same question applies to the adaptation strategy.
The given example in section 3.2 seems to indicate the contrary; however, from the information given, one cannot draw a firm conclusion.
Some technical choices are given as is, but not well justified. Why use the JBoss rules system and not a Semantic Web one such as Jena, for instance (see the sketch below)?
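For clarity, by a Semantic Web rule system I mean, for example, Jena's generic rule reasoner. A minimal sketch follows; the rule, namespace and property URIs are invented for illustration and are not taken from the paper.

    import org.apache.jena.rdf.model.InfModel;
    import org.apache.jena.rdf.model.Model;
    import org.apache.jena.rdf.model.ModelFactory;
    import org.apache.jena.reasoner.rulesys.GenericRuleReasoner;
    import org.apache.jena.reasoner.rulesys.Rule;

    public class JenaRuleSketch {
        public static void main(String[] args) {
            // Hypothetical adaptation rule in Jena's rule syntax: if a user is interested in
            // a genre, recommend every film of that genre. The property URIs are invented.
            String rules =
                "[recommendByGenre: " +
                " (?user <http://example.org/ahprofile#hasInterest> ?genre) " +
                " (?film <http://data.linkedmdb.org/resource/movie/genre> ?genre) " +
                " -> (?user <http://example.org/ahprofile#recommended> ?film)]";

            Model data = ModelFactory.createDefaultModel(); // would hold the profile plus LOD data
            GenericRuleReasoner reasoner = new GenericRuleReasoner(Rule.parseRules(rules));
            InfModel inf = ModelFactory.createInfModel(reasoner, data);
            inf.write(System.out, "TURTLE"); // inferred "recommended" triples appear here
        }
    }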
A user evaluation of the presented implementation (the personalized movie browser) is required, even with a few users, to validate it.
------
The performance issue is a problem in itself and could be enough for a paper of its own. This part should not represent half of the contribution without being indicated in either the title or the abstract. I was very surprised while reading the paper.
Concerning the performance issue, most of the LOD datasets are available for download:
- LinkedMDB : http://queens.db.toronto.edu/~oktie/linkedmdb/
- Freebase : http://download.freebase.com/datadumps/latest/
- DBPedia : http://wiki.dbpedia.org/Downloads37
So why not consider the approach of having local copies that are frequently updated? It would be interesting to measure the import time and compare the gain against remote queries and remote cached queries (a sketch of such a local mirror follows below).
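A hedged sketch of such a local mirror using Jena TDB; the store directory and dump filename are assumptions, and the load job would simply be re-run whenever a fresh dump is published.

    import org.apache.jena.query.Dataset;
    import org.apache.jena.rdf.model.Model;
    import org.apache.jena.riot.RDFDataMgr;
    import org.apache.jena.tdb.TDBFactory;

    public class LocalLodMirror {
        public static void main(String[] args) {
            // Hypothetical local mirror: load a downloaded LinkedMDB dump into a persistent TDB
            // store so the framework's SPARQL queries run locally instead of over the network.
            Dataset dataset = TDBFactory.createDataset("/data/tdb/linkedmdb");
            Model model = dataset.getDefaultModel();
            RDFDataMgr.read(model, "linkedmdb-latest.nt"); // dump filename is an assumption
            System.out.println("Triples loaded: " + model.size());
            dataset.close();
        }
    }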
---
English quality: very pleasant to read. Just a typo on page 4, right column: "domian" instead of "domain".
--
As a conclusion, I would advise the authors to leave aside the speed-of-the-Web study and to focus on detailing their model and architecture. The idea seems promising, and I was frustrated by the short description of the central contribution of the paper.