Can we ever catch up with the Web?

Paper Title: 
Can we ever catch up with the Web?
Authors: 
Axel Polleres, Aidan Hogan, Andreas Harth, Stefan Decker
Abstract: 
The Semantic Web is about to grow up. By efforts such as the Linked Open Data initiative, we finally find ourselves at the edge of a Web of Data becoming reality. Standards such as OWL 2, RIF and SPARQL 1.1 shall allow us to reason with and ask complex structured queries on this data, but still they do not play together smoothly and robustly enough to cope with huge amounts of noisy Web data. In this paper, we discuss open challenges relating to querying and reasoning with Web data and raise the question: can the burgeoning Web of Data ever catch up with the now ubiquitous HTML Web?
Submission type: 
Other
Responsible editor: 
Krzysztof Janowicz
Decision/Status: 
Accept
Reviews: 

Review 1 by Martin Raubal:
A very good and informative statement of current challenges regarding linked data. You should include some reasoning after stating "in large parts this ideal is impossible to achieve". Why exactly?

I like the 3 posed challenges. The first one should be titled "1. Too Little Linked Data". Regarding the first example: you may want to mention that this query also involves privacy issues with respect to the available data of 'my friends'. 3rd para: "a perfect fit with"; ad "re-use of vocabulary terms" and following: this could be a nice cross-reference to the paper "Preventing Interoperability Problems Instead of Solving Them." There is a small formatting problem in para 6 (Although).

ad 2. Linked Data Quality: You describe the problems well but I am missing any solutions. You should add some description / suggestions of possible solutions at the end of this section.

Overall a good paper that states important issues.

Review 2 by Andreas Hotho:
The paper nicely discusses the upcoming issues around the Web of Data. It starts with a short introduction of the ideal version of linked open data and then focuses on three topics: too little data, data quality, and too much data. Along these lines, current shortcomings of Semantic Web technology for large-scale, Web-like applications are identified and discussed with illustrative examples. Additionally, open research directions are shown.

Overall the paper is well written and easy to read. It nicely discusses the main issues of this topic and connects them to other research areas. I have only some minor suggestions.

In the introduction, the idealized world for linked open data is mentioned. I'm not sure this idealized world makes sense, as a very important part is missing: the uncertainty or probability of some information, links, etc., which partially leads to the problems discussed in the second part. As people are working on this, I think this topic should find its way into your discussion.

In the same direction goes the next comment for sec. 2 where the issue of increasing inconsistency is mentioned. I would like to see ideas for a solution of this issue in the paper as I think this is one of the major problems for scaling up the linked data idea. Linking more and more data together will automatically lead to an increased amount of inconsistency. There must be a solution or a direction for future work not only for reasoning with it but also for any other operation on this kind of data.

I miss the topic of user-generated data and its integration into the cloud. While I believe this is easily possible, the problem of subjective views on all kinds of data is not discussed. As more users provide data, they expect that any system will deal with it appropriately.

The issue of data quality could partially be solved if incentives are offered to users. As Web 2.0 has shown, users are willing to contribute, but only if they get something back immediately. Any comments here?

In section 3, data warehouses and information retrieval techniques are mixed a bit. I think most of the large-scale RDF stores use some kind of IR index, and if some kind of reasoning is involved, it is somehow connected to database techniques. Could you please make this clear in your paper? In the same context, additional challenges are mentioned. Could you please add some explanation of why the mentioned entity consolidation, reasoning and querying are specific challenges in this context?

Comments

Stimulating and controversial, like a position paper should be.

There's one thing which could perhaps be made more explicit (I think it's there, but only between the lines): What is the role of (formal) semantics from the authors' perspective?

In the introduction, it is stated that "The knowledge of such an idealised world could then be represented in one huge RDF graph. Emerging standards such as OWL 2, RIF, and SPARQL 1.1. subsequently allow us to reason with and ask complex structured queries on this data..." This statement, as phrased, reminds me of (failed) attempts, a quarter of a century ago, to put all world knowledge into one database. The Web of Data should certainly not be interpreted this way (and I guess the authors agree - and may want to reword this passage). But then, if it's not to be understood as one huge knowledge base - then what is the role of the semantics which languages like RDF (and OWL) are endowed with? There are several possibilities how to rectify this, and it would be interesting to know the authors' perspective on this. It may be related to the authors' work in [12,22] - some more info would be great. (What about owl:sameAs? Should it be used? Should it be reinterpreted, semantically?)

Two side remarks:

Regarding the first paragraph of section 1, you may want to take note of

Prateek Jain, Pascal Hitzler, Peter Z. Yeh, Kunal Verma, Amit P. Sheth, Linked Data is Merely More Data. In: Dan Brickley, Vinay K. Chaudhri, Harry Halpin, Deborah McGuinness: Linked Data Meets Artificial Intelligence. Technical Report SS-10-07, AAAI Press, Menlo Park, California, 2010, pp. 82-86. ISBN 978-1-57735-461-1. Proceedings of LinkedAI at the AAAI Spring Symposium, March 2010.
http://knoesis.wright.edu/faculty/pascal/resources/publications/jain-hit...

Concerning your remark at the end of section 2 on inconsistency handling - for an approach to paraconsistent OWL reasoning which has a clear model-theoretic semantics, see

Yue Ma, Pascal Hitzler, Paraconsistent reasoning for OWL 2. In: Axel Polleres, Terrance Swift (Eds.), Web Reasoning and Rule Systems, Third International Conference, RR 2009, Chantilly, VA, USA, October 2009, Proceedings. Lecture Notes in Computer Science Vol. 5837, Springer, pp. 197-211.
http://knoesis.wright.edu/faculty/pascal/resources/publications/paracons...