Making the Web a Data Washing Machine - Creating Knowledge out of Interlinked Data
Review 1 by Claudia d'Amato:
The paper analyzes a set of research challenges to make the initial success of the Linked Data paradigm a world-scale reality and suggests several research approaches to cope with these challenges.
Some detailed comments follow:
* end of sect. 1: an example could be added
* beginning of sect. 2: the idea of integrating schema mapping and data interlinking algorithms
should be elaborated; in particular, the role of schema mapping should be specified
* is it possible to sketch some proposals for solving the three challenges listed at the end of sect. 2?
* sect. 3: machine learning methods are usually grounded in the closed-world assumption, unlike
the Semantic Web setting, where the open-world assumption is adopted. Some comments on the
adaptations that this difference would require would be of interest
* beginning of sect. 4: "For interlinking and fusing as well as for the classification,
structure..." -> please make explicit what is being classified
* sect. 5, 3rd point in the list: what is the goal of applying machine learning techniques? What
has to be refined?
* sect. 6: "The main issues of integration are the use of different identifiers for the same thing
and diversity in units of measure" -> is it possible to sketch some solutions to this problem?
* end of 1st column p. 4: please explain why centralized and top-down approaches are not adequate
for European governments and public administrations
* beginning of 2nd paragraph sect. 2: "The value of a knowledge base" -> "The usefulness of a
* end of 2nd column p. 2: "On the Data Web users are not" -> "On the Data Web, users are not"
* beginning of sect. 6: "Enterprise information integration...need for integration" -> this sentence
should be rephrased
* middle of sect. 6: "Classification, application of...disruption to infrastructure" -> this
sentence should be rephrased
* end of 2nd column: "references from the the Data Web" -> "references from the Data Web"
* middle of sect. 7, 1st column: "..be it supra-national..." -> "be" does not seem to be correct
* end of 1st column p. 4: "of Europe this will be a very challenging due to" -> "of Europe this
will be very challenging due to"
Review 2 by Rinke Hoekstra:
The author describes a vision of the data web where multiple different challenges interact to improve the quality and interlinkage of the data available. This is a nice analysis of the problems facing current linked data and semantic web research, and the idea of the web as a 'Washing Machine' for linked data is very appealing. Perhaps that should be in the title?
One thing that is mentioned in the paper is provenance. This makes me wonder whether the author should include the more technical challenge of representing provenance-related information in a transparent fashion on the linked data web. I can imagine there are several more core technical challenges to overcome before this is a reality.
The second half of the paper tries to explain how this paradigm can be implemented in several use cases: service-oriented architectures (deployment of linked data on intranets) and government data. This part of the paper is slightly less well developed, and could be more to the point about how the washing machine metaphor can improve the application of the linked data approach in these areas.
There are some typos and minor issues:
* p.3. 'needs continue to grow, mergers' -> 'needs to continue to grow. Mergers'
* p.3. The statement about doubling warehouse sizes could use a reference
* p.3. 'entail substantial' -> 'entails substantial'