Review Comment:
The paper is much easier to read. However, I'm not completely sure this isn't because I've now read it several times. There are still sections that are a bit difficult to interpret. The inter-relations sections generally state what is easily inferred or highlight fairly thin connections. S4.5 especially is a bit difficult to follow, and I'm not too sure it really contributes much to the discussion. Most of S5.2, and especially the definitions of quantitative vs qualitative, are probably not required; even readers who do not know these terms can easily interpret them from the context.
The point of the examples is not always easy to decipher, and they sometimes actually raise questions rather than answer them. E.g., the flight code in 4.2.2 & 4.2.3 left me wondering whether the error could be correctly raised - maybe this is a case where leaving it to an inter-relations section would work. The authors could then illustrate more clearly how, even if one criterion did not recognise a quality issue, another used in conjunction with it would.
In 4.1.4 the example doesn't make sense - a spoof competitor with HIGHER prices won't entice people away from the real one. In 4.4.2, the date example is a bit far-fetched: differences in formatting for dates are a known issue, the example given is actually one of the simpler ones encountered, and any basic application should be able to handle it and should probably anticipate such cases. Also, what exactly is meant by using LD principles to provide data (w.r.t. times/dates)? Which principles, and to correct what - using XSD or the Time Ontology? Does either XSD or the ontology violate LD principles?
Importantly, my key reservation still remains - a survey should cover a broad range of existing work. I'm not completely convinced the infancy of LD is the only reason for the small number of articles found. It may be that:
- 1. - there isn't enough distinction between LD and other structured data, especially as stored in databases, from this point of view, to justify new research specifically for LD,
which leads to:
- 2. - there simply isn't enough yet to warrant a survey of the field. Note that I still believe the paper is timely, but it may be that at this point what is possible is a review of nascent work, or a proposal to guide research in this area, rather than a complete survey.
I'm actually quite surprised that the authors didn't look for new papers in the interim for the revised version, i.e., 2012+. Granted, it is more work and moves the goalposts, but it would only be positive and would widen the scope, even if only by a few more papers. It would also make the paper more current, which is important considering we all agree the field is nascent. Further, a good survey may come to be considered seminal work; it needs sufficient breadth to guide new research and further work.
Related to the point above, while more of the 21 articles are referenced in the quality descriptions/definitions, they still mostly start with "Bizer [6] adopted the definition of [QualityCriterionX] from [AuthorsY] …". My original point has not been addressed, so I'll try to illustrate more fully why this is an issue. Note, as in my initial review, I am in no way discounting how much work goes into a PhD thesis. However, it is, if I must be pedantic, not peer-reviewed in the normal sense; it is examined. Also, theses are not normally classified as regular publications unless they have been independently reviewed after the fact and published. So the point that a survey should not rely predominantly on a thesis still holds. The way in which it is mostly cited actually strengthens my point: if it can only be cited indirectly, and needs to be substantiated by whoever its author was in turn citing, then the authors of this paper are effectively saying that it cannot stand on its own. Also, large sections of the paper start to read a bit like a review of the thesis.
Further, citing in this way isn't simply unusual; it introduces unnecessary complexity. If I were to cite this paper, would I then write "[AuthorsZ] adopt the definition of [AuthorsA], who adopt the definition of …" ad infinitum? Of course, a review of someone else's book on, say, Newton's law of gravity WOULD talk about how [AuthorX] analyses Newton's work. But that is not the case here.
Finally, the fact that the authors actually cite Pipino et al. - one of the examples of indirect references - on its own (S4.3.1) indicates that it could have been used without being prefaced with the thesis in all the other sections.
Strangely, the Master's thesis is actually cited in and of itself!
The availability of (detailed) documentation is not a good measure of usability: if a tool is usable, you should not need the documentation for anything but rarely used functions or especially complex tasks. Ease of finding help (within the documentation) may be a better measure (though still not a particularly good one). With regard to the response to this point: at least state that this is what you did. Simply reading documentation written by someone else is not enough. Even an expert review - where the expert is an HCI or usability specialist with the specific training and experience to carry out a heuristic evaluation using the documentation and a tool's UI - still has to follow a set of heuristics, and such results are still always presented with that caveat.
(Still) A large number of basic grammatical errors that an automatic check and a proof-read would pick up. A handful of contradictory statements, e.g., S5.4 on licensing says all tools have a license, then ends by listing those that don't.
************** additional questions raised by authors' response
R: "We would like to point the reviewer to the comprehensive survey done by Batini et. al.(ref [2]), which already focuses on data quality measures for other structured data types. Since there is no similar survey specifically for LOD, we undertook this study. ..."
*** So clearly state this in the introduction, with the reference - because while the paper IS timely, the range/coverage of the supporting literature leaves a lot to be desired. (But see also above.)
A: ...This information has been added to the Introduction (2nd last paragraph). In fact this point also answers the reviewers issue about our inclusion of very few articles.
*** Actually, no, it doesn't. Stating this in the introduction DOES help, but it doesn't change the fact that the number of papers referenced is quite low. The bigger problem really is the coverage/range: even out of the 21, the focus is still on just two, and those two are not regular publications.
In fact, the response - that Batini et al. already have a detailed survey for structured data, and the explanation of why most of the review cites predominantly the PhD and Master's theses - actually highlights the issue: the coverage is too low.
"It would be useful to indicate in Table 8 which of the three groups described in S5.1 each dimension belongs to - mapping the list of numbers in the text to the columns is unnecessarily tedious. "
*** The response doesn't address the question: in the text there is a list of numbers, while in the table (and in fact the same is done in Tables 7 & 9) author names are used. Simply placing the reference number after the author name in the tables would resolve the issue. As it is, the reader has to go back and forth between the text, the table and the references to match the numbers to authors and to the information in the tables.
"What criteria were used to select the initial set of articles...
R: "The inclusion and exclusion criteria specified ... are detailed in Section 2. A reference has been added to the Introduction.
There is no reference where this is first introduced."
*** The point is that YOUR response to the question (in the initial review) stated you had included a reference in the intro. I could not find the reference where this is first introduced.
A: We agree that some publications provide a list of standardized keywords. However, if the keyword is in a list of standardized keywords, there are very high chances that that keyword is also present in the abstract and would thus show up even with a keyword-based search. Besides, the ACM Digital Library is just ONE example which provides such lists. We, however, used five other search engines/digital libraries and four journals where there is no such list provided. Thus, in order to standardize our search criteria over all the search engines and journals, we use the same search strategy.
Additionally, we are sure that we have included *all* the relevant articles for our survey.
*** ACM was one example out of the lot. Of your own list, the majority, if not all, require authors to include keywords, and some of these also have a predefined list of standard keywords. The argument about keywords appearing in the abstract only holds sometimes. Lowering quality - which is what this paper aims to countermand - is probably not the best solution in this case. While it is probably too late to do it here, it may have been preferable to use two separate searches if required, narrowing where necessary for the web search. This was, according to the paper, a supplementary search, so narrowing at that point would not have been a massive issue. Narrowing right from the start is.
The claim that you are sure to have found ALL papers is a bit grand. With the number of journals, conferences and especially workshops out there today, even with more general criteria such a claim would not be absolute; with the search criteria used, it is even more unlikely. It is not unusual for the authors of existing papers to contact those of a new one to identify similarities between their work - especially for surveys.
A: As mentioned in the previous answer, since we wanted to include all the tools that the core articles in our survey propose, we include the ones that are not available too. About determining whether it is customizable, it was mainly done by the documentation of the tools.
*** Then state this.
*** Re Maintenance/last update - then state clearly how the reader is expected to interpret this. It is ambiguous at best.
*** I don't understand the point of LinkQA as an example here: if you cannot even choose the datasets, what is it evaluating? How will it help me follow the quality criteria/guidelines proposed?
*** The authors highlight wherever only one of their references defines a specific metric; this starts to get repetitive. It is obvious where done, and should be stated at most once. Also, e.g., in 4.1.3, rather than repeating the same reference 5 times across the sub-points, it could simply be placed at the top of the list.
*** Fig. 1 - for consistency and readability, I suggest moving the final two numbers outside the boxes, as is done for all the others. It took me a while to locate them, even knowing they were supposed to be there.