Review Comment:
I appreciate that most of my comments have been addressed.
Just a few minor issues:
(1) the research question on p. 2 is too broad, as it refers to "quality of existing data on the Web" - this should be narrowed down to "Linked Data"
(2) on p. 15, the authors state that they consider a term from another vocabulary "if a property or a class refers to an existing term in another vocabulary" - this might be a bit picky, but as long as you do not attempt to dereference the term, you should omit the word "existing" ;-)
(3) I am still not fully convinced by CS9. Many datasets mix terms from different vocabularies. For example, if someone assigns a foaf:Person as a dc:creator of a swrc:Publication, this will be a violation of the metric as defined by the authors. Hence, I do not think that the approach of simply checking for supertypes is a suitable proxy for detecting incorrect domain and range types, and will likely underestimate the quality of a dataset.
(4) In section 6, the authors should also show correlations.
While (1)+(2) are fairly easy to fix, (3) might be a little tricky. One option would be to load the statement, the subject's and object's types, and the vocabularies of the subject, object, and property, plus the transitive set of imports, and then check for consistency (using a very minimalistic A-box instead of the entire A-box, as we did, e.g., in [1]). Since the metric is computed based on a sample only, this should be feasible. Even in a very pessimistic scenario where checking a single statement takes one minute (it should actually be less than that for most vocabularies, which are fairly small), this would not take more than a week.
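To illustrate the suggestion for (3), here is a minimal sketch of a per-statement check on a minimal A-box. It is not the method from [1] and not a full reasoner: it only adds the types implied by the property's domain and range, closes them under subclass axioms, and looks for declared disjointness. All names (`check_statement`, the `a:`, `b:`, `c:`, `ex:` terms, and the axioms) are hypothetical and chosen for illustration only.

```python
from itertools import product


def superclasses(cls, sub_class_of):
    """Return cls together with all of its transitive superclasses."""
    seen, stack = set(), [cls]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(sub_class_of.get(current, ()))
    return seen


def is_inconsistent(types, sub_class_of, disjoint):
    """True if any two types (after subclass closure) are declared disjoint."""
    closed = set()
    for t in types:
        closed |= superclasses(t, sub_class_of)
    return any(frozenset((a, b)) in disjoint for a, b in product(closed, repeat=2))


def check_statement(stmt, types, vocab):
    """Check one statement against a minimal A-box: the statement itself,
    the asserted types of its subject and object, and the types implied by
    the property's domain and range. Returns True if no clash is found."""
    s, p, o = stmt
    subject_types = set(types.get(s, ())) | set(vocab["domain"].get(p, ()))
    object_types = set(types.get(o, ())) | set(vocab["range"].get(p, ()))
    return not (
        is_inconsistent(subject_types, vocab["sub_class_of"], vocab["disjoint"])
        or is_inconsistent(object_types, vocab["sub_class_of"], vocab["disjoint"])
    )


# Hypothetical axioms merged from three toy vocabularies (a:, b:, c:).
vocab = {
    "sub_class_of": {},
    "domain": {},
    "range": {"b:creator": ["b:Agent"]},
    "disjoint": {frozenset(("b:Agent", "c:Publication"))},
}

# Hypothetical instance types.
types = {
    "ex:paper1": ["c:Publication"],
    "ex:paper2": ["c:Publication"],
    "ex:alice": ["a:Person"],
}

# a:Person is not a declared subtype of b:Agent, so a supertype check would
# flag this statement; the consistency check accepts it, since nothing
# states that a:Person and b:Agent are disjoint.
mixed_ok = check_statement(("ex:paper1", "b:creator", "ex:alice"), types, vocab)

# A publication used as a creator clashes with the declared disjointness.
clash_ok = check_statement(("ex:paper1", "b:creator", "ex:paper2"), types, vocab)
```

In this sketch, the mixed-vocabulary statement passes while the genuinely contradictory one fails, which is the distinction the supertype check cannot make.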
As the authors mention correlations in section 6, it would be good to also see a correlation matrix for the metrics, e.g., in the form of a heatmap visualization. This would answer the question of whether metrics are correlated or not in the most straightforward way.
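As a rough sketch of what I have in mind for (4), the matrix could be computed from the per-dataset metric values as follows. Pearson's coefficient is used here only as one possible choice (a rank correlation may be more appropriate), and the metric names other than CS9 as well as all values are made up for illustration.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equally long lists; NaN if one is constant."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return float("nan")
    return cov / sqrt(var_x * var_y)


def correlation_matrix(scores):
    """Return the sorted metric names and the pairwise correlation matrix."""
    names = sorted(scores)
    matrix = [[pearson(scores[a], scores[b]) for b in names] for a in names]
    return names, matrix


# Hypothetical metric values, one entry per dataset in the sample.
scores = {
    "CS9": [0.2, 0.4, 0.6, 0.8],
    "M_A": [0.1, 0.2, 0.3, 0.4],  # rises with CS9
    "M_B": [0.9, 0.7, 0.5, 0.3],  # falls as CS9 rises
}

names, matrix = correlation_matrix(scores)
for name, row in zip(names, matrix):
    print(name, " ".join(f"{value:+.2f}" for value in row))
```

The resulting matrix can then be handed to any plotting library to render the heatmap.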
[1] Paulheim and Stuckenschmidt (2016): Fast Approximate A-Box Consistency Checking Using Machine Learning