Review Comment:
The paper describes a set of quality criteria for assessing the quality of RDF literals and presents a toolchain for automatic analysis of such literals at web scale. The quality criteria take into account both the syntax and the semantics of RDF literals and define a set of measures for grouping them into a set of predefined categories. The toolchain is mainly based on LOD Laundromat, with some integration with Luzzu. Further, several improvements are proposed based on the assessment.
The topic of the paper is relevant to the theme of the special issue, and the paper is well written with an easy-to-read structure. Though there are several metrics defined in the literature for evaluating the quality of RDF literals, the authors' claim that there are no studies focused entirely on RDF literal quality at web scale is reasonable, making the contributions novel.
Detailed Comments:
Section 2:
There are several metrics in the existing literature that are related to the quality of RDF literals, for example, the ones mentioned under syntactic validity, consistency, interoperability, etc. in [1]. An analysis of those metrics would enrich the related work section.
Section 4:
As the authors formally represent the definition of literals from Section 3.3 (Literals) of the RDF 1.1 specification, I would suggest adding a reference to the relevant section of that specification. It might help the reader understand why rdf:langString has to be considered a special IRI, among other details.
Similarly, Section 4.2 on the semantics of literals seems to be based on Section 2.3 (The Lexical Space and Lexical Mapping) of the XSD 1.1 Part 2: Datatypes specification. It might be helpful for the reader if that specification were referenced.
Figure 1 - I assume that each level should sum to 100% (e.g., Supported 60%, Unsupported 40%) rather than showing 10% in all nodes, shouldn't it?
I wonder whether the introduction of the unimplemented category mixes different things into the quality categories. In the unimplemented case, are we assessing the quality of the literal or actually some aspect of the RDF processor?
“Canonical literals are of higher quality than non-canonical ones because they allow identity to be assessed more efficiently”. It seems that canonical mappings are not easy to derive and not very useful in some situations. It would help the reader if this statement were further elaborated or an example given (a sketch of the kind of example I have in mind follows).
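To illustrate the kind of example I mean (a minimal sketch of my own, using rdflib rather than the paper's toolchain; not the authors' method): with canonical forms, identity can be decided by plain string comparison, whereas non-canonical literals force a comparison in the value space.

```python
# Minimal sketch (assuming rdflib): non-canonical lexical forms defeat
# plain term equality, so identity must be decided in the value space.
from rdflib import Literal
from rdflib.namespace import XSD

a = Literal("01.50", datatype=XSD.decimal)  # non-canonical lexical form
b = Literal("1.5", datatype=XSD.decimal)    # canonical lexical form

print(a == b)                        # False: RDF term equality compares lexical forms
print(a.toPython() == b.toPython())  # True: equal in the xsd:decimal value space
```

If all literals were stored in canonical form, the cheap first comparison would already suffice.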
Can the multiple quality criteria for LTS be related to the existing quality categories such that a unified set of categories is presented?
Section 5:
The toolchain section contains descriptions of each individual tool used for the quality assessment of RDF literals, but it lacks information about how they were used together as a chain to produce the results presented in the paper.
I quite like how the quality metrics are defined under 5.3.1 ~ 5.3.3. However, I would suggest moving those definitions to Section 4.3 (Measures for literal quality) and integrating them with the content in that section. For instance, 5.3.1 (Assessing the Datatype's Compatibility) is closely related to whether the lexical expression belongs to the lexical space of the datatype (a sketch of such a membership check follows after the next point).
Further, the measures defined in 4.3 could also define a set of metrics similar to the ones in 5.3.
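To make the connection concrete, here is a minimal sketch (my own reading, not the paper's implementation) of such a lexical-space membership check, approximating two XSD lexical spaces with regular expressions:

```python
# Sketch (assumed, not the paper's code): does a lexical form belong to
# the lexical space of its datatype? Lexical spaces are approximated
# here with regular expressions for two XSD datatypes.
import re

LEXICAL_SPACES = {
    "http://www.w3.org/2001/XMLSchema#integer": re.compile(r"[+-]?[0-9]+"),
    "http://www.w3.org/2001/XMLSchema#boolean": re.compile(r"true|false|1|0"),
}

def in_lexical_space(lexical_form: str, datatype_iri: str) -> bool:
    pattern = LEXICAL_SPACES.get(datatype_iri)
    return pattern is not None and pattern.fullmatch(lexical_form) is not None

print(in_lexical_space("042", "http://www.w3.org/2001/XMLSchema#integer"))  # True
print(in_lexical_space("4.2", "http://www.w3.org/2001/XMLSchema#integer"))  # False
```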
For single-word literals, aren't there more efficient ways of doing dictionary lookups than looking for an rdfs:seeAlso property in the resources returned by the Lexvo API? For instance, if I look up a word such as http://www.lexvo.org/page/term/eng/Canonicalization, your approach will give a false negative. Do you know the precision and recall of this approach? I think this is somewhat justified, as you define this metric as an estimate, but I wonder whether there is a more efficient way to do this than an API call, an HTTP dereference, and RDF processing and querying per literal (a hypothetical sketch follows).
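To sketch the cheaper lookup I have in mind (hypothetical; the word-list path and its coverage are my assumptions, not something proposed in the paper):

```python
# Hypothetical sketch: per single-word literal, one in-memory set lookup
# against a local word list replaces an HTTP dereference of Lexvo plus
# RDF parsing and an rdfs:seeAlso check.
def load_wordlist(path: str = "/usr/share/dict/words") -> set:
    # Assumed path; any reasonably complete dictionary file would do.
    with open(path, encoding="utf-8") as f:
        return {line.strip().lower() for line in f}

ENGLISH_WORDS = load_wordlist()

def is_english_word(literal: str) -> bool:
    return literal.strip().lower() in ENGLISH_WORDS

# Found iff the word list contains it, instead of depending on an
# rdfs:seeAlso triple being present in the Lexvo response.
print(is_english_word("canonicalization"))
```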
Section 6:
In Table 1, add a footnote or a note somewhere stating that the prefix dt is used for http://dbpedia.org/datatype/.
Similar to the examples given in the partially defined and non-canonical cases, it would be valuable to have a small discussion of the causes of invalid literals.
I assume with the results you have, Figure 1 can be reproduced in this section with the relevant numbers.
It would be useful to have a summary of the top datatypes used in literals overall, similar, for instance, to Table 4. That would give some perspective when reading Tables 1 ~ 3.
What was the reason for using only 470 data documents from LOD Laundromat with Luzzu? As the paper talks about web-scale analysis, isn't this number relatively small? According to the toolchain description, I assume these tasks were fully automated.
Figure 2 doesn't provide much valuable information on the reasons behind low or high compatible-datatype metric values. It needs further discussion, and maybe Figures 2 and 3 can be omitted if they don't provide much information.
Section 7:
Does “4. Datatype IRI are regularly not resolved with respect to their RDF prefixes” mean that there are a lot of undeclared prefixes in RDF documents? That statement is a bit ambiguous (a guessed illustration of my reading follows).
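If my reading is correct, the issue is something like the following (a guessed illustration using rdflib; the example data is mine, not from the paper):

```python
# Guessed illustration: the compact form "xsd:integer" ends up verbatim
# as the datatype IRI because the prefix was never expanded, instead of
# the full XML Schema IRI.
from rdflib import Literal, URIRef
from rdflib.namespace import XSD

broken = Literal("42", datatype=URIRef("xsd:integer"))  # unexpanded prefix
correct = Literal("42", datatype=XSD.integer)

print(broken.datatype)   # xsd:integer  (not a dereferenceable IRI)
print(correct.datatype)  # http://www.w3.org/2001/XMLSchema#integer
```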
Outdated language tags are first mentioned in Section 7. I assume they should be introduced and discussed in Sections 5.3.2 and 6.2.
Though the analysis of different language processing libraries (e.g., "How accurate are the language detection libraries?") is interesting, I think it deviates a bit from the main focus of this paper. I see this analysis more as "homework" for deciding the most suitable approach for detecting the correctness of language tags in RDF literals. Once that is done, the focus should be on automatically deriving those metrics over a large number of documents rather than on a small sample. The paper devotes a large portion to the analysis of the tools, with many tables and figures, but provides less information about the quality aspects of the language-tagged strings.
Section 8:
Throughout the paper it is not clear how LOD Laundromat and Luzzu are integrated in this work. It rather seems like two separate lines of work with minimal integration.
The conclusions seem a bit weak and could probably be improved by providing insights about the overall evaluation process, challenges encountered, etc.
General:
There are some formatting issues to be fixed, e.g., "1457568017" on page 8 and "{ToDo Wouter} literals" on page 11.
[1] Zaveri, Amrapali, et al. "Quality assessment for linked open data: A survey." Submitted to the Semantic Web Journal (2015).