# RDF 1.1: Knowledge Representation and Data Integration Language for the Web

Dominik Tomaszuk
David Wood

Krzysztof Janowicz

Resource Description Framework (RDF) is seen as a solution provider in today's landscape of knowledge representation research. This survey outlines RDF, version 1.1, the W3C Recommendation for knowledge representation on the World Wide Web. In this article, we review and present works from RDF v1.0 and v1.1 implementations. We also provide insights into reification, blank nodes and entailments. This article surveys current approaches, tools and applications for mapping from relational databases to RDF and from XML to RDF. We discuss RDF serializations, including formats with support for multiple graphs, and we analyze RDF compression proposals. All approaches are presented concisely in tabular form and are grouped under a classification scheme. Moreover, the article provides an empirical study of the usage of different RDF model elements as well as RDFS vocabulary terms. Finally, we present a summarized formal definition of RDF 1.1 and emphasize the changes between RDF versions 1.0 and 1.1.
By Antoine Zimmermann submitted on 09/Feb/2017
Suggestion: Reject

Review Comment:

This manuscript was submitted as 'Survey Article' and should be reviewed along the following dimensions: (1) Suitability as introductory text, targeted at researchers, PhD students, or practitioners, to get started on the covered topic. (2) How comprehensive and how balanced is the presentation and coverage. (3) Readability and clarity of the presentation. (4) Importance of the covered material to the broader Semantic Web community.

Summary:
=======

The paper presents a survey covering many topics related to RDF in general, and RDF 1.1 in particular. It deals with the abstract syntax, the formal semantics, serialisation syntaxes, compression, issues concerning blank nodes, data integration and online usage.

Main comments:
=============

The paper succeeds in collecting a wide range of work related to RDF and can serve as a good source for students and researchers who want a reading list on this topic. In several places the paper reads much like a textbook rather than a scientific article, with a very basic, informal presentation of the concepts. As such, it does not suit the readership of the Semantic Web Journal well (we can expect them to know a bit about RDF and the Semantic Web). Moreover, it does not even do a good job as a textbook, because it is full of errors, ambiguities and vagueness where there should be precise, rigorous definitions. Furthermore, most of the references are described very briefly, and very little analysis, few conclusions and little insight are given. The paper provides statistics from the authors' own experiments, but these are mostly redundant with existing work that already does this. The last sentences before the concluding section only mention more references and do not put anything in perspective. The conclusion of the paper is very short and shallow.
So, to summarise:

- there is no real original research and no new insight on the topic;
- the paper is full of problems, some of them quite serious, which I detail below.

Detailed remarks:
================

Sec.1:

- it is said that RDF was a response to the problem of natural language being ambiguous and hard to process, but this is not backed up with references. I doubt that this is what RDF initially aimed at.

Sec.1.1:

- "Our survey serves to surface some of final state of design" -> it is not clear what this means
- "analyze the RDF blank nodes" -> there is little analysis on this topic and nothing that has not been said before, especially in [92]
- "to compare the RDF reification approaches" -> there is very little in this regard

Sec.1.2:

- the section is called "Related Surveys" but contains many references that are not surveys
- ref.[111] should be "Draltan Marin. RDF formalization. Masters thesis. École polytechnique. August 2004." If the technical report of 2006 is cited, it should be "Claudio Gutierrez. A note on the history of the document: RDF Formalization, by Draltan Marin. Technical report. Universidad de Chile. 2006."
- Jeremy Carroll provided a formal analysis of the comparison of RDF graphs in 2002. He proves that isomorphism of RDF graphs can be reduced to known graph isomorphism problems and gives its complexity. This paper is not cited: "Jeremy J. Carroll. Matching RDF Graphs. In Ian Horrocks, James Hendler: The Semantic Web - ISWC 2002. First International Semantic Web Conference Sardinia, Italy, June 9–12, 2002 Proceedings. ISBN: 978-3-540-43760-4. Lecture Notes in Computer Science, vol.2342. Springer 2002."
- "In [116,117] ... . This paper" -> which paper? There are two cited.
- while the section puts a lot of references in minimal space, it does not analyse them at all

Sec.2:

- this section gives a very basic overview of the Semantic Web standards, which is certainly not of interest to the readership of the Semantic Web Journal.
- "An RDF constitutes ..." -> what is "an RDF"?
- "... which in the RDF terminology, are referred to as triples (or statements)" -> this would only be true in informal speech or prose. In RDF terminology (that is, according to the RDF Concepts spec), subject-predicate-object tuples are "RDF triples", nothing else.
- Def.1: "assuming I is the set of [IRI] references" -> I should be the set of IRIs (not IRI references)
- Def.2 is not a definition
- "Note that in RDF 1.0 identifiers was RDF URI references" -> were URI references
- "is an URI" -> is a URI
- Def.3: this definition is wrong. Cf. RDF 1.1 Concepts
- "simple literals" -> not defined
- "RDF 1.1 supports the new datatype rdf:HTML" -> what does "supports" mean here? rdf:HTML is non-normative
- Def.4: "Blank nodes are defined as existential variables" -> no. Blank nodes are defined as elements of an infinite set disjoint from IRIs and literals. The relation between bnodes and existential variables belongs to the semantics of bnodes: that is how they are interpreted, not how they are defined (similarly, literals are defined as a lexical string with a datatype IRI, not as a value in a value space).
- Def.4: bnodes are not used to denote anything. They indicate the existence of a thing and do not denote anything. They are not anonymous resources. They may indicate the existence of a thing that happens to be identified by an IRI or a literal.
- "Given 2 blank nodes, it is not possible to determine whether or not they are the same" -> if there are 2, they are not the same; being the same means there is only one bnode. The way bnodes work in RDF 1.0 and 1.1 is the same as far as RDF graphs and RDF graph serialisations are concerned. There is only an extension of how bnodes work in RDF datasets, a concept that was not defined before RDF 1.1.
- Def.6 "so-called context" -> so-called by whom? There is no such notion as "context" in RDF 1.1 Concepts.
- before Def.7, there is a short discussion of the semantics of RDF datasets, while nothing has been said about the semantics of RDF graphs. This kind of remark should come after the formal semantics is presented.
- Def.7 is not complete: graph names cannot repeat
- The description of what RDF containers are is vague and misleading. It is not clear why there is a focus on RDF containers, a rather unimportant feature of RDF.
- "Quite rarely used feature is reification" -> this needs a citation.
- "There are other proposals [87,118]" -> there are other approaches (a good reference for this would be "Daniel Hernández, Aidan Hogan and Markus Krötzsch. Reifying RDF: What Works Well With Wikidata? In Proc. of SSWS 2015")
- The section ends abruptly without transition or analysis

Sec.3:

- after Def.8, it is said that the graph isomorphism problem is GI-complete, which is correct. Then, Table 1 says it is NP-complete.
- the source for GI-completeness should be Carroll 2002, as cited above.
- Def.9 mentions common labels. There are no bnode labels in the abstract syntax, and the sentence should stop at "do not share any blank nodes"
- "The labels of blank nodes are not of significance outside of the local scope RDF merge" -> what does this mean?
- "than the original graph" -> graphs
- Def.10: "a bijective function M: B-> B ... M is the identity map on RDF literals [etc]" -> it cannot be a function from B to B if it maps literals and other things
- Def.11 does not make sense as formalised
- before Def.12: "is a that"
- Def.12: "Assume that a map is a function ... there is no function" -> why introduce the term "map" if it is never used? Also, the "otherwise" part is not useful because it is a definition
- after Def.12: "that the a subgraph"

Sec.4:

- Def.14: "Let V be a vocabulary" -> interpretations in RDF 1.1 do not depend on a vocabulary
- after Def.14: what is H? What is I(H)?
- after Def.16: "in RDF 1.0 [...] D-entailment was described as an RDFS-entailment semantic extension. In RDF 1.1 it is defined as a RDF's direct extension" -> No. It is an extension of simple entailment.
- Def.17: there is a $\mathcal{I}$ while the interpretation is just $I$.
- "A selection of the inference rules" -> rules of what?
- after Def.18: "in Table[newline]3" -> use a non-breaking space

Sec.6:

- "Turtle (denotes ttl)" -> denoted ttl in Table 7
- "RDF[newline]1.1" -> non-breaking space
- Sec.6.1, after Ex.13: "the syntax is viewed as problematic to read and write for humans so one should consider using other syntaxes for data management" -> why is it a problem for data management? For RDF editors, maybe.
- after Ex.14: "separated by space key, tabulator key" -> separated by space, tabulation
- this section does not talk about TriX or RDF/JSON
- Table 7 is apparently not referenced in the text (it seems that the text after the table says "Table 8" but should read "Table 7")
- "below" -> above?
- Table 7: why are rdfa and xml "human readable"? What can we conclude from the table?

Sec.7:

- Table 8: why is there a star in the "standard" cell for HDT?
- "Several research areas have emerged around MapReduce and RDF compression" -> maybe cite "José M. Giménez-García, Javier D. Fernández, Miguel A. Martínez-Prieto. HDT-MR: A Scalable Solution for RDF Compression with HDT and MapReduce. ESWC 2015: 253-268"
- "In Table 8 ... below" -> above

Sec.8.1:

- "public-lod@w3.org mailing" -> mailing list
- "LOD Cloud.Each" -> missing space
- "The Fig.3" -> "Fig.3"
- Fig.3 is not useful.
- "The Table 9" -> "Table 9"
- the data in Table 9 come from somewhere; cite the source

Sec.8.2:

- "we define the riib ratio" -> this comes from other research; cite it
- "ddr_l metric for rdf:langString is almost 0" -> what is being computed here? Language-tagged strings are quite common. If the computation is about how many times rdf:langString shows up explicitly, then the value should be zero. There are no concrete syntaxes where one can explicitly write a language-tagged string with its type made explicit.

Sec.8.3:

- "The results show that RDF*" -> what is RDF*?

Sec.8.4:

- "Table[newline]17" -> non-breaking space

At the end of Sec, what do we conclude? What was the use of doing all this?

Sec.9: there is very little related to KR in this paper. It is mostly about data management, with a small section on reasoning. There is nothing on modelling knowledge and related topics such as ontology engineering, reasoning algorithms, etc.

Ref.[62]: missing capital letters in "rdf"
By Axel Polleres submitted on 15/May/2017