Review Comment:
This paper introduces Morph-KGCstar, a materialization engine that generates RDF-star graphs from heterogeneous data sources. The system is implemented in Python, built upon pandas, and distributed as a PyPI library as well as a GitHub repository with some documentation and usage examples. The system is described in detail, with relevant examples and figures.
Morph-KGCstar is shown to comply with the RML-star specification by passing the defined unit tests. It is also clear from the text that the system can handle arbitrarily deep statement quotation, in accordance with the RDF-star spec.
The authors compare Morph-KGCstar with two reification approaches and show that Morph-KGCstar produces fewer triples than the alternatives, being the fastest on one of the two datasets used and the second fastest on the other.
A comparison to SPARQL-Anything, the only other way to generate RDF-star graphs from heterogeneous data sources, is also provided. In this comparison, the authors conclude that Morph-KGCstar easily manages larger files, whereas many small files are better handled by SPARQL-Anything, which the authors intend to address in the future.
I have only a few small comments that the authors can easily incorporate in their final version:
1) Several times in the paper, the authors say that RDF-star is a way to annotate statements, or a means to provide reification capabilities.
This is correct, of course, but it sounds like it limits RDF-star to quoted triples in the subject position, which is not the case: as shown in Algorithm 1, Morph-KGCstar can manage any of the nesting cases valid for RDF-star. Thus, I would suggest that the authors use the current informal definition from the RDF-star spec, that RDF-star is "an extension to RDF to make statements about statements", or that they refer to annotating statements as one of the use cases for RDF-star.
2) Algorithm 1 could be explained in a bit more detail. The parameters that the procedure receives, except for the nesting level, are not discussed, so I don't know what m.OM or m.SM mean.
3) In Section 5.2.1 the authors claim that RML-star is the fastest approach. According to the data presented in Table 1, this is only correct for SemMedDB. For SoMEF, RML-star is the second fastest, being one order of magnitude slower than the singleton properties approach.
4) I understand that Listings 4, 5 and 7 are independent/parallel examples of the different reification strategies. However, I would prefer that the sets of triples produced were semantically and practically equivalent to one another, which they currently are not. To fix this, you would need to assert the quoted triples.
5) In line 35, page 2, the authors say that an RDF-star triple can be placed in the subject or object of an RDF triple. This is true, but it would be more accurate to add that an RDF-star triple can also be placed in the subject or object of another RDF-star triple, which captures the recursive nature of RDF-star.
6) There are some small presentation issues (e.g., line 19 of page 2: "a comlause and Apaon", or line 39 of page 3, where citation [19] appears twice in the same sentence), so I would recommend that the authors run a spelling and grammar checker.
Other than that, I think this paper presents a valuable contribution to our community, and I'd be happy to recommend it for acceptance, subject to the minor revisions above.
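To make comments 4) and 5) concrete, here is a minimal sketch of the two points: quoted triples can nest in subject or object position to any depth, and quoting a triple does not assert it. It models triples as plain Python tuples; the names `quotation_depth`, `inner`, `annotated` and `nested`, as well as the example IRIs, are my own illustrative assumptions and are not taken from the paper or from Morph-KGCstar.

```python
# Illustrative model only: a triple is a (subject, predicate, object) tuple,
# and a quoted triple is simply a tuple used as subject or object.

def quotation_depth(triple):
    """Return how many levels of quoted triples are nested inside `triple`."""
    subject, _predicate, obj = triple
    depth = 0
    for term in (subject, obj):
        if isinstance(term, tuple):  # the term is itself a quoted triple
            depth = max(depth, 1 + quotation_depth(term))
    return depth

inner = (":alice", ":knows", ":bob")          # plain RDF triple
annotated = (inner, ":since", "2020")         # quoted triple as subject
nested = (":carol", ":claims", annotated)     # quoted triple as object, nested twice

# A graph holds asserted triples only. Quoting `inner` does not assert it.
graph = {annotated}
inner_asserted_before = inner in graph        # False: quoted but not asserted
graph.add(inner)                              # explicit assertion, as in comment 4)
inner_asserted_after = inner in graph         # True
```

Under this reading, the RDF-star listing only becomes comparable to the other reification outputs once the quoted triple is also added to the graph, as in the last lines of the sketch.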