Review Comment:
The paper provides a review and performance comparison of the common RDF datastores that can be considered for storing and querying RDF statements that are saved along with spatial, temporal, or spatiotemporal data, apparently with a focus on data derived from sensor networks, which is a relatively novel idea, although it needs more justification.
Storing RDF data with spatial and/or temporal data is somewhat oversimplified in the paper, and there is no mention of the challenges associated with extending the standard RDF data model, nor of capturing the semantics of spatial and temporal terms (not only property values, and definitely not as string literals). It is also not mentioned that adding spatiotemporal data to RDF statements can lead to undecidability when reasoning over the knowledge base.
Regarding the statement “the value with a timestamp is indexed as an RDF literal value” (in Virtuoso), it is not explained how the type and quality of the captured data affect machine-processability and machine-interpretability. A well-known limitation of the standard RDF data model is its inability to capture metadata and related data, such as provenance and spatial and temporal data, at the statement element level and at the statement level. How do the authors justify using RDF for storing sensor data that requires spatial and temporal data to be stored along with the statements? Using a formalism formally grounded in description logics for capturing spatiotemporal data is not justified; it should be emphasized that the benefits of the RDF data model are inherently exploited this way, but that doing so introduces some undesirable side effects.
No formal definitions are provided for any of the concepts. The various (sometimes proprietary) indices of the RDF triplestores and quadstores discussed in the paper are not explained, and it remains unclear how the context element of RDF quads is used for storing spatial and/or temporal data. When it comes to spoc, posc, and opsc, no explanation is provided. By having quads with a context element, how and where is the semantics defined for the context? How do we know when you use it for temporal and when for spatial data? What happens if we need both spatial and temporal data simultaneously? How is spatiotemporal data stored by the context element? Do the authors consider RDF statement-level spatiotemporal data only and why? Without explaining this, the significance of the results cannot be assessed.
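To illustrate the kind of explanation that is missing, the following is a minimal sketch of one possible reading of the index names. It assumes that spoc, posc, and opsc denote sort orders over the four quad fields (subject, predicate, object, context); the manuscript does not confirm this, and the `ex:` identifiers and helper names are hypothetical.

```python
from typing import NamedTuple


class Quad(NamedTuple):
    s: str  # subject
    p: str  # predicate
    o: str  # object
    c: str  # context (named graph)


def index_key(quad: Quad, order: str) -> tuple:
    """Reorder the quad's fields according to an index name such as 'posc'."""
    return tuple(getattr(quad, field) for field in order)


def build_index(quads, order: str) -> list:
    """Return the reordered keys of all quads, sorted in that field order."""
    return sorted(index_key(q, order) for q in quads)


def prefix_scan(index: list, prefix: tuple) -> list:
    """Return all keys that start with the given prefix (linear scan for brevity)."""
    return [key for key in index if key[:len(prefix)] == prefix]


# Hypothetical sensor quads; the identifiers are illustrative only.
quads = [
    Quad("ex:obs1", "ex:value", '"21.5"', "ex:g1"),
    Quad("ex:obs2", "ex:value", '"19.0"', "ex:g2"),
    Quad("ex:obs1", "ex:sensor", "ex:s7", "ex:g1"),
]

# A predicate-first index groups all quads with the same predicate together,
# so a query with a bound predicate can be answered by a prefix lookup.
posc = build_index(quads, "posc")
value_rows = prefix_scan(posc, ("ex:value",))
```

Under this reading, the context sits last in every ordering, so none of the three indices would directly support a lookup by context alone. If the context carries the spatial or temporal data, the authors should explain how such lookups are served.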
As for Section 3, another big challenge (beyond the mentioned ones) is to retain decidability when reasoning over metadata-enriched RDF statements. Also, the capturing itself requires non-standard solutions that should be at least backward-compatible with standard RDF triples or quads. It is not mentioned anywhere what the implications of diverging from standards to store spatiotemporal data are, nor is there any alignment with alternatives to RDF reification and n-ary relations, such as extending the standard RDF data model (e.g., RDF+, SPOTL, and RDF*), extending the RDFS semantics (Annotated RDF Schema, G-RDF), using alternate data models (e.g., N3Logic), decomposing RDF graphs (RDF molecule), capturing context with each statement (e.g., named graphs, RDF triple coloring), and using external vocabularies and ontologies (OWL-Time Ontology, 4D-Fluent Ontology, SWRL Temporal Ontology, etc.). These are at different levels of abstraction, and all have their own strengths and weaknesses.
In 5.1, a link to Virtuoso should be added as a footnote, similar to the other tools discussed in the manuscript.
In Table 1, since there is a dedicated license column, it would be more useful to provide actual licenses, rather than indicating a commercial vs. open source license type. The latest release date column does not add to the discussion and will become obsolete quickly, so it can be omitted.
There are writing inconsistencies, such as using two versions of the same word in the manuscript (indexes and indices). Between a section heading and a lower-level subsection heading, such as 7 and 7.1 or 8.1 and 8.1.1, there should be at least one sentence. There are some typos, such as “benchmark queries set” instead of “benchmark query set,” “firstly define” instead of “first define,” “data for all over the world” instead of “data from all over the world,” “Virutoso” instead of “Virtuoso,” space before the full stop at the end of a sentence, etc. Some sentences are hard to read and should be reworded (e.g., “This reason is also explained for the poor data loading performance” should be “This can also be the reason for the poor data loading performance” or similar).
Regarding the case study, it is not clear what kind of spatiotemporal data is stored exactly and how. Only querying examples are provided, but a concrete example that would demonstrate the storage of spatiotemporal data-enriched RDF statements for data derived from real-world sensor networks is missing completely.
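As an indication of what such a concrete example could look like, the sketch below serializes one sensor reading as N-Quads, using a named graph as the context and attaching the time and location to that graph. This is only one of several possible designs and not necessarily the authors' approach; the `http://example.org/` namespace, the property names, and the values are hypothetical, while `xsd:dateTime` and the GeoSPARQL `geo:wktLiteral` datatype are standard.

```python
XSD = "http://www.w3.org/2001/XMLSchema#"
GEO = "http://www.opengis.net/ont/geosparql#"
EX = "http://example.org/"  # hypothetical namespace for illustration


def iri(value: str) -> str:
    """Wrap an IRI in angle brackets, as N-Quads requires."""
    return f"<{value}>"


def literal(value: str, datatype: str) -> str:
    """Build a typed literal rather than a plain string literal."""
    return f'"{value}"^^<{datatype}>'


def nquad(s: str, p: str, o: str, g: str = "") -> str:
    """Serialize one statement; an empty g places it in the default graph."""
    terms = [s, p, o] + ([g] if g else [])
    return " ".join(terms) + " ."


graph = iri(EX + "g1")  # context shared by the statements of one reading

lines = [
    # The sensor reading itself, stored inside the named graph.
    nquad(iri(EX + "obs1"), iri(EX + "temperature"),
          literal("21.5", XSD + "decimal"), graph),
    # Temporal and spatial metadata attached to the graph, not to string literals.
    nquad(graph, iri(EX + "observedAt"),
          literal("2019-05-01T12:00:00Z", XSD + "dateTime")),
    nquad(graph, iri(EX + "observedWhere"),
          literal("POINT(10.0 50.0)", GEO + "wktLiteral")),
]

print("\n".join(lines))
```

An example of this kind in the manuscript would show at which level (statement element, statement, or graph) the spatiotemporal data is attached and which datatypes are used.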
Apart from a 2018 and a 2019 article, the References section includes older papers only. More recent papers should be cited. Under References, everything is lower case, which is incorrect (“rdf” instead of “RDF,” “Dbpedia” instead of “DBpedia,” “sparql” instead of “SPARQL,” etc.). This should be corrected throughout. DOI numbers are missing.