Foundational Patterns Benchmark

Tracking #: 2452-3666

Authors: 
Jana Ahmad
Petr Křemen

Responsible editor: 
Guest Editors Web of Data 2020

Submission type: 
Full Paper
Abstract: 
Recently, there has been growing interest in using ontologies as a fundamental methodology for representing domain-specific conceptual models in order to improve the semantics, accuracy and relevancy of domain users' query results. However, the volume of data has grown steadily over the past decade; therefore, managing data, answering users' queries and retrieving data from multiple data sources can be a significant challenge for any enterprise. In this paper, we describe a foundational query benchmark using the Unified Foundational Ontology (UFO) and discuss how foundational queries help in optimizing query answering results. For evaluation, we tested the foundational benchmark on different datasets -- generated and real-world -- and on different triple stores.
Full PDF Version: 
Tags: 
Reviewed

Decision/Status: 
Major Revision

Solicited Reviews:
Review #1
Anonymous submitted on 18/May/2020
Suggestion:
Major Revision
Review Comment:

This paper describes a query benchmark based on the Unified Foundational Ontology (UFO) and discusses the optimizations it enables during query retrieval. The authors use two different datasets (synthetic and real-world) on three open-source RDF stores. The paper, along with the experiments, is easy to understand and follow.

The authors propose a foundational query benchmark, meaning that the SPARQL queries are optimized for datasets compliant with the UFO ontology. (P.6) A clear definition of a compliant dataset is missing. I would suggest adding the key requirements for a dataset to be compliant with UFO (either SHACL validation rules or integrity constraints; see the sample for the QB vocabulary at https://www.w3.org/TR/vocab-data-cube/#wf).
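For illustration only, such a requirement could be expressed as a SPARQL ASK constraint in the style of the QB well-formedness checks; the ufo: namespace and the class IRIs ufo:Endurant / ufo:Perdurant below are placeholders I invented, not taken from the paper:

    PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
    PREFIX ufo: <http://example.org/ufo#>
    # Hypothetical compliance check: returns true if some typed individual
    # is neither an Endurant nor a Perdurant (class IRIs are assumptions).
    ASK {
      ?x rdf:type ?type .
      FILTER NOT EXISTS { ?x rdf:type ufo:Endurant }
      FILTER NOT EXISTS { ?x rdf:type ufo:Perdurant }
    }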

P.8 - It is difficult to understand the use of two named graphs that seem to correspond to the classes of the UFO ontology (Listings 1, 2). Could you explain how you differentiate between those two named graphs when loading the datasets into the RDF store?
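To make the question concrete, here is a query-time sketch of the separation I have in mind, where the graph IRIs bench:endurants and bench:perdurants are placeholders of mine; the open question is how statements are routed into these two graphs at load time:

    PREFIX bench: <http://example.org/benchmark#>
    # Placeholder graph IRIs: one graph per top-level UFO category.
    SELECT ?s ?p ?o
    WHERE {
      { GRAPH bench:endurants  { ?s ?p ?o } }
      UNION
      { GRAPH bench:perdurants { ?s ?p ?o } }
    }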

You list 16 queries corresponding to the foundational queries. How do you assert or validate that they cover all the possible cases? Please explain. Additionally, add the namespace bound to the benchmark: prefix used in the queries.
Does grouping Perdurant statements under a single group identifier have any implications for the results of the queries? Regarding the choice of RDF stores, if we suppose you have a list of “popular RDF stores”, why don’t you include Virtuoso, which is used as the backend of many linked open datasets such as DBpedia?

The results using UFO-indexing clearly show the optimization described in one of your previous works. However, the results of the RDF stores without optimization do not give any new information w.r.t. the state-of-the-art when comparing those three RDF stores. Are there any differences compared to the results obtained in previous works? If so, please highlight them.
In the discussion (Section 8), you need to better explain why you obtained good performance when querying the real-world dataset. Have you considered the relatively small size of the dataset?

== Suggestions ==
Add units in Figures 4, 6 and 8. The same applies to Figures 11, 13 and 15.
Explain also the goal of computing the standard deviation during the experiments. What can we learn from those values in Figures 5, 7 and 9?
Why do you use Q1’ on page 14?
The claim that GraphDB Free does not support FILTER with NOT EXISTS should be backed by a reference.

=== typos===
data-set/data set → dataset
Table 3 - typo in the pattern formalization P13. I guess it should be (?e1) → udo:inheres-in(?e1, p1). Please double-check.

Originality: marginal, but the UFO-indexing aspect used during the benchmark is the key contribution.

Significance of the results: Not sure whether the results tell us more about the differences between the three RDF stores, or whether they merely measure how good the UFO-indexing approach is.

Quality of writing: Understandable but needs some improvement. For example, the distinction between the UFO ontology and the foundational queries remains unclear. There is also potential confusion where the Aviation Safety Ontology is used as both T-Box and A-Box; this is sometimes confusing for the reader and should be clearly explained in the paper.

Review #2
By Enrico Daga submitted on 21/May/2020
Suggestion:
Major Revision
Review Comment:

This manuscript was submitted as 'full paper' and should be reviewed along the usual dimensions for research contributions which include (1) originality, (2) significance of the results, and (3) quality of writing.

The article proposes a SPARQL benchmark designed for datasets supporting the Unified Foundational Ontology (UFO). The Foundational Patterns Benchmark provides an approach for generating (quad) patterns of SPARQL queries based on the top-level distinction between endurants and perdurants and their possible relations. These foundational patterns are then combined to generate more complex, but still meaningful, queries relying on query features such as UNION, OPTIONAL, and FILTER. The framework is used to compare the performance of three triple stores (RDF4J, GraphDB free edition, and Jena TDB) against two datasets. The first dataset is automatically generated from the UFO ontology; experiments are performed at different scales, from 200k triples to 1M. The second dataset comes from an ontology and dataset built on UFO, the Aviation Safety Ontology and dataset (~26k triples). Experiments also include variants applying the UFO index, which improves the performance of query executions on UFO data, as demonstrated in previously published work.
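As a reader's sketch of what such a combined query might look like (the class and property names ufo:has-participant and ufo:has-begin-point are invented for illustration and are not the paper's vocabulary):

    PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
    PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
    PREFIX ufo: <http://example.org/ufo#>
    # Two foundational patterns (a perdurant and its participating endurant)
    # combined with OPTIONAL and FILTER; all names are placeholders.
    SELECT ?event ?participant ?begin
    WHERE {
      ?event rdf:type ufo:Perdurant ;
             ufo:has-participant ?participant .
      ?participant rdf:type ufo:Endurant .
      OPTIONAL { ?event ufo:has-begin-point ?begin }
      FILTER ( !BOUND(?begin) || ?begin >= "2019-01-01T00:00:00"^^xsd:dateTime )
    }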

* I find the idea of developing a benchmark from a foundational ontology interesting. In particular, the idea of generating foundational patterns is appealing, and the strategy of combining them is an interesting approach to obtaining variety.
* The related work section includes relevant research on SPARQL benchmarks, the authors contribute a foundational pattern benchmark to be used to evaluate triple stores against datasets developed on UFO.
* The article is well-written and includes many examples of queries and a complete report of the results of the experiment.

* I don't understand the motivation behind the work. In particular, how the foundational ontology would optimise "ontological queries" by using common sense knowledge (Introduction). Certainly, the distinction between Endurant and Perdurant is anything but common sense. Also, what is an ontological query? Here, the authors focus on SPARQL queries. However, I don't think the authors' point is that using foundational ontologies we can get more efficient SPARQL queries. Maybe the authors refer to conceptual clarity or ontological correctness?
* The benchmark can be used with all datasets compliant with UFO. However, the authors mention only 1 dataset (the aviation safety ontology). Are there other examples of datasets built over UFO?
* The motivation stated at the end of the related work section doesn't really fit the evaluation done. There, the authors claim that the foundational patterns benchmark has the purpose of optimising "SPARQL queries execution of triple stores". In my view, this would imply the comparison of queries generated with the proposed benchmark with other queries produced by humans, thus demonstrating the value of the foundational pattern benchmark. Instead, the evaluation section merely compares triple stores, not query sets.
* The authors claim that generated queries are the ones that users are interested in, as they match "people thoughts and language" (Section 6). I believe this claim should be supported by a citation (if this aspect was evaluated in previous work) or changed. For example, authors can certainly claim that the generated queries are generally meaningful but others may exist as well that are not derivable by the sole combination of meta-level features (perdurant/endurant relations).
* The evaluation refers to 3 queries but I suspect that combining the foundational patterns would lead to a large number of queries. Where are the others? Why those three?
* The difference in performance across the triple store seems negligible - we are not learning anything here.
* Queries are executed three times. An average of three executions may not be very meaningful. Maybe repeating the experiment 10 times would give more robust results.
* All queries run in under a second, so the difference is not really significant; one could live without the UFO index with no particular problems. If the authors want to demonstrate anything about performance, they should experiment in a setting that shows significant differences, for example with 10M triples. Or maybe those queries are not problematic in general? For example, the size of the result set typically has a significant impact on query execution performance, but this aspect is not reported in the paper.
* The evaluation refers to a number of "frequently executed SPARQL query" - by whom? Where? What are the criteria for selecting these queries?
* The evaluation refers to the same queries running on different datasets, but the examples for the aviation ontology data show terms specific to the aviation ontology. Where do these specific queries come from? Why these and not others?
* The conclusion confirms my concerns on the motivation of the work. "In this paper, we proposed a foundational benchmark that optimises SPARQL queries on foundational based domain ontologies". However, the article doesn't deliver on that as we don't have an evaluation that compares these automatically generated queries with reasonable alternatives - for example, a SPARQL developer designing queries over the aviation dataset (on 10M triples?)

Review #3
Anonymous submitted on 31/May/2020
Suggestion:
Major Revision
Review Comment:

Summary: This paper proposes a benchmark that optimizes SPARQL queries on foundational-based domain ontologies. The authors employ the benchmark to evaluate the performance of different triple stores. A foundational indexing technique was also designed to achieve faster results.

While this paper is quite easy to follow, it is not innovative enough and has some flaws in its experiments, so it cannot be accepted this time. I'll elaborate on these problems below.

Major concerns:

1). The motivation is not clear:

a) Page 6, “relation to our approach” part: why not design Unified Foundational Ontology (UFO) versions of the existing LUBM and UOBM benchmarks?

b) There is no doubt that using an index in a database can speed up query and access efficiency. Why design a special UFO index? Please give more explanation.

c) As for the inconsistency problem of queries you mentioned, I did not find any corresponding solutions in the paper.

2). Experiments are not well-designed and reported:

a) The data source is unknown; we only know that it is generated from existing real-world data. I cannot find any other information about the data, such as the scale of the RDF triples or the expressivity of the upper ontology.

b) The 16 patterns in Table 3 do not seem representative. In terms of query types, these patterns do not cover all query types, and they all correspond to simple questions. There are no patterns or comparison experiments for complex questions.

c) The experimental comparison is not fair. As I said before, why not directly design UFO versions of LUBM and UOBM for comparison?

3). Writing issues: I think the writing could do with more polish:

a) The authors mention OWL DL; however, throughout the text I found that the data in this paper are only related to the RDF storage format. The authors do not introduce the scale and expressivity of the ontology used in the experiments.

b) The access links for both the Aviation Safety Ontology and the UFO-based Data Generator are broken, which makes it difficult to verify the method.

c) At the end of the Introduction, the authors say “This benchmark can be reused not only for our foundational generated data but also for all data sets compliant with the unified foundational ontology”; however, I only found that the event-related ontology design problem is solved, not that all UFO-compatible datasets are covered.

d) Why are all the ontologies described by UFO diagrams instead of direct axioms? Moreover, the ontology cases given in the paper are too simple. Personally, I do not think that the UFO language can equivalently describe an ontology under OWL DL.

4). Important references are missing:

a) The original SPARQL 1.1 reference is not cited; see [1] below.

b) OBDA and SPARQL optimization work is not cited, such as [2-4].

c) It is suggested that the related work be introduced right after the Introduction, so that readers can understand the core motivation.

Minor issues:

1. A lot of axioms have format problems, such as (\ exists P.C)
2. The figures are not clear; the resolution is too low, e.g. Fig. 3 and Fig. 10.
3. There are many typos, such as: wil -> will, representation -> representations, table 6.1 -> Table 6.1.
4. The citation format is not appropriate, e.g. [43], [44], [45] -> [43-45].
5. The labels Q1-Q3 appear repeatedly in the experiments; it is recommended to distinguish them.
6. The vertical axes of Fig. 4-Fig. 9 and Fig. 11-Fig. 16 lack a specific time unit.