Ontop: Answering SPARQL Queries over Relational Databases

Tracking #: 1206-2418

Diego Calvanese
Benjamin Cogrel
Sarah Komla-Ebri
Roman Kontchakov
Davide Lanti
Martin Rezk
Mariano Rodriguez-Muro
Guohui Xiao

Responsible editor: 
Oscar Corcho

Submission type: 
Tool/System Report
In this paper we present Ontop, an open-source Ontology-Based Data Access (OBDA) system that allows for querying relational data sources through a conceptual representation of the domain of interest, provided in terms of an ontology, to which the data sources are mapped. Key features of Ontop are its solid theoretical foundations, a virtual approach to OBDA that avoids materializing triples and is implemented through query rewriting techniques, extensive optimizations exploiting all elements of the OBDA architecture, its compliance with all relevant W3C recommendations (including SPARQL queries, R2RML mappings, and OWL 2 QL and RDFS ontologies), and its support for all major relational databases.

Minor Revision

Solicited Reviews:
Review #1
Anonymous submitted on 22/Nov/2015
Minor Revision
Review Comment:

I believe that the authors have addressed in general the concerns of the reviewers and editor, and the quality of the paper has improved. In particular, as to my concerns on the expansion of some of the points of the previous version, I have the following comments:
- Federation in Ontop has been described in sections 3.3 and 3.4
- It is now clear that Ontop does not support streaming; the footnote that was added clarifies the point further.
- As the manuscript was submitted as a Tools and Systems Report I agree that the T-mapping theoretical foundation can be left with the proper references to its formal description.
- As to the presentation of experimental results, I still believe that readers of the paper would benefit not only from the description of related SPARQL query answering systems (Section 6) but also from at least a summary of how Ontop compares to these systems.

Review #2
By Jean Paul Calbimonte submitted on 30/Nov/2015
Minor Revision
Review Comment:

This manuscript was submitted as 'Tools and Systems Report' and should be reviewed along the following dimensions: (1) Quality, importance, and impact of the described tool or system (convincing evidence must be provided). (2) Clarity, illustration, and readability of the describing paper, which shall convey to the reader both the capabilities and the limitations of the tool.

This paper is a summary of the features of the Ontop system for querying relational databases using SPARQL queries.
The system described is based on well-known and well-studied query rewriting techniques, relying on mappings that bridge the relational and ontological models. Ontop is not a new system; it has already been presented in other papers, as noted in Section 7. Different pieces and previous versions have been described in several papers at different conferences in the past.

All main features described here (e.g. SPARQL query answering, OWL 2 QL support, and the Protégé plugin) have been described in those previous works. Therefore this paper is mostly a summary of these previous efforts. Although I see no major new contribution or novelty from the technical point of view (apart from the reworked examples, the migration to GitHub, and other minor details), this is an interesting and complete system overview that fits the call for tools and system reports of SWJ.

Section 3 is an important addition with respect to the latest submission. It provides a clear description of related tools and services that can make the whole OBDA approach viable. Mappings and ontologies are not easy to produce, and in industrial environments the technical staff are not used to these technologies. Therefore, the existence of appropriate tools for these tasks is important. Furthermore, the reported uses of Ontop in industry and academia are also very important for this type of submission. The authors mentioned additional uses of the tool (e.g. a fork in Stardog, and use in EPNet), which I think should be mentioned in the paper. It would be nice to know whether Ontop is used beyond the partners of the Optique project, which is why any other use of the tool in other scenarios would be more than welcome. The authors have also explained why OBDA is important and useful in the industrial cases they have studied, although we have no evidence that the usage of Ontop was a success in those cases. In any case, it is not the goal of this paper to show that.

Concerning the evaluation, it is understandable that, given space constraints, it is infeasible to show full benchmark results here. However, it would definitely be useful to have high-level numbers showing that OBDA (with Ontop) is viable (or advantageous), for instance in terms of query answering response times (e.g. compared to other OBDA tools, or the overhead relative to RDBMS-only solutions). If space is the problem, Section 7 could be shortened to make room, as it provides only a historical account of the system. At the very least there could be a paragraph summarising such results and providing pointers to where these comparisons, experiments, and papers can be found.

The authors provide material in the form of a tutorial, including examples, at https://github.com/ontop/ontop-examples/tree/master/swj-2015. However, this material needs to be reworked, cleaned up, and presented in a better way. At that URL we find scattered files and a PDF presentation that has some hints on how to get the examples working. It would have been better to provide a simple one-page wiki tutorial that goes straight to the point.

In Example 2.3, is it necessary to ask for "?p a :Patient"? If ?p has a neoplasm, it is already a patient, right? Also, when the answer is computed (e.g. 'Mary'), the example could be more interesting if it showed not only the name but also the URI of the patient. This would illustrate that the output also involves a small transformation process (e.g. from relational values to literals, IRIs, or blank nodes). The query federation example could also use the same running example of patients instead of the FOAF query.
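
The transformation step mentioned above can be sketched in a few lines of Python. This is only an illustration and not Ontop's implementation: the IRI template, the `name` column, and the helper functions are assumptions made for the example, and only the value 'Mary' comes from the paper's answer.

```python
# Illustrative sketch only: turns one relational row into RDF terms.
# PATIENT_TEMPLATE and the column names are hypothetical.

PATIENT_TEMPLATE = "http://example.org/db1/{pid}"


def to_iri(template, row):
    """Fill an IRI template with column values from a row."""
    return "<" + template.format(**row) + ">"


def to_literal(value):
    """Render a column value as a plain RDF literal, escaping quotes."""
    escaped = str(value).replace("\\", "\\\\").replace('"', '\\"')
    return '"' + escaped + '"'


def to_bnode(label):
    """Render a blank node with the given label."""
    return "_:" + str(label)


# A hypothetical row returned by the SQL query.
row = {"pid": 1, "name": "Mary"}

# The answer carries both the patient's IRI and the name literal.
answer = (to_iri(PATIENT_TEMPLATE, row), to_literal(row["name"]))
print(answer)
```

Returning the IRI next to the literal makes it visible that the answer is built from RDF terms rather than from raw column values.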

In summary, Ontop is a leading OBDA system. This paper doesn't provide substantially new information about Ontop from the technical point of view. However, it provides a very complete overview of the system, which was previously unavailable. Moreover, it gives a clear idea of what can be done with it and when it can be advantageous to use it. Furthermore, we are given examples of its use in industrial use cases and its initial adoption by the software industry.

Review #3
By José Luis Ambite submitted on 11/Dec/2015
Review Comment:

The review addresses my concerns.

Some remaining minor issues:

- The attribute stage in the mappings below is unnecessary:

:db1/neoplasm/{pid} rdf:type :NSCLC .
SELECT pid, stage FROM tbl_patient
WHERE type = false

:db1/neoplasm/{pid} rdf:type :SCLC .
SELECT pid, stage FROM tbl_patient
WHERE type = true

They should be:

:db1/neoplasm/{pid} rdf:type :NSCLC .
SELECT pid FROM tbl_patient
WHERE type = false

:db1/neoplasm/{pid} rdf:type :SCLC .
SELECT pid FROM tbl_patient
WHERE type = true

- Some work is referred to by URL only. There is space for citations. For example, on page 6 there is a URL for Karma. It would be good to also cite a publication, for example:

Craig Knoblock, Pedro Szekely, Jose Luis Ambite, Aman Goel, Shubham Gupta, Kristina Lerman, Parag Mallick, Maria Muslea and Mohsen Taheriyan. Semi-Automatically Mapping Structured Sources into the Semantic Web. Proceedings of the 9th Extended Semantic Web Conference (ESWC 2012), Heraklion, Crete, Greece, 2012.

- There is much work on (automatic) schema mapping. It would be good to cite some of the seminal literature, like:

Ronald Fagin, Laura M. Haas, Mauricio Hernández, Renée J. Miller, Lucian Popa, and Yannis Velegrakis. 2009. Clio: Schema Mapping Creation and Data Exchange. In Conceptual Modeling: Foundations and Applications, Alexander T. Borgida, Vinay K. Chaudhri, Paolo Giorgini, and Eric S. Yu (Eds.). Lecture Notes In Computer Science, Vol. 5600. Springer-Verlag, Berlin, Heidelberg 198-236. DOI=http://dx.doi.org/10.1007/978-3-642-02463-4_12

AnHai Doan, Jayant Madhavan, Robin Dhamankar, Pedro Domingos, Alon Halevy. Learning to Match Ontologies on the Semantic Web. VLDB Journal 12, 303-319, 2003.

AnHai Doan, Pedro Domingos, Alon Halevy. Learning to Match the Schemas of Data Sources: A Multistrategy Approach. Machine Learning, 50, 279-301, 2003.
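
Returning to the first remaining issue above (the unnecessary stage attribute), the point can be checked with a minimal sketch. It assumes an in-memory SQLite table with made-up rows and a hypothetical `triples` helper; `type = 0` stands in for `type = false` for portability across SQLite versions. It is not how Ontop evaluates mappings, only an illustration that a column absent from the template does not affect the generated triples.

```python
import sqlite3

# Template taken from the mappings above; only {pid} is referenced.
TEMPLATE = ":db1/neoplasm/{pid}"


def triples(conn, sql, cls):
    """Run the source query and instantiate the rdf:type target per row."""
    cur = conn.execute(sql)
    cols = [d[0] for d in cur.description]
    out = set()
    for row in cur:
        record = dict(zip(cols, row))
        # str.format ignores keyword arguments the template does not use,
        # so an extra 'stage' column has no effect on the subject.
        out.add((TEMPLATE.format(**record), "rdf:type", cls))
    return out


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbl_patient (pid INTEGER, stage TEXT, type INTEGER)")
# Hypothetical rows, for illustration only.
conn.executemany(
    "INSERT INTO tbl_patient VALUES (?, ?, ?)",
    [(1, "stage-II", 0), (2, "stage-IV", 1), (3, "stage-I", 0)],
)

with_stage = triples(
    conn, "SELECT pid, stage FROM tbl_patient WHERE type = 0", ":NSCLC"
)
without_stage = triples(
    conn, "SELECT pid FROM tbl_patient WHERE type = 0", ":NSCLC"
)

# Both source queries yield the same set of triples.
assert with_stage == without_stage
```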