OptiqueVQS: a Visual Query System over Ontologies for Industry

Tracking #: 1461-2673

Ahmet Soylu
Evgeny Kharlamov
Dimitry Zheleznyakov
Ernesto Jimenez-Ruiz
Martin Giese
Martin G. Skjaeveland
Dag Hovland
Rudolf Schlatte
Sebastian Brandt
Hallstein Lie
Ian Horrocks

Responsible editor: 
Freddy Lecue

Submission type: 
Full Paper
Abstract: 
An important application of semantic technologies in industry has been the formalisation of information models using OWL 2 ontologies and the use of RDF for storing and exchanging application data. Moreover, legacy data can be virtualised as RDF using ontologies, following the Ontology-Based Data Access (OBDA) approach. In all these applications, it is important to provide domain experts with query formulation tools for expressing their information needs over ontologies. In this work, we present such a tool, OptiqueVQS, which has been designed based on our experience with OBDA applications in Statoil and Siemens and on the best HCI practices for interdisciplinary engineering environments. OptiqueVQS implements a number of unique techniques that distinguish it from analogous query formulation systems. First, it exploits ontology projection techniques to enable graph-based navigation over an ontology during query construction. Second, while OptiqueVQS is primarily ontology driven, it exploits sampled data to enhance the selection of data values for some data attributes. Finally, OptiqueVQS is built on well-grounded requirements, design rationale, and quality attributes. We have evaluated OptiqueVQS with both domain experts and casual users and qualitatively compared our system against prominent visual systems for ontology-driven query formulation and exploration of semantic data. OptiqueVQS is available online and can be downloaded together with an example OBDA scenario.

Minor Revision

Solicited Reviews:
Review #1
By Vanessa Lopez submitted on 26/Sep/2016
Minor Revision
Review Comment:

This is an in-use paper describing OptiqueVQS, a system to query ontologies through visual interactions with the user.

The aim of this kind of system is to address complex information needs for users who are not able to query the system using a formal query language, such as SPARQL. Their main challenge, as expressed by the authors, is to balance usability and expressivity.

There is extensive state of the art on query builders, NL interfaces, and guided (autocompletion) as well as graphical interfaces to query ontologies. The system presented here has also been presented and evaluated with casual users in a previous publication. Thus, the contributions of this paper according to the authors are:

- The system has been developed upon industrial requirements
- The system is evaluated with industrial (domain expert) users from Statoil and Siemens (the two main use cases)
- It has been extended to include support for spatial and temporal query formulation.

The use case descriptions are a good example of semantics in use for industry, and they show the potential value of integrating and querying the data, e.g., over obscure and complex schemas for which building interesting SQL queries is complex and time-consuming, requiring a very large number of table joins.

To gather the requirements for expressivity, the authors collected a catalogue of queries provided by Statoil in NL. These queries were analysed with respect to 6 query types. The system was therefore designed to support the formulation of the majority of the queries: conjunctive tree-shaped queries that do not contain negations, cycles or aggregations. The authors also make a case for using a multi-paradigm user interface. The widget-based front end is nicely described with some examples.
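
To make the "tree-shaped" restriction concrete, the following is a minimal sketch (not taken from the paper or from OptiqueVQS itself) of how one might check that a conjunctive query has no cycles. It assumes the query is given as a list of binary atoms; the function name `is_tree_shaped` and the predicate and variable names in the usage note are hypothetical.

```python
from collections import defaultdict


def is_tree_shaped(binary_atoms):
    """Return True if the undirected query graph is connected and acyclic.

    binary_atoms: iterable of (predicate, subject_var, object_var) tuples.
    Variables are the nodes of the graph; each binary atom is one edge.
    """
    adjacency = defaultdict(set)
    edge_count = 0
    for _predicate, subject, obj in binary_atoms:
        if subject == obj:
            return False  # a self-loop is already a cycle
        adjacency[subject].add(obj)
        adjacency[obj].add(subject)
        edge_count += 1

    nodes = set(adjacency)
    if not nodes:
        return True  # a query with a single variable and no edges is trivially a tree

    # A tree over n nodes has exactly n - 1 edges (parallel edges break this).
    if edge_count != len(nodes) - 1:
        return False

    # With n - 1 edges, the graph is a tree exactly when it is connected.
    start = next(iter(nodes))
    seen = {start}
    stack = [start]
    while stack:
        node = stack.pop()
        for neighbour in adjacency[node]:
            if neighbour not in seen:
                seen.add(neighbour)
                stack.append(neighbour)
    return seen == nodes
```

For example, a chain such as `[("hasWellbore", "?w", "?b"), ("hasCore", "?b", "?c")]` passes the check, whereas adding a third atom that links `?c` back to `?w` closes a cycle and fails it.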

The section describing the quality attributes, based on previously conducted surveys, is also nicely written. The evaluation is convincing and well designed, even if it is small in scale (10 users on 19 test cases); I am happy to see an evaluation performed with domain experts in real settings. However, the main drawback is that not all the attributes described earlier are evaluated, in particular scalability and adaptivity.

As the authors point out, one of the main problems VQSs have to face is scalability: as the number of concepts and properties to choose from increases (as well as the ambiguity), the user interface becomes overloaded. Adaptivity and ranking based on user logs (as the authors suggest) can mitigate the scalability issue; however, the ranking has not been evaluated. Ranking may not be needed in these scenarios because the ontologies used for the evaluation (both automatically and manually created) are relatively small. Perhaps, for this scenario, ranking is only needed in queries that ask for specific instance names (like the query used for Figures 6 and 7), rather than schema classes and properties?

I do not ask the authors to provide an evaluation of scalability for this paper, but considering that the main contribution is the evaluation, at least an extended description of their ranking algorithm (or a reference) and a discussion of the open issues and challenges (scalability, the need for a well-defined schema, etc.) should be added. For example, the authors discuss the negative impact on the results of using an automatically created ontology, of obviously lower quality, versus a manually created one. However, to make the evaluation (and thus the contribution) stronger, I also miss a discussion of scalability and other current limitations of the approach based on the evaluations (i.e., why is the correct compilation rate 88% for the manually created ontology? In which queries did the users fail, and why?).

An online demo of the system is provided, as well as examples for testing, but it requires a username/password, so I didn’t check it.

Minor typos and questions:

- page 3. form the project’s website -> from the project’s website

- Where are the use cases in Table 2 (E4, E8, etc.) described?

- OptiqueVQS generates temporal queries in STARQL, but how are spatial constraints executed? How are spatial queries (map widget) translated into SPARQL?

Review #2
Anonymous submitted on 15/Nov/2016
Major Revision
Review Comment:

The paper presents the visual query formulation tool OptiqueVQS, which is based on navigation graphs. The toolset is the outcome of the large Optique project. The described system backend, with a lot of relevant publications, looks really impressive. In general, there are several pieces, but they mostly focus on describing a system, whilst the research achievements in terms of originality and significance of the results are not clearly highlighted in the paper.

I have a hard time judging the differences/improvements of this paper over the previous publications, of which the authors acknowledge this paper is an extension. As the manuscript is submitted as a 'full paper', it needs to have significant research and technical contributions that benefit readers. But when I picked two previous papers by the authors, [72] and [77], I found that several sections and paragraphs are very similar to those papers, which leaves me unsure about the new contributions of this paper. Here is the list:

- Table 4 adds 5 more items to the 14 of Table 1 of [77]
- Table 3 is similar to Table 2 of [77]
- Figure 5 is Figure of [77]
- Figure is Figure 2 of [72] or similar to Figure 1 of [77]
- Figure 9 is an extension of Figure 4 of [77]
- The 9 quality attributes of Section 4.2 are a replica of Section 2.2 in [77]
- Sections 5.2.1 and 5.2.2 are the same as Section 2.3 of [77]

In some cases, it is acceptable to have a journal article which improves/extends a previous conference paper with significantly more content (e.g., >30% new significant findings/results). I fear that even though it looks much longer than the previous papers, it is only a simple combination with new wording. Therefore, the paper needs considerable restructuring to show clear distinctions/improvements with respect to previous work.

With respect to technical depth, I think the paper needs to go deeper and be more concrete about certain research issues and how to solve them. For instance, using an ontology-driven navigation graph to formulate visual queries is quite interesting, but I doubt that it will work on a large data set, as a visualisation might produce a very complicated query. Therefore, the performance aspect needs to be thoroughly discussed, or some evaluation figures should be presented in the paper.

Digging a bit deeper into the performance aspect, I would question the scalability of using STARQL, as a query like the one in Figure 7 will have to be rewritten into a set of complicated queries over the underlying system. It would also be interesting to know how the system maintains the freshness of the results, as the system considers stream data an important data source.

There are some minor typos and grammar errors in the following sentences:

Building interesting SQL queries require therefore in many cases a very large number of table joins (i.e., 20 to 30 tables) (E2) , thus making the task of handcrafting SQL queries towards this database very complex and time-consuming. -> ?

The analysis suggest that majority -> suggests?

The query asks for the all the welbores… -> ... for all the wellbores ...

[72] A. Soylu and M. Giese. Qualifying Ontology-based Visual Query Formulation. In Proceedings of the 11th International Conference Flexible Query Answering Systems (FQAS 2015), volume 400 of Advances in Intelligent Systems and Computing, pages 243–255. Springer, 2015.
[77] A. Soylu, E. Kharlamov, D. Zheleznyakov, E. Jimenez-Ruiz, M. Giese, and I. Horrocks. Ontology-based Visual Query Formulation: An Industry Experience. In Proceedings of the 11th International Symposium on Visual Computing (ISVC 2015), volume 9474 of LNCS, pages 842–854. Springer, 2015.

Review #3
Anonymous submitted on 09/Jan/2017
Minor Revision
Review Comment:

The paper presents the system OptiqueVQS that has been developed in the context of the EU project Optique together with stakeholders from industry. The system is based on a number of requirements that have been derived from use cases of the industry partners and translated into system features. The authors report on important considerations and design decisions as well as formalize the underlying semantics and querying expressivity. Finally, they summarize a number of user studies conducted to evaluate the usability of the system.

The paper is within the scope of the journal. It is a very informative and insightful read with a number of interesting results, including several smaller results like categorizations (e.g., types of query systems), tables (e.g., considered query types), lists (e.g., quality attributes and features), etc. My main concern is with the originality and novelty of the work, as the authors already published a couple of papers on Optique and OptiqueVQS in different conference proceedings and journals (as also indicated by the various self-references in the article). For instance, one closely related article on OptiqueVQS appeared in the journal Universal Access in the Information Society last year (see http://link.springer.com/article/10.1007/s10209-015-0404-5). While there seem to be different foci and incremental improvements, this somewhat limits the originality of the work and raises the question of whether interested readers are able to distinguish between the different OptiqueVQS papers and select, read, and cite the right one in the end. Having raised this concern, I see sufficient novel contributions and insights in this submission to warrant another publication on OptiqueVQS and therefore recommend accepting the paper. Nevertheless, the authors should consider whether there is a better way to clearly highlight the different contributions of the individual papers and guide future reviewers and readers in immediately spotting the differences and selecting the right paper, i.e., the one best meeting their information need. The current list of novelties given at the end of the introduction is a good start but not sufficient in this regard.

Furthermore, the authors should carefully read through the paper before publication. Although the paper is generally well written, some sentences need attention and could be improved. There are also some basic language flaws that should be corrected before the paper is published. Apart from inconsistent punctuation (i.e., commas, but this is a minor issue), several nouns are without articles ("a", "the") where readers would expect one, for instance, in "a query as [a] whole", "rest of [the] article", "is [a] finite set". Also, singular nouns are sometimes used where there should be plural forms and vice versa (e.g. "represented as a knowledge base[s]", "when users interacts", etc.). There are also a few typos (e.g.,"form [->from] the project’s website", "platfrom", "three-shaped", etc.), missing words (e.g., "that [is] well-suited", "used [to] expand", "usability [of] OptiqueVQS"), mistakenly inserted words (e.g. "[and] Rhizomer [16]", "for the all the") and commas (e.g., "standardised, semantics") as well as other minor language flaws (e.g., "a VQS is a data retrieval (DR) paradigm" or "A VQS have a better potential", etc.). Since at least some of the authors seem to be native English speakers, I assume that just a careful reading would be needed to fix these minor language issues.

When revising the paper, the authors should also rethink some of the quite bold statements. For instance, I would not agree that Rhizomer and Konduit VQB "demand no technical background". Although I understand what the authors want to say and that this statement is detailed in Section 8, it is too bold at that place. At least for Konduit VQB, some technical background is needed in my humble opinion (if not, I would expect some kind of proof or at least a reference to a piece of research supporting this statement). Similarly, the following statement is quite bold: "A VQL is as difficult as a formal textual query language for a domain expert as it demands considerable technical skills and knowledge to interpret the visual semantics and syntax and understand the relevant technical jargon." Without any (empirical) evidence, convincing argument or reference, such statements are too general and bold and should be formulated more moderately (at least in this kind of research paper). I would also disagree with the following sentence: "Browsing is a good approach when the data set and result set are not very large and users need to pay attention to each individual item in the result set." There are numerous examples of faceted browsing that prove the opposite (consider e-commerce websites like Amazon or hotel search websites like Booking.com). Again, I see what the authors want to say, but they should think of a better way to express it.

The design choices of the authors are mostly convincing and backed with good arguments. However, no reason is given for the following limitation where I would have expected one: "In this work we focus on construction of SPARQL queries where basic graph patterns do not have variables on the second position, nor on the third position, when e is rdf:type. That is, we do not allow predicates as variables, and thus our queries can naturally be represented as conjunctions of unary and binary atoms." As a minor remark, a partly repetitive sentence is used subsequently ("In our work we focus on construction of..."). This paragraph needs a revision.
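
The quoted restriction can be illustrated with a minimal sketch, which is an assumption-laden illustration and not the authors' implementation. It assumes triple patterns are given as (subject, predicate, object) string tuples, that variables start with `?`, and that the type predicate is written as the prefixed name `rdf:type`; the helper names `is_variable` and `to_atoms` are hypothetical.

```python
RDF_TYPE = "rdf:type"  # assumed prefixed form of the type predicate


def is_variable(term):
    """Treat any term starting with '?' as a SPARQL variable (assumption)."""
    return term.startswith("?")


def to_atoms(triple_patterns):
    """Map restricted triple patterns to unary and binary atoms.

    (s, rdf:type, C) becomes the unary atom (C, s).
    (s, p, o)        becomes the binary atom (p, s, o).
    Patterns outside the quoted restriction raise ValueError.
    """
    atoms = []
    for subject, predicate, obj in triple_patterns:
        if is_variable(predicate):
            raise ValueError("variable in predicate position is not allowed")
        if predicate == RDF_TYPE:
            if is_variable(obj):
                raise ValueError("variable class in an rdf:type pattern is not allowed")
            atoms.append((obj, subject))  # unary atom: Class(subject)
        else:
            atoms.append((predicate, subject, obj))  # binary atom: property(subject, obj)
    return atoms
```

Under these assumptions, the patterns `("?w", "rdf:type", ":Wellbore")` and `("?w", ":hasCore", "?c")` map to the atoms `(":Wellbore", "?w")` and `(":hasCore", "?w", "?c")`, while a pattern such as `("?w", "?p", "?c")` is rejected. This shows what the restriction excludes, but not why it was chosen, which is the justification asked for above.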

I like the categorization of systems given in the introduction, i.e., the distinction of VQS, VQL, etc. Later on, another category of "visual query formulation systems" is introduced and it is stated that OptiqueVQS belongs to that category. However, it is unclear how this category relates to the initial categorization, i.e., if it is yet another category, a subcategory or just a different name for one of the above categories. Furthermore, the terms VQS and VQL are introduced a second time in Section 4, which is redundant and not necessary - especially, since this categorization is repeated yet another time in Section 8, where it is more adequate.

The following aspects should get more attention in the revised version of the article:
- Generalization: How much can the results from the presented use cases be generalized? Are the results also valid in other contexts? If yes, to what extent?
- Limitations: What are the limitations of the approach? For instance, what are the limitations that result from restricting the queries exclusively to tree-shaped graph patterns?
- Backend implementation: I would have expected more details here. What technologies are used in the backend? How has the synchronization of the different views and underlying models been realized?
These aspects deserve at least a brief description and/or discussion. In turn, other parts of the article might need to be shortened (which should not be a problem).

## Minor issues
- The term "information model" would benefit from a definition, since it might otherwise be differently interpreted.
- The word "use[-]case" is inconsistently written (with and without hyphen).
- The brackets in the example on page 3 should rather be "every wellbore has (at least) one core". It is also not the best example here, as readers might not see why a subclass construct has to be used in the OWL axiom (this is not obvious as restrictions are introduced later).
- The following is a bit too short: "(i.e., context-aware)" (p.12)
- The following number misses a unit: "The second query (Exp3) only took 63 on average..."
- The authors might consider rephrasing the following sentences:
  - "The former is addressed as a part of quality attributes in Section 6, while in this section we address the local design choices concerning the implementation of individual widgets." This might be confusing due to the "quality attributes vs. quality features" distinction and reference to Section 6.
  - "The tasks were all conjunctive and shown in Table 4."