ExConQuer: Lowering barriers to RDF and Linked Data re-use

Tracking #: 1399-2611

Authors: 
Judie Attard
Fabrizio Orlandi
Sören Auer

Responsible editor: 
Werner Kuhn

Submission type: 
Tool/System Report
Abstract: 
A major obstacle to the wider use of semantic technology is the perceived complexity of RDF data by stakeholders who are not familiar with the Linked Data paradigm, or are otherwise unaware of a dataset’s underlying schema. In order to help overcome this barrier, we propose the ExConQuer Framework (Explore, Convert, and Query Framework) as a set of tools that preserve the semantic richness of the data model while catering for simplified and workable views of the data. Through the available tools users are able to explore and query linked open datasets without requiring any knowledge of SPARQL or the datasets’ underlying schema. Moreover, executed queries are persisted so that they can be easily explored and re-used, and and even edited. With this framework we hence attempt to target the evident niche in existing tools that are intended to be used by non-experts to consume Linked Data to its full potential.
Tags: 
Reviewed

Decision/Status: 
Minor Revision

Solicited Reviews:
Review #1
By Andrea Ballatore submitted on 07/Jul/2016
Suggestion:
Minor Revision
Review Comment:

* Clarity, illustration, and readability

The authors have improved the tool and the article, and I have no objection to publication after minor changes. The current version of the tool is definitely useful to explore datasets and illustrate SPARQL to novice users.

I'm still not 100% convinced by the "softening" term, but I understand the authors' viewpoint. The additional evaluation is appropriate and makes the report more convincing.

An article that should be included in the related work:

Bowers, S., Madin, J. S., & Schildhauer, M. P. (2010). Owlifier: Creating OWL-DL ontologies from simple spreadsheet-based knowledge descriptions. Ecological Informatics, 5(1), 19–25. http://doi.org/10.1016/j.ecoinf.2009.08.010

* Quality, importance, and impact of the described tool or system

The latest version of the tool is definitely more usable, and shows a better conceptual organization and better examples of the classes in the knowledge base. The selection of multiple classes makes the tool much more useful, and the video tutorial is a good addition.

The authors should add a more explicit discussion of the limitations of the tool and possible improvements.

Minor points:

* 'barely- interpretable formats such as CSV': "barely-interpretable" is unclear. I would say that CSV files need meta-data to be interpretable.

* (GPX); --> comma?

* 'Ms. Excel' -> Please add examples beyond Excel, e.g., "MS Excel, Open Office, R, or Tableau"

* Figure 1, shows --> Figure 1 shows

Review #2
By Tomi Kauppinen submitted on 13/Jul/2016
Suggestion:
Accept
Review Comment:

The authors have now properly addressed the comments by reviewers, including mine. There are some issues that could be further discussed (like the evaluation and related work), but for me this paper now looks mature enough for publication.

Only one minor suggestion for the camera-ready: please format the query at the end of page 6/beginning of page 7 to fit on one page if possible.

Review #3
By Martin Tomko submitted on 17/Jul/2016
Suggestion:
Minor Revision
Review Comment:

This paper describes the implementation and evaluation of a GUI tool simplifying the interaction with linked and RDF data for users with little prior knowledge of Linked Data technologies, terminology and technical intricacies.
This is a revised submission of the paper, and I am reviewing it as such. I appreciate the effort of the authors, in particular to reflect on the high-quality comments from Reviewer 1. I believe that with minor changes, the paper can be accepted for publication in the category of Tools and Services.
(1) Quality, importance, and impact of the described tool or system (convincing evidence must be provided).
The paper (and tool) shows evidence of being of interest (although maybe not yet useful - but that may be a reflection on the LD technology, not the tool itself) to a range of stakeholders and may lead to interesting discussions and discovery of LD technologies by a range of application specialists.

(2) Clarity, illustration, and readability of the describing paper, which shall convey to the reader both the capabilities and the limitations of the tool.

I think that the paper is well written and reasonably clear. There are, however, a small number of issues that I have still spotted; addressing them could, I believe, improve the clarity of the paper/tool:

Major comment:
- I am unsure how the two types of filters shown in the demo differ. One is an object property, the other a datatype property. But, for instance, for the concept "Place", municipality code is a string datatype property, yet it is a nominal value that must come from a list. This should be offered/discovered by such a tool (and really behave as an object property). If the tool is to add value for people who explore unknown schemas, indicating the value range is important (at least through some sort of comments or metadata, if real instances cannot be shown). I would expect that a mechanism reminiscent of, say, OpenRefine (openrefine.org) or other data wrangling tools could be of assistance here.
- The nomenclature in the tool changed, but not in the paper: p2: PAM is still mentioned, but does not appear in the interface (and the video is not updated, but that is a minor comment). I think that all mentions of PAM could be deleted (I do not think that this is real provenance access); it is merely a stored-queries browser, or query history browser. One question that could be addressed with respect to this is whether the query browser only stores the queries (and thus the results are always recalculated, and may differ from the original results if the source data changed) or whether there is an option to store historical results as well. Both are important functionalities, but the control should be up to the user. Why? Because one may publish an analysis based on such an interface, and the readers may want to scrutinise the results. If they achieve different results, the provenance of the original results should be assured, and the information retained. This is a common concern addressed by eResearch tool developers.
- The way to compose more complex queries with logical operators seems to be through clicking on the filter icon. This was not entirely intuitive. I am unsure how to write more complex queries with operators other than AND (OR, NOT, etc.). I suspect that some of the users will be familiar with approaches to analysing data using SQL, but may need handholding with respect to translating familiar query composition to LD. How can I specify queries such as cities with a population of > 10000 that are twin cities of a birth city of soccer player XY, or that have a river running through them with length < 100000 m (converted to km, as an added advantage of LD)? I realise that some of these may be beyond the tool's capabilities, but a discussion of non-trivial queries would be appreciated; in particular, an example of an equivalent of a SQL join, to make new users of LD appreciate the capabilities.

Additional minor editorial comments:
- Abstract: "and and even edited" - remove excessive "and"; delete "hence"
- Introduction, para 1: it is better style to introduce the reference so as to not interrupt the flow of the reader. So, I would suggest to move [3] to the end of this sentence.
- Introduction, lod-cloud footnote: this is a statistic/research from April 2014. Is there any more up to date information you can cite? This is obsolete in the perspective of the advances in the field (at least, one would hope so).
- Introduction: I have an issue with the sentence "Unfortunately, the emergence of a wide number of tools supporting people to publish their data as Linked (Open) Data, has not been complemented by approaches supporting them to consume existing Linked Data in formats other than RDF [3]". Isn't RDF THE format for LD? I think that the issue addressed here is the problem of presentation, interfacing and terminology, rather than what the encoding is behind the scenes. Similarly, with e.g. Excel, the xlsx format encodes the data as XML, but there is a familiar metaphor of a spreadsheet in which the data are presented. This sentence may need editing to reflect this (as long as I understand properly what the authors mean).