ExConQuer: Lowering barriers to RDF and Linked Data re-use

Tracking #: 1182-2394

Authors: 
Judie Attard
Fabrizio Orlandi
Sören Auer

Responsible editor: 
Werner Kuhn

Submission type: 
Tool/System Report
Abstract: 
A major obstacle to the wider use of semantic technology is the perceived complexity of RDF data by stakeholders who are not familiar with the Linked Data paradigm, or are otherwise unaware of a dataset’s underlying schema. In order to help overcome this barrier, we propose the ExConQuer Framework as a tool that preserves the semantic richness of the data model while catering for simplified and workable views of the data. With the aim of encouraging and enabling further re-use of Linked Data by people who would otherwise shy away from this task, this framework facilitates the publication and consumption of RDF in a variety of generic formats. In this manner, any stakeholder can export and work with RDF data in the formats they are most accustomed to, radically lowering the entry barrier to the use of semantic technologies, and possibly enabling the exploitation of Linked Data to its full potential. Through the ExConQuer Framework we provide a comprehensive set of tools that enable users to easily query linked datasets, download the results in a number of formats, and re-use previously-executed queries and transformations. With this framework we hence attempt to target the evident niche in existing tools that are intended to be used by non-experts to consume Linked Data.
Tags: 
Reviewed

Decision/Status: 
Major Revision

Solicited Reviews:
Review #1
By Andrea Ballatore submitted on 06/Nov/2015
Suggestion:
Major Revision
Review Comment:

This system report describes ExConQuer (Explore, Convert, and Query Framework), a web-based platform that aims at enabling consumption of Linked Data (LD) for non-expert users. The system is a semantic data browser that can export SPARQL query results to a variety of popular simple data formats, including tabular ones such as CSV.
I share their opinion that the Linked Data community has not produced highly usable tools for mainstream data consumption, beyond the circle of semantic web practitioners. The tabular view dominates the database, data analytics, and spreadsheet worlds, being by far the most popular way to represent and share data. As the RDF model is graph-based and not table-based, it is very reasonable to facilitate a (reversible) graph-to-table translation, to enable users to extract data from endpoints and download it in a format that can be loaded in the vast majority of computing tools. In the current situation, Linked Data can only be accessed through full data dumps (usually too large for most user needs), through SPARQL (which indeed requires knowledge of its syntax and semantics), through direct manipulation in a programming language like Java or Python, or through semantic web browsers, which tend to support very shallow data exploration but not querying and actual data consumption. None of these options is as simple as going to a data producer's website (e.g., the World Bank), downloading a CSV file, and loading it into Excel (or similar tools) to do some real work on it.

* Clarity, illustration, and readability

The paper is very clear and well-written, and the tool is available online. I don't have major concerns about the presentation, but rather with some of the content, as described below.

* Quality, importance, and impact of the described tool or system

While I strongly support the authors' view and attention to the issue of LD usability, and I find the approach original, I have some concerns about the quality of the tool that should be addressed to maximize its impact:

- Usability: Based on my usage of the tool, I wish it had a clearer structure and terminology, avoiding the jargon typical of these Semantic Web tools.
When opening the tool for the first time, the user sees a toolbar with the labels "ExConQuer, PAM Tool, Query Builder, Conquer Ontology", without any guidance on what these things are supposed to mean. I find the name "PAM Tool" particularly obscure, and I feel that, logically, it should come after the Query Builder, which is where the core functionality of the system is (and the landing place for a new user). I recommend that the authors reduce the jargon by using simpler terms, for example replacing "PAM Tool" with "Saved Queries" or something along these lines. Moreover, I think the interface could do with tooltips and contextual help, which are at the moment entirely missing.

- Class selection and relevance: An issue in the interface is the list of classes. If I type "City" (one of the suggested strings) on the DBpedia endpoint, I get a list of roughly 400 strings, with repeated values, and without any apparent sorting mechanism. Once I select a particular class, the tool selects a set of instances as an example. The problem is that the selection seems to be random or alphabetical, rather than based on relevance. For example, as a user, I expect "London" or "New York" or "Rio de Janeiro" to be clear examples of a "city", while the current examples are really obscure. The same applies when selecting "Actor", where pornographic actors are (randomly) returned very high in the results (nothing against the genre per se, but it looks odd). I think that even some simple relevance mechanism here would greatly enhance the tool, for example suggesting highly-connected/popular entities before less connected/rare ones. I would not be picky on this point, if the tool did not have the explicit purpose of being highly usable and reassuring for novice users.

- Query complexity: At the moment the tool seems to support only queries on one class, selecting its attributes and properties. However, in many cases, users need results that include multiple related entities (e.g., select all mayors of Spanish cities having more than 10,000 people). How could the tool be expanded to traverse properties? Without this feature, I feel that its usefulness would be quite limited, and this should be acknowledged more explicitly in the tool's limitations.
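For concreteness, the kind of multi-entity query I have in mind could be sketched in SPARQL as follows (the property names, e.g. dbo:leaderName for the mayor, are illustrative; DBpedia's actual modelling may differ):

```
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX dbr: <http://dbpedia.org/resource/>

SELECT ?city ?mayor WHERE {
  ?city a dbo:City ;
        dbo:country dbr:Spain ;
        dbo:populationTotal ?pop ;
        dbo:leaderName ?mayor .   # hop from the city to a second, related entity
  FILTER(?pop > 10000)
}
```

The single-class interface covers the first three triple patterns; it is the dbo:leaderName hop to a second entity that currently seems out of reach.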

- "RDF softening": The core idea behind the tool is that the RDF model is flexible and powerful, but it's hard to grasp for non-experts, and it is necessary to allow for transformation to tabular views. The authors call this process "RDF softening", defining it as "The generation of domain-specific RDF data views in semantically-shallow representation formalisms." I am not sure that "softening" is a good metaphor for this process, as I see no obvious soft/hard axis. To my understanding, what the system does is a triples-to-table transformation, losing the formal semantics but enabling easy export to non-RDF-based tools. The outcome is simpler, "lighter" data. The name could be something like "simplification", "tabularization", or "tablification". A metaphor of data being made "lighter" might work better.
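To illustrate what I mean by a triples-to-table transformation: a SPARQL SELECT query already yields a tabular result set that maps directly onto CSV rows (property names here are again illustrative DBpedia ones):

```
PREFIX dbo: <http://dbpedia.org/ontology/>

SELECT ?city ?population ?country WHERE {
  ?city a dbo:City ;
        dbo:populationTotal ?population ;
        dbo:country ?country .
}
# Each result binding (?city, ?population, ?country) becomes one row
# of a three-column table, ready for export to CSV and tools like Excel.
```

The formal semantics of the triples is lost in the output, but the data becomes "lighter" and immediately usable.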

- User cognition: To make a system more usable, an obvious approach is to hide unnecessary details and signal affordances to the user. However, even a very usable system needs to rely on an organization that includes entities, operations, and affordances, and enables their manipulation/execution. A tool like Excel, although simpler than many other computing platforms, requires the user to understand sheets, columns, rows, formulae, etc. Without that knowledge, the user wouldn't be able to do anything useful with it. So the question is: Can we make an LD tool usable without introducing the user to the basic principles of this paradigm (e.g., entities, URIs, triples, endpoints, etc.)? I don't think the current organization of ExConQuer is particularly clear in this respect, and I doubt that a novice user would be able to perform a realistic task (e.g., download the names and birth dates of all French actors from DBpedia) without first understanding the concepts of "endpoint", "class", "property" and "attribute". I feel that the tool should do a better job at explaining how these elements are related through the organization of the interface. I expect an experienced LD user, however, to find the tool reasonably clear. I strongly suggest that the authors reorganize the interface, trying to clarify the purpose of each element, and making its general organization more intelligible.

- Evaluation: The part of the article I am most concerned about is the evaluation. The authors showed the tool to 27 people, had them use it in an exploratory way, and then asked them whether they found the tool useful. This set-up is quite contrived, and N=27 is a pretty low number. As a result, I doubt that it indicates the usability of the tool in any reliable way. It would have been much more informative to assign users more realistic tasks (e.g., finding actual data through the interface) and quantify their performance with respect to execution time, stress, etc., comparing it to a baseline (e.g., doing the same task with existing tools or directly in RDF/SPARQL). To be fair, I understand that this is a system report and not a full research paper, so I do not expect a very detailed study. However, a more convincing evaluation would greatly improve the paper.

Minor points:
- Abstract: The abstract should better reflect the content of the article, and not only the vision of the project.
- License: As the tool appears to be open source, the authors should emphasize that in the paper, highlighting that all the code is available on GitHub.
- The demo would be a great teaching tool, to introduce LD to new users. This could be made explicit in the paper.
- At first glance, the name ExConQuer made me think of 'ex-con', which means 'ex-convict'. Perhaps a slightly different name would not have this odd association.
- The term "shy away" is used too many times. This repetition should be removed.
- Section 2.3, "native systems": example here would help, e.g., Excel, R, or Tableau.
- Figure 2: This figure could be clearer, for example by making clearer the different roles of the query builder tool and the PAM tool. The figure could also include the export action, with a CSV output being returned to the user and input into another tool.
- The tool seems to assume that the user will want to "publish" their queries online. I can imagine that in many cases the queries must be private.
- Conquer ontology (p. 7): "assign a reputation": how does this work? It doesn't seem to be captured by the ontology.
- "some sort of responsibility": vague, rephrase.
- p. 10: "led usability": clarify

Review #2
By Tomi Kauppinen submitted on 13/Nov/2015
Suggestion:
Minor Revision
Review Comment:

(1) Quality, importance, and impact of the described tool or system (convincing evidence must be provided).

This paper describes a set of tools for exploring and exporting Linked Data. From the paper a reader gets an idea of the tools but as often, one needs to see them in action to get the full picture. However, the PAM tool (found under purl.org/net/ExConQuer) produces the following error message:

"The JSON data file
http://52.26.72.112/fetchData.php
contains errors =
JSON Parse error: Unable to parse JSON string
We will explain the error in details after this message."
followed by the error "7: Curl error: Failed to connect to localhost port 8080: Connection refused" on the next page.

This thus makes it impossible to evaluate the usefulness of the tool in practice. I would like to check the PAM tool especially since, according to the evaluation, it received mixed feedback.

The query builder worked and seemed to be useful (although I am not that sure about the novelty, given that there are many competitors out there).

(2) Clarity, illustration, and readability of the describing paper, which shall convey to the reader both the capabilities and the limitations of the tool.

The paper is very clearly written and is free from spelling issues. Overall my impression is that the paper presents an interesting and potentially useful set of tools - but since the PAM tool was not accessible, I cannot, from the paper alone, get a full idea of its usefulness. The paper reports an evaluation, but to me this seems to be a rather shallow evaluation (I would have liked to see more actual use cases). Anyway, if PAM worked, it would be easy to test it with some real cases.

Given all this, my recommendation is a minor revision, including 1) putting the PAM tool back in action (such that reviewers can check it themselves), and 2) carefully improving the discussion of the user evaluation: in Table 1 there are two users disagreeing on question 3, but the text talks about one user(?)