Review Comment:
This system report describes ExConQuer (Explore, Convert, and Query Framework), a web-based platform that aims to enable non-expert users to consume Linked Data (LD). The system is a semantic data browser that can export SPARQL query results to a variety of popular simple data formats, including tabular ones such as CSV.
I share their opinion that the Linked Data community has not produced highly usable tools for mainstream data consumption, beyond the circle of semantic web practitioners. The tabular view dominates the database, data analytics, and spreadsheet worlds, being by far the most popular way to represent and share data. As the RDF model is graph-based and not table-based, it is very reasonable to facilitate a (reversible) graph-to-table translation, to enable users to extract data from endpoints and download it in a format that can be loaded into the vast majority of computing tools. At present, Linked Data can only be accessed through full data dumps (usually too large for most user needs), through SPARQL (which indeed requires knowledge of its syntax and semantics), through direct manipulation in a programming language like Java or Python, or through semantic web browsers, which tend to support very shallow data exploration but not querying and actual data consumption. None of these options is as simple as going to a data producer's website (e.g., the World Bank), downloading a CSV file, and loading it into Excel (or similar tools) to do some real work on it.
* Clarity, illustration, and readability
The paper is very clear and well written, and the tool is available online. I don't have major concerns about the presentation, but rather with some of the content, as described below.
* Quality, importance, and impact of the described tool or system
While I strongly support the authors' view and attention to the issue of LD usability, and I find the approach original, I have some concerns about the quality of the tool that should be addressed to maximize its impact:
- Usability: Based on my usage of the tool, I wish it had a clearer structure and terminology, avoiding the jargon typical of these Semantic Web tools.
When opening the tool for the first time, the user sees a toolbar with the labels "ExConQuer, PAM Tool, Query Builder, Conquer Ontology", without any guidance on what these things are supposed to mean. I find the name "PAM Tool" particularly obscure, and I feel that logically it should come after the Query Builder, which is where the core functionality of the system is (and the landing place for a new user). I recommend that the authors reduce the jargon by using simpler terms, for example replacing "PAM Tool" with "Saved Queries" or something along these lines. Moreover, I think the interface could do with tooltips and contextual help, which are at the moment entirely missing.
- Class selection and relevance: An issue in the interface is the list of classes. If I type "City" (one of the suggested strings) on the DBpedia endpoint, I get a list of roughly 400 strings, with repeated values and no apparent sorting mechanism. Once I select a particular class, the tool selects a set of instances as an example. The problem is that the selection seems to be random or alphabetical, rather than based on relevance. For example, as a user, I expect "London" or "New York" or "Rio de Janeiro" to be clear examples of a "city", while the current examples are really obscure. The same applies when selecting "Actor", where pornographic actors are (randomly) returned very high in the results (nothing against the genre per se, but it looks odd). I think that even a simple relevance mechanism here would greatly enhance the tool, for example suggesting highly-connected/popular entities before less connected/rare ones. I would not press this point if the tool did not have the explicit purpose of being highly usable and reassuring for novice users.
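To illustrate what I mean by a simple relevance mechanism, here is a minimal sketch of popularity-based ranking, where instances with a higher degree (number of triples they participate in) are suggested first. The function name and all the data are hypothetical; in practice the degree counts would come from the endpoint itself.

```python
# Sketch: rank candidate example instances by connectivity (degree),
# so that well-known entities surface before obscure ones.
# All numbers below are illustrative, not real DBpedia counts.

def rank_by_degree(candidates, degree):
    """Return candidates sorted by descending degree (triple count)."""
    return sorted(candidates, key=lambda e: degree.get(e, 0), reverse=True)

# Hypothetical degree counts for instances of a "City" class
degree = {"New_York_City": 15000, "London": 12000, "Obscure_Town": 12}

print(rank_by_degree(["Obscure_Town", "London", "New_York_City"], degree))
# → ['New_York_City', 'London', 'Obscure_Town']
```

Even such a crude heuristic would make the example instances far more recognizable to a novice than an alphabetical or arbitrary listing.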
- Query complexity: At the moment the tool seems to support only queries on one class, selecting its attributes and properties. However, in many cases, users need results that include multiple related entities (e.g., select all mayors of Spanish cities with more than 10,000 inhabitants). How could the tool be expanded to traverse properties? Without this feature, I feel that its usefulness would be quite limited, and this should be acknowledged more explicitly in the tool's limitations.
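For concreteness, the mayors-of-Spanish-cities example corresponds to a SPARQL query of roughly the following shape (DBpedia vocabulary; PREFIX declarations omitted), which a property-traversal feature would have to generate by chaining triple patterns across related entities:

```python
# The multi-entity example expressed as the kind of SPARQL query a
# property-traversal feature would need to build. Property names follow
# the DBpedia ontology; the exact query is my illustration, not the
# authors' output.
query = """
SELECT ?city ?mayor WHERE {
  ?city a dbo:City ;
        dbo:country dbr:Spain ;
        dbo:populationTotal ?pop ;
        dbo:mayor ?mayor .
  FILTER(?pop > 10000)
}
"""
print(query)
```

Note that every triple pattern after the first hangs off a variable introduced earlier, which is precisely the chaining a single-class interface cannot express.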
- "RDF softening": The core idea behind the tool is that the RDF model is flexible and powerful, but it's hard to grasp for non-experts, and it is necessary to allow for transformation to tabular views. The authors call this process "RDF softening", defining it as "The generation of domain-specific RDF data views in semantically-shallow representation formalisms." I am not sure that "softening" is a good metaphor for this process, as I see no obvious soft/hard axis. To my understanding, what the system does is a triples-to-table transformation, losing the formal semantics but enabling easy export to non-RDF-based tools. The outcome is simpler, "lighter" data. The name could be something like "simplification", "tabularization", or "tablification". A metaphor of data being made "lighter" might work better.
- User cognition: To make a system more usable, an obvious strategy is to hide unnecessary details and signal affordances to the user. However, even a very usable system needs to rely on an organization that includes entities, operations, and affordances, and that enables their manipulation/execution. A tool like Excel, although simpler than many other computing platforms, requires the user to understand sheets, columns, rows, formulae, etc. Without that knowledge, the user wouldn't be able to do anything useful with it. So the question is: can we make an LD tool usable without introducing the user to the basic principles of this paradigm (e.g., entities, URIs, triples, endpoints, etc.)? I don't think the current organization of ExConQuer is particularly clear in this respect, and I doubt that a novice user would be able to perform a realistic task (e.g., download the names and birth dates of all French actors from DBpedia) without first understanding the concepts of "endpoint", "class", "property", and "attribute". I feel that the tool should do a better job of explaining how these elements are related through the organization of the interface. I expect an experienced LD user, however, to find the tool reasonably clear. I strongly suggest that the authors reorganize the interface, trying to clarify the purpose of each element and making its general organization more intelligible.
- Evaluation: The part of the article I am most concerned about is the evaluation. The authors showed the tool to 27 people, had them use it in an exploratory way, and then asked them whether they found the tool useful. This set-up is quite contrived, and N=27 is a fairly low number. As a result, I doubt that it indicates the usability of the tool in any reliable way. It would have been much more informative to assign users more realistic tasks (e.g., finding actual data through the interface) and quantify their performance with respect to execution time, stress, etc., comparing it to a baseline (e.g., doing the same task with existing tools or directly in RDF/SPARQL). To be fair, I understand that this is a system report and not a full research paper, so I do not expect a very detailed study. However, a more convincing evaluation would greatly improve the paper.
Minor points:
- Abstract: The abstract should better reflect the content of the article, not only the vision of the project.
- License: As the tool appears to be open source, the authors should emphasize that in the paper, highlighting that all the code is available on GitHub.
- The demo would be a great teaching tool, to introduce LD to new users. This could be made explicit in the paper.
- At first glance, the name ExConQuer made me think of 'ex-con', which means 'ex-convict'. Perhaps a slightly different name would not have this odd association.
- The term "shy away" is used too many times. This repetition should be removed.
- Section 2.3, "native systems": example here would help, e.g., Excel, R, or Tableau.
- Figure 2: This figure could be improved, for example by making clearer the different roles of the query builder tool and the PAM tool. The figure could also include the export action, with a CSV output being returned to the user and fed into another tool.
- The tool seems to assume that users will want to "publish" their queries online. I can imagine that in many cases queries must remain private.
- Conquer ontology (p. 7): "assign a reputation": how does this work? It doesn't seem to be captured by the ontology.
"some sort of responsibility": vague, rephrase.
- p. 10: "led usability": clarify