S-Paths: Set-Based Visual Exploration of Linked Data Driven by Semantic Paths

Tracking #: 2195-3408

Marie Destandau
Caroline Appert
Emmanuel Pietriga

Responsible editor: 
Claudia d'Amato

Submission type: 
Full Paper

Abstract:
Meaningful information about an RDF resource can be obtained not only by looking at its properties, but by putting it in the broader context of similar resources. Classic navigation paradigms on the Web of Data that employ a follow-your-nose strategy fail to provide such context, and put strong emphasis on first-level properties, forcing users to drill down in the graph one step at a time. We investigate a navigation strategy based on semantic paths and aggregation. Starting from sets of resources, we follow chains of triples (semantic paths) until we find properties that 1) provide meaningful descriptions of resources in those sets, and 2) are amenable to visual representation, considering a broad range of visualization techniques. We implement this approach in S-Paths, a browsing tool for linked datasets that systematically tries to identify the most relevant view on a given resource set, leaving users free to switch to another resource set, or to get a different perspective on the same set by selecting other semantic paths to visualize.
Major Revision

Solicited Reviews:
Review #1
By Roberto García submitted on 08/Jul/2019
Major Revision
Review Comment:

The paper contributes a very interesting and, in many regards, novel approach to the exploration of semantic datasets. The main novelty lies in considering property paths, not just direct properties, when exploring sets of resources, and in a set of predefined visualizations that are automatically configured and selected based on their computed usefulness.

Despite the interesting contribution and the nice results visible in the online version of the tool, my recommendation is a "major revision", mainly because the paper stays too much at the overview level and provides too few details about the internals of the contribution.

Regarding this issue, there is little information about how the tool interacts with the SPARQL endpoint. My understanding is that it is done purely through SPARQL, though the code seems to be tied to the Virtuoso store specifically. In any case, this makes it difficult for me to understand what SPARQL is generated for some queries.

For instance, I tried to get the names of all female laureates between 1900 and 1919. I first focused on the first two decades displayed. Then, I changed the category dimension to the gender of the laureate and focused on females only. If I then display the names of the laureates, I also get some men. Though not evident from the feedback provided by the tool, this makes sense: I am filtering awards, not laureates, all along, and there are awards with both male and female laureates. Thus, I should pivot to the set of laureates before filtering by gender. However, I am only offered this option if I use the path to the gender through the laureate and then pivot. I managed, at last, to get the desired result, though the experience was a little confusing: SPARQL DISTINCT might be operating at different levels, and when displaying the laureates and their Nobel prize years, the second Nobel for Marie Curie disappears.
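To make the set-semantics issue concrete, here is a minimal Python sketch of the two filtering strategies. The data model and all names are invented for illustration; they are not taken from the actual Nobel dataset, and S-Paths of course issues SPARQL against the endpoint rather than doing this in memory.

```python
# Miniature model: "award" resources, each linked to one or more
# (laureate name, gender) pairs. Purely illustrative data.
awards = [
    {"year": 1903, "laureates": [("Marie Curie", "female"),
                                 ("Pierre Curie", "male")]},
    {"year": 1911, "laureates": [("Marie Curie", "female")]},
    {"year": 1918, "laureates": [("Max Planck", "male")]},
]

# Strategy 1: filter the *award* set by "has a female laureate", then
# list all laureates of the remaining awards. Shared awards keep their
# male co-laureates, which is why men still appear in the result.
awards_with_female = [a for a in awards
                      if any(g == "female" for _, g in a["laureates"])]
names_via_awards = {name for a in awards_with_female
                    for name, _ in a["laureates"]}

# Strategy 2: pivot to the *laureate* set first, then filter by gender.
# Note that collapsing to a set of names (a DISTINCT-like step) also
# merges Marie Curie's two prizes into a single entry.
names_via_laureates = {name for a in awards
                       for name, g in a["laureates"] if g == "female"}

print(sorted(names_via_awards))     # Pierre Curie still present
print(sorted(names_via_laureates))  # only female laureates
```

The sketch reproduces both observations from the scenario: strategy 1 returns "Pierre Curie" alongside the female laureates, and the set-of-names view in strategy 2 no longer distinguishes Marie Curie's 1903 and 1911 prizes.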

Other parts that require more detail concern the kind of processing done to prepare the system when it is deployed on a new dataset, especially how entry-point classes are determined. In this regard, the proposed approach of jumping directly to the selected entry-point class seems too "narrow" and goes against Shneiderman's mantra of starting with an overview. What about using the treemap view to show the main classes in the dataset and letting the user choose where to start?

Finally, something that also requires more detail, and maybe more effort on the part of the authors, is the evaluation. Right now, though it is well motivated and the questionnaire about what users learned is very appealing, a more rigorous approach is also required, especially regarding the effectiveness and efficiency of the tool, for instance, actually measuring the time it takes users to complete the tasks.

As mentioned, a more rigorous evaluation would require further work, so maybe the best option for the current paper would be to extend the technical details about the contribution, as mentioned earlier, and keep the evaluation mainly as future work, with some preliminary results like those from the questionnaires. Then, for future work, it might be interesting to use evaluation results to compare the user experience provided by the tool to similar ones. One option might be the BESDUI benchmark (https://github.com/rhizomik/BESDUI), which in fact is a "cheap" approach to UI benchmarking because it does not require real-user involvement.

Review #2
By Agnieszka Lawrynowicz submitted on 04/Nov/2019
Major Revision
Review Comment:

The paper presents an approach and a tool for visual exploration of linked data, which provides visual representations of resource sets that help gain insights about those resources.
The authors give a good overview of related work, pointing out that the other approaches do not make it possible to navigate linked data directly while benefiting from aggregation techniques, sub-selections, etc.

+ important topic, fitting well in the scope of the Semantic Web Journal
+ good narrative
+ good idea
+ set-oriented exploration, such as identifying correlations, observing distributions, comparing & contrasting groups of resources
+ builds nicely on the experiences and results of previous works in the area
+ minimal effort from users before they can start browsing
+ S-Paths is distributed as an open-source project (requires registering/signing up to the git service of INRIA)

- some descriptions are too generic: lack of precise definitions, algorithms, etc.
- some information is given only by means of examples and not by an exhaustive list
- the evaluation setup was not always clearly communicated to users

Further comments:


"amenable to visual representation" -> is this measurable?
"set-based navigation"-> it is not precisely defined in the paper what it is

***1. Introduction***
"Most linked data browsers employ a follow-yournose strategy" -> reference needed
"Properties that provide relevant descriptions of resources are not necessarily direct properties of those resources." -> any (quantitative) evidence? Examples?
"They can be several hops away in the RDF graph, depending on how abstract the dataset’s model is and on what ontologies it employs"-> can this be an artefact of serializing OWL to RDF?
Fig.1 contains very small images, hardly readable.

***3. S-Paths***
"S-Paths is designed to support users in the exploration of linked datasets"-> this sentence would benefit from the stating what S-Paths is (even if previously mentioned)

"semantic path" -> It is unclear whether a "semantic path" is being introduced by this paper, or if it has been introduced previously. In the former case, it would be better to provide a phrase like "in this paper we introduce semantic paths". In the latter case, there should be a citation or a section with preliminaries to delineate what is the contribution of the current paper with respect to previous work.

"A semantic path is a set of resources related to a set of values by a sequence of RDF statements." ->this definition is a bit vague, as it does not say in a detailed way about the relation, e.g. are RDF statements arbitrary? It would benefit from formalization

"S-Paths provides a collection of such templates"->where they can be found or a list of them? It would be useful to point to such list in this place of the manuscript or inform that it will be described later in the paper in Section X

"Considering paths that can be indirect, and not only first level properties, mechanically results in aggregation steps to the set of results."-> what does that mean that it *mechanically* results?

"The full analysis is performed only when S-Paths gets set up with a new set of graphs." ->what is a full analysis? The analysis with respect to all the characteristics?

"S-Paths provides a set of views: map, image gallery, timeline, statistical charts, simple node-link diagrams, etc." ->I recommend to remove "etc." and provide a full set of the views or a reference to the full set (e.g. included in a table).

"Once semantic paths for a given resource set have been characterized,"->it is unclear exactly how they are characterized, there haven't been any algorithm presented before this point of the paper, only a generic description

"These are used in multiple places in the interface, e.g., next to the resource selection menu, in the view configuration menu, in the axes’ legends, and whenever a semantic path is displayed."->again, I would prefer more exact description than only kind of an example (with use of "e.g.")

"They serve as entry points into the data, constituting what we consider a priori to be reasonably-coherent groups of entities."-> what is that the authors consider "reasonably-coherent groups of entities". This should be precised/formalized.

Fig.6 and Fig.7 are much too small in my opinion.

http://s-paths.net is down.

***4. Illustrative Scenario***

The images in Figure 8 are too small, unreadable.

***5. Evaluation***
In general, when users do exploratory search it is often a pre-requisite to solving some task.
The users were given tasks only indicatively, while at the same time being asked to explore the dataset in an open-ended fashion. There is perhaps some inconsistency, or a lack of explicit instructions about the goals, in the guidelines the users were given. Therefore, I think that the quantitative results presented in "5.3. Task Success and Task Time" are not valid or reliable.

"They also had to tell whether they would have been able to answer those questions before the experiment."-> I am not sure this is the right setup.
I would expect such setup in which one group of the users is divided into two groups where one is answering the questions before and then the subgroups change the roles solving another, but similar configuration of the task, i.e. actually evaluating how well the task is solved with and without the proposed system and not asking how the users think they would solve the task. The baseline might be raw data?

***6. Limitations***

I appreciate that the authors provided the limitations section.

6.2. Data Processing: The authors claim they have tested S-Paths on several datasets, but it is not described what the setup of the testing was, nor what exactly was tested.

Overall, I am rather positive about the paper, regarding its narrative, motivation, topic and contribution, which is very much in line with the scope of the Semantic Web Journal.
However, I would expect more precise definitions (clear, even if not overly formal) and more rigorous style of writing (at places I have indicated in comments) and fixing a broken link to the demo.

Another major remark is that the paper might perhaps be better suited to the "Reports on tools and systems" category.