S-Paths: Set-Based Visual Exploration of Linked Data Driven by Semantic Paths

Tracking #: 2383-3597

Marie Destandau
Caroline Appert
Emmanuel Pietriga

Responsible editor: 
Claudia d'Amato

Submission type: 
Full Paper
Meaningful information about an RDF resource can be obtained not only by looking at its properties, but also by putting it in the broader context of similar resources. Classic navigation paradigms on the Web of Data that employ a follow-your-nose strategy fail to provide such context and put strong emphasis on first-level properties, forcing users to drill down in the graph one step at a time. We introduce the concept of semantic paths: starting from a set of resources, we follow and analyse chains of triples and characterize the sets of values at their end. We investigate a navigation strategy based on aggregation, relying on path characteristics to determine the most readable representation. We implement this approach in S-Paths, a browsing tool for linked datasets that systematically identifies the best-rated view on a given resource set, leaving users free to switch to another resource set or to get a different perspective on the same set by selecting other semantic paths to visualize.

Minor Revision

Solicited Reviews:
Review #1
By Roberto García submitted on 27/Jan/2020
Review Comment:

All my comments have been addressed by the authors, especially regarding section 3 and the corresponding new figures. The only remaining concern, which is basically about the tool rather than the paper, is to make the circle-packing diagram that provides the dataset overview more prominent and interactive. The circles could have different colors (at least the selected one) and show a tooltip with the corresponding class name when the user hovers the mouse over them. It might also be interesting to make them clickable so that users can switch among classes.

Review #2
Anonymous submitted on 11/May/2020
Minor Revision
Review Comment:

This manuscript has been evaluated based on the comments raised in the previous round of reviews.

Overall, the authors have addressed most of the major comments. In particular, the concerns about the SPARQL templates, the S-Path views, and the readability of the plots have been resolved by the authors. Nonetheless, below I describe some issues that are still open and that I hope the authors can address in the next version of the manuscript.

# Major issue: Formal definition of semantic paths
The definition of semantic paths is presented using triple patterns. Yet, this formalisation is rather ambiguous as it is difficult to see what the assumptions are and what conditions are fulfilled by the entities or resources that belong to the same semantic path. For example, in the triple pattern (?entities p1/p2/.../pn ?values), is it correct to assume that p1/p2/.../pn is a constant? In other words, is p1/p2/.../pn given?

The triple pattern characterising the similarity of the entities in the semantic path is rather ambiguous. Please note that the triple pattern (?entities rdf:type ?typeURI) indicates that every resource must be associated with *some* class. However, the value of ?typeURI is not necessarily the same class for all the resources in ?entities. This would make the definition incorrect as, in my understanding, all the entities in the semantic path should belong to the same class.
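To make the point about ?typeURI concrete, the following is a minimal Python sketch (not taken from the paper) that evaluates the conjunctive pattern over a hypothetical toy graph held as a set of (subject, predicate, object) tuples. For brevity it uses a single predicate, ex:author, in place of p1/p2/.../pn; all ex: URIs are invented for illustration. Each solution binds ?typeURI independently, so two entities that match the pattern can carry different classes.

```python
# Hypothetical toy graph: two entities share the predicate ex:author
# but are typed with different classes.
RDF_TYPE = "rdf:type"

graph = {
    ("ex:book1", RDF_TYPE, "ex:Book"),
    ("ex:film1", RDF_TYPE, "ex:Film"),
    ("ex:book1", "ex:author", "ex:alice"),
    ("ex:film1", "ex:author", "ex:bob"),
}


def match(graph, predicate):
    """Solutions (?entities, ?values, ?typeURI) of the pattern
    (?entities <predicate> ?values) . (?entities rdf:type ?typeURI)."""
    return sorted(
        (s, o, t)
        for (s, p, o) in graph if p == predicate
        for (s2, p2, t) in graph if s2 == s and p2 == RDF_TYPE
    )


solutions = match(graph, "ex:author")
type_uris = {t for (_, _, t) in solutions}
print(solutions)          # [('ex:book1', 'ex:alice', 'ex:Book'), ('ex:film1', 'ex:bob', 'ex:Film')]
print(sorted(type_uris))  # ['ex:Book', 'ex:Film']
```

Both entities satisfy the pattern, yet ?typeURI takes two different values, so the pattern alone does not force a common class.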

An alternative formalisation that mitigates these issues is suggested below using set notation. In my opinion, since semantic paths are sets of resources, it is more natural to formalise this concept using sets. I would suggest that the authors carefully check this suggestion to ensure that it correctly captures the notion of semantic paths.

Given an RDF graph $G$ and a sequence of URIs $p_1, p_2, \ldots, p_n$ with $n \geq 1$, a semantic path is a set of resources $P = \{ e \mid (e\ p_1/p_2/\ldots/p_n\ o) \in G \text{ for some } o \in U \cup L \}$ such that $\forall e_1, e_2 \in P\ \exists C \in U : (e_1\ \mathrm{rdf{:}type}\ C) \in G \wedge (e_2\ \mathrm{rdf{:}type}\ C) \in G$.
Note: $U$ and $L$ are the sets of URIs and literals, respectively, as used in the literature to define RDF terms.
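A minimal Python sketch of the suggested set-based definition follows, assuming the graph is held in memory as a set of (subject, predicate, object) tuples and that p1/p2/.../pn is a fixed sequence of predicates. The helper names and the ex: URIs are hypothetical. The sketch checks two readings of the class condition: the pairwise one written above, and the stronger requirement that a single class C is shared by all entities, which is what the earlier remark that all entities should belong to the same class seems to call for. On the toy graph below, every pair of entities shares some class, yet no class is common to all three.

```python
from itertools import combinations

RDF_TYPE = "rdf:type"


def follow_path(graph, start, path):
    """Nodes reachable from `start` by following the predicates in `path` (p1/p2/.../pn) in order."""
    frontier = {start}
    for predicate in path:
        frontier = {o for (s, p, o) in graph if p == predicate and s in frontier}
    return frontier


def path_entities(graph, path):
    """{ e | (e p1/.../pn o) in G for some o }: subjects with at least one value at the path's end."""
    subjects = {s for (s, _, _) in graph}
    return {e for e in subjects if follow_path(graph, e, path)}


def classes_of(graph, entity):
    """All C such that (entity rdf:type C) is in G."""
    return {o for (s, p, o) in graph if s == entity and p == RDF_TYPE}


def pairwise_share_class(graph, entities):
    """Pairwise reading: for all e1, e2 some C types both (e1 == e2 means each entity has a class)."""
    return all(classes_of(graph, e) for e in entities) and all(
        classes_of(graph, a) & classes_of(graph, b)
        for a, b in combinations(entities, 2)
    )


def common_classes(graph, entities):
    """Stronger reading: the classes C shared by every entity (empty set if there is none)."""
    class_sets = [classes_of(graph, e) for e in entities]
    return set.intersection(*class_sets) if class_sets else set()


# Hypothetical toy graph: each entity has two classes and a two-step path ex:p/ex:q to a literal.
graph = set()
for entity, classes in {
    "ex:e1": ("ex:A", "ex:B"),
    "ex:e2": ("ex:B", "ex:C"),
    "ex:e3": ("ex:C", "ex:A"),
}.items():
    for c in classes:
        graph.add((entity, RDF_TYPE, c))
    graph.add((entity, "ex:p", entity + "_mid"))
    graph.add((entity + "_mid", "ex:q", '"some value"'))

P = path_entities(graph, ["ex:p", "ex:q"])
print(sorted(P))                       # ['ex:e1', 'ex:e2', 'ex:e3']
print(pairwise_share_class(graph, P))  # True
print(common_classes(graph, P))        # set()
```

The intermediate nodes (ex:e1_mid, ...) are excluded from P because the full path ex:p/ex:q does not start from them.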

# Minor comments
- "A semantic path is a set of resources related to a set of values by a sequence of RDF statements" -> should read "RDF predicates". Please note that "RDF statement" refers to an "RDF triple".

- Fig. 1: why is the query performed recursively? Maybe the authors meant that the query is split into other queries, whose results are then combined to answer the original query. Please note that this process is not necessarily recursive but rather a "divide-and-conquer" approach.

- Fig. 1: datatype(VALUES) should be datatype(?values). Please check.

- Page 6, line 32 (second column): "1.11-12" should be "11-12". Please check.
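Regarding the Fig. 1 comment, here is a minimal Python sketch of one way to answer the pattern (?entities p1/p2/.../pn ?values) by splitting it into one sub-query per predicate and joining the partial results. It is an assumption made for illustration, not a description of how S-Paths actually rewrites its queries; the in-memory graph representation, the function names and the ex: URIs are hypothetical.

```python
def hop_query(graph, predicate):
    """One sub-query: all (subject, object) bindings for (?s <predicate> ?o)."""
    return {(s, o) for (s, p, o) in graph if p == predicate}


def join(left, right):
    """Join two binding sets on the shared variable: (a, b) and (b, c) give (a, c)."""
    objects_by_subject = {}
    for b, c in right:
        objects_by_subject.setdefault(b, set()).add(c)
    return {(a, c) for (a, b) in left for c in objects_by_subject.get(b, ())}


def evaluate_path(graph, path):
    """Answer (?entities p1/.../pn ?values), n >= 1, by splitting it into one query per predicate."""
    partial_results = [hop_query(graph, p) for p in path]  # divide: n independent sub-queries
    combined = partial_results[0]
    for partial in partial_results[1:]:  # combine: join partial results left to right
        combined = join(combined, partial)
    return combined


# Hypothetical toy graph: books -> authors -> birthplaces.
graph = {
    ("ex:book1", "ex:author", "ex:alice"),
    ("ex:book2", "ex:author", "ex:bob"),
    ("ex:alice", "ex:birthPlace", "ex:Paris"),
    ("ex:bob", "ex:birthPlace", "ex:Lyon"),
}

pairs = evaluate_path(graph, ["ex:author", "ex:birthPlace"])
print(sorted(pairs))  # [('ex:book1', 'ex:Paris'), ('ex:book2', 'ex:Lyon')]
```

The partial results are combined in a simple loop, with no recursion involved.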

Review #3
By Agnieszka Lawrynowicz submitted on 16/May/2020
Minor Revision
Review Comment:

I thank the authors for all the improvements they have made to their paper.

My main concern that still persists regards the evaluation.
Either the evaluation should be extended more rigorously or, if an extended evaluation is not going to be included in the revised version of the paper, then instead:
1) it might be useful to prepare, as supplementary material, some tutorials (with sample scenarios and/or available functions) or other user documentation with illustrative examples and guidelines on how to use the tool.
Currently, when a lay user visits the website [1], it might not be clear to her/him how to start using the tool and how to proceed.
2) it might also be useful to extend the "discussion" section, e.g., by providing "lessons learnt" to inform other researchers of the key (research) findings and their discussion.

Further comments:

Page 3, section 3.1. Semantic Paths

There is a formula provided to formalize a semantic path (line 43).
The notation uses $p_1$, $p_2$, ... and ?entities, but this notation is not explained. What is $p_n$? Are these properties? Are these elements of a sequence of RDF statements? What is the meaning of "\"?
Why is the variable denoting resources named ?entities? Can resources be of any RDF type?

Furthermore, starting at line 45 there is a similarity criterion defined.
It is unclear what ?typeURI is.
Does this criterion mean that the resources that bind to the ?entities variable should be of type ?typeURI?

Page 4, Fig. 1
What is a "theoretical query"? I understand the intuition, but maybe there could be a better name for that.
Is it possible to show how this query is rewritten into a set of queries, i.e., to include the queries into which it is rewritten?

[1] http://s-paths.lri.fr