Approaches to Visualising Linked Data: A Survey
Review 1 by Anthony Robinson
The authors have done a nice job responding with new details to my previous review comments. This is now a valuable contribution to the field and I'm happy to accept it as is.
Review 2 by Sarven Capadisli
I believe this version of the manuscript can be accepted as is. It has adequately corrected or responded to the points that I raised on the first version.
The points that I mentioned previously, in particular suitability as introductory text, comprehensiveness, balance, readability, and importance to the Semantic Web community, are still valid in this version. That is, it is a necessary contribution to the SW field.
I have noted only a few additional points that the authors might want to consider. These are non-vital issues, which may help the paper if the authors see fit for inclusion:
Section 1, paragraph 5:
"One of the central issues with large-scale LD production is the accuracy and completeness of links with other datasets."
The LD community commonly agrees on the fact that completeness is a 'nice to have', however, it is not always possible, hence, it is considered to be a non-vital issue.
Section 2 Summary, paragraph 1:
"We work from these two cases to ground our discussion with respect to
LD visualisation"
It is not clear why the two examples in this section are particularly used to build the discussion. It might help to briefly elaborate on that before moving on.
Section 2 Summary, paragraph 4:
"Linked data, encoded in RDF and most commonly returned as XML" and "The (default and commonly found) representation – RDF using XML serialisation –"
"Default"ness in particular needs citation to some stats, or be specific about whether it is referring to RDF serializations that are available from SPARQL endpoints, or published data. Otherwise, it could be argued that majority of the SPARQL endpoints nowadays offer several RDF serializations for their dataset, where RDF/XML is one serialization format among Turtle, N-Triples, and RDF/JSON.
Section 5.7.1. Data Verification & Validation:
Might be good to mention Sindice Inspector http://inspector.sindice.com/
The reviews below are from the initial review. The PDF file contains the new, resubmitted version which is currently under review.
Review 1 by Sarven Capadisli
1. Suitability as introductory text:
I appreciate this survey paper mainly for two reasons: it covers an area (i.e., navigation and visualization studies of linked data) that is not quite mainstream in the Semantic Web community yet, and gives good coverage of the tools and services that are currently out there. I have found the paper to achieve what a survey paper should. It is generally descriptive (as opposed to critical) about the approaches that are out there and identifies the key areas for effective user interactions.
2. Presentation and coverage:
On comprehensiveness: the survey's findings are fairly up to date and cover most of the well-known visualizations that are currently out there. If the authors are able to cover even more material, here are some more that they could consider:
* http://xml.mfd-consult.dk/foaf/explorer/
* http://foaf.qdos.com/
* http://nitelight.edefence.org/
* http://lists.w3.org/Archives/Public/public-lod/2010Nov/0470.html
On balance: The paper proposes to tackle approaches to better usability with regards to navigation and visualization of linked data. In general it does carry that well. However, some of the proposed requirements are based on small test cases (i.e., primarily based on the subset case of Data.gov mentioned in Section 2), and the main reference to high level visualization requirements is based on a single paper from 1996. Having said that, the requirements are well-defined and appear to be sound.
On the quality of the coverage:
Section 1, paragraph 4 states:
".. when a URI is dereferenced the response is represented using RDF together with a given serialisation format (e.g., n3, Turtle , XML). Knowledge of how to use this format and interpret information provided using it is restricted to tech-savvy end users, and even in certain cases, only those who have knowledge of Semantic Web (SW) technologies. It is clear that regular web users, so called lay users, who have no knowledge of RDF, nor ontologies, are inhibited in their ability to understand data returned when looking up a URI."
This omits the possibility of a dereferenced URI having an RDFa serialization [1]. This means that the inner intricacies of RDF (the syntax of a serialization) are visually hidden from the user, as the response is in the format of an ordinary Web page in (X)HTML. Hence, "making sense" of the data does not necessarily require any special expertise.
Section 1, paragraph 6 states:
"Clear and coherent visualisation of linked data is essential if the Web of Data is to be used outside of the SW community."
If we were to assume for a moment that linked data is only for the SW community, would that imply that clear and coherent visualization is a non-issue for its members? I think the "is essential if" places an unnecessary weight here, since clarity and coherency are for everyone, whether they are a "tech user" or "lay user". Having said that, the point is clear about its importance for mainstream adoption.
Section 2.3, paragraph 1 states:
"Having made the case both for the value of linked data as a means of encoding and sharing distributed data and the need to develop effective, user- and task-oriented systems for consuming this data.."
I find this to be a generalization based on the coverage in Section 2.2.2 Why Linked Data? A Public data Consumption Scenario, as the value is stated by Ryan McKeel of the US National Renewable Energy Laboratory, Linked Data community, and Wikipedia, and illustrated only by a subset case of Data.gov. The importance of the statements given by the Linked Data community and Wikipedia to build a case is debatable. Moreover, can the demonstrations provided for the US Census Bureau data represent the need for effective consumption as a whole?
I would suggest that either more cases should be added to reach a new corollary, or this part of the summary should be revised to reflect only on the cases reviewed before.
Similarly, Section 3 mentions "we derive a list of requirements for the design of visualisation tools", "we will use these requirements to guide the survey of existing applications for linked data", and "we will also use the requirements as a benchmark in the discussion of our findings that follows." These are all based on the case in Section 2.2, which is a subset of Data.gov.
While the derived requirements, the conducted survey, and the benchmark discussion appear sound, the conclusions reached do not represent all cases.
Section 4.1.4, paragraph 1 states:
"Huynh et al. reported that presenting the information as a collection of items was more suitable for the information seeking tasks they support than would a graph representation."
Given the importance of this point, it might be good to briefly mention why they've reached that conclusion.
Sections 4.2.1, 4.2.2, 4.2.3, 4.2.4 and 4.2.8 could use screenshots.
The acronym "MO" is mentioned several times in the paper but no definition is provided. Presumably it is "Music Ontology", however it would be good to define it at least once.
3. Readability:
I have found the paper to be well categorized, the sections well-defined, and easy to read through to say the least. The paper generally has a descriptive tone as opposed to critical. There could have been more evaluation on the soundness of the technologies and offer alternative suggestions where possible. I would imagine most researchers would not have any issues digesting this paper.
4. Importance to the Semantic Web community:
The paper is useful to the community because it covers a good collection of the tools that are currently in use in the wild. It has managed to categorize the tools and services, and explain them with sufficient detail. Like a lot of projects, some do become obsolete or simply disappear into thin air. Hence, the material covered in the paper helps those that are interested in the approaches that are already taken to see what was previously covered; what worked and what did not; and how one tool compared to another at the time. The community would certainly benefit more from papers like these since reports on user interactions give everyone a reality check.
By reading this paper, I have found the high level requirements to be generally "good to keep in mind". The "Usability Criterion" for the functionality comparisons of the browsers are also well presented. In my opinion, these two areas may be the best take-away from this paper.
[1] ".. using the standards (RDF*, SPARQL)" http://www.w3.org/DesignIssues/LinkedData.html
Review 2 by Anthony Robinson
This paper proposes design requirements for visual methods that can be used to represent linked data. In this case, the authors consider linked data sources to be web-delivered datasets that offer one or more links to other related data sources. One key goal for this research effort is to identify visual strategies that help unlock the potential usefulness of these linked sources so that end-users can understand and utilize them effectively. To achieve this goal, the authors define design requirements that visual representations of linked data should fulfill. Then, they review contemporary methods for visualizing linked data and compare these methods against their design requirements.
This research focuses on an area of increasing interest - there is a clear need to develop approaches that put recent technological and theoretical advances related to the semantic web in the hands of end-users in a way that is both useful and usable.
The proposed design requirements are structured by user type; lay-user and tech-user. These user categories are straightforward - tech-users are people who understand what the semantic web is all about, and lay-users are everyone else. This is a rather simplistic characterization of user profiles and I think there are in fact a large range of different user types, with varying interests, abilities, and concerns when it comes to using and understanding linked data. Since the potential user population for consuming linked data is so large, it seems abrupt to create two buckets to explain all of the variation.
The design requirements themselves include standard advice from Shneiderman's work in Information Visualization, and add on to those several additional task and presentation oriented requirements for each user group. Unfortunately, the authors do not say much about where these requirements came from, how they connect to emerging needs in the semantic web visualization community, if they are inspired by empirical work to actually study end-user behavior or preferences, etc... As such, these design requirements must simply be taken at face value. A deeper evaluation of task requirements would be useful here, for example, comparing recent work by Amar & Stasko who have come up with a set of evidence-based knowledge precepts for the design of visualization tools that are more specific (and therefore immediately usable by tool designers) and offer possible metrics by which one could measure success or failure.
Following the presentation of their design requirements, the authors review in deep detail a wide range of existing tools for browsing linked data sources, using a BBC Music project as an example. The authors split their review of tools into two categories - tools that primarily use text to present results, and tools that provide other visual methods.
To provide some structure to their review results, the authors develop short "usability criterion" and then identify whether or not each tool meets those criteria. Unfortunately, the authors do not provide details on how these lists of criteria are precisely defined, and the scale of "meets criteria / doesn't meet criteria" leaves no room for much in the way of subtlety - a star appears next to some of the notations in each table but it is never explained what this indicates. Furthermore, the authors assign a rating as to whether or not each tool would be appropriate for lay-users or tech-users, and how this rating is assigned (i.e. is there a certain score that leads to one or the other rating?) is not discussed.
There are a couple of major issues with this work. First, to be maximally useful to others, a review-oriented methodology for identifying design criteria and evaluating existing approaches against those criteria has to be repeatable by others. Parts of the review criteria and parts of the evaluation metrics are visible in this paper, but other critical details are not. Overall, the empirical value and significance of this research is rather limited. The conclusions are derived from an incomplete study methodology and how study decisions were made and verified is left quite unclear. Design frameworks are maximally useful when they are based at least in part on empirical evidence and very clearly structured so that others can understand precisely how conclusions were derived. One way in which the lack of empirical depth shows is that the results section of this paper (section 5) attempts to synthesize what was learned from the literature review and ends up simply restating the entire contents of each table in verbal form.
Having a more precise, repeatable study design would have forced the authors to first design answerable research questions and then end up with clear, connected conclusions that spoke to the stated objectives of identifying visual methods that can support end-user goals for consuming information from the semantic web. It is not enough to say that a particular tool can or cannot support a particular kind of user based on the presence or absence of a feature - those of us who study users would argue that it is very rare that you can anticipate such things in advance of actually studying them directly through experiments or other means.
Second, the paper states as a main objective that it will review visual methods for representing linked data, and in fact the paper does not provide much detail on visual methods themselves - tools either use them or they do not (and the authors appear to leave out the possibility that text can be used as a visualization method - as demonstrated by many others, IBM's Phrase Nets as one example). So it would have been nice to see a deeper focus here on which representation types were used, which other representations are possible (used in other domains, but maybe not yet used in semantic web applications) so that the reader could understand the full gamut of what is possible, what has been done, and what paths are available to advance the science. Where this paper succeeds is that it provides a very thorough review of recent advances in the text-based and graph-based visualization of linked data for semantic web applications. So I think it holds substantial value for collecting and summarizing that work.
To be clear, I certainly agree with the authors that visual techniques hold a great deal of promise toward the goal of supporting end-users as they consume information from the semantic web. I encourage the authors to delve deeper than where they've gone to date to study end-users directly (or at least borrow from those who have), explore how visual-computational approaches might work for semantic web applications (an area that isn't mentioned at all in this paper), and focus on answering clearly defined research questions with clearly defined methods to identify the full range of possibilities and specify which ones are maximally useful for which tasks.
Minor issues:
Wikipedia articles are not appropriate sources for a paper in an academic journal, in my opinion. Where I teach they are not acceptable unless the subject of the writing itself is Wikipedia.
Overall this is a well-written article, however it is quite long and there are many redundant sections. Very detailed descriptions for each tool are nice for those who have not read much on each of them, but a good review should synthesize prior work into higher-level concepts and organization so that key trends are the focus rather than atomic-level details on each example.


Comments
authors' response to 2nd set of reviews
** REVIEWER COMMENT:
Section 1, paragraph 5:
"One of the central issues with large-scale LD production is the accuracy and completeness of links with other datasets."
The LD community commonly agrees on the fact that completeness is a 'nice to have', however, it is not always possible, hence, it is considered to be a non-vital issue.
*** RESPONSE
*** Reworded to say:
"One of the central issues with large-scale LD production is the accuracy and completeness of links with other datasets. Identifying such links using the solitary RDF format of a dataset limits the reader's ability to identify any errors and incorrect links. The LD community recognise that a complete solution to this challenge may not be possible; however, visualisation of Linked Data may help to resolve this, as it enables the identification of such errors more easily, using, for instance, a graph visualisation."
** REVIEWER COMMENT:
Section 2 Summary, paragraph 1:
"We work from these two cases to ground our discussion with respect to LD visualisation"
It is not clear why the two examples in this section are particularly used to build the discussion. It might help to briefly elaborate on that before moving on.
*** RESPONSE
***
Have included a brief summary of the types of scenarios used, i.e., chiefly to illustrate the potential in presenting LD to support consumption by BOTH mainstream and technical users:
the data.gov and data.gov.uk use cases with other LD from public bodies - to illustrate the public interest/mainstream user perspective
BBC Programmes/Music - to illustrate the value of Linked Data for media organisations
** REVIEWER COMMENT:
Section 2 Summary, paragraph 4:
"Linked data, encoded in RDF and most commonly returned as XML" and "The (default and commonly found) representation – RDF using XML serialisation –"
"Default"ness in particular needs citation to some stats, or be specific about whether it is referring to RDF serializations that are available from SPARQL endpoints, or published data. Otherwise, it could be argued that majority of the SPARQL endpoints nowadays offer several RDF serializations for their dataset, where RDF/XML is one serialization format among Turtle, N-Triples, and RDF/JSON.
*** RESPONSE
***
Edited to include the other forms of RDF serialisation as examples, as the intention here is to focus on the machine-friendly, rather than human-friendly, presentation. Also included new references describing the XML serialisation and its benefits and limitations, and confirming that this is the more commonly used serialisation for publication, among other reasons because it is the W3C recommendation. The evidence is empirical, however, not statistical.
** REVIEWER COMMENT:
Section 5.7.1. Data Verification & Validation:
Might be good to mention Sindice Inspector
http://inspector.sindice.com/
*** RESPONSE
included
authors' response to reviews
* Reviewer 1:
Categorisation of users
Response: We agree with the comments. However, for the purposes of this paper, a more refined categorisation would divert from the general aim, which is to examine support for the user outside research and development in Computer Science, and the Semantic Web specifically (which we acknowledge also benefits the "tech-user").
We have therefore clarified better the basis on which the categorisation is done, and included additional citations on HCI and user-centred design that discuss user types and their impact on tool design. We have also included a brief discussion in this section of the role of the domain expert who is NOT a tech-user as defined in this paper. (this type of user - the domain expert - IS referred to in relevant parts of the paper, to indicate where domain expertise contributes to ability to make use of tools).
Finally, in the findings section we state clearly the basis on which we categorise a tool as targeted to a tech- or lay-user.
-----------------
Elicitation of design requirements
Response: We acknowledge the concern that we present a requirements list without clearly citing relevant work. This section has been revised to state clearly existing work on design guidelines from which we derive, first, the high level guidelines for usability in general and especially as applies to tools that aim to support visual analytics. We also supplement these with requirements derived based on the tasks that are carried out by both tech- and lay-users while consuming linked data. For the latter we provide predominantly empirical evidence as presented in relevant publications, both in the Information Visualisation (InfoVis) and Semantic Web (SemWeb) fields. The greater reliance on empirical evidence (rather than established scientific theory) here is due to the fact that best practice in the consumption of Linked Data, especially in the use of visualisation options, is not yet firmly established.
In the findings section we rehash the requirements, as an introduction to our assessment of the tools and functionality available, and use an additional table to list citations that provide evidence for the validity of each (key) requirement. The table (1) is split into two parts, requirements for effective analysis, with a leaning toward visual analytics, and the second part, requirements based on user tasks for consuming linked data.
We have also revised the sub-sections in 5 (Findings) in accordance with the review comments, to summarise the functionality available overall, citing only instances where a specific tool stands out, rather than for each tool.
-----------------
Evaluation of the functionality available for visualising linked data (Findings section)
Response: (see also previous point) The revision of the sections on requirements provides a clearer description of the basis on which we evaluate the functionality supported by each tool.
We have not found published work describing established guidelines or benchmarks for usability evaluation of linked data browsers, and so are unable to cite or work from established practice in the field. However, general usability guidelines are relevant and applicable; we therefore use an analytical approach to evaluating the tools reviewed, by inspecting the user interfaces and the functionality exposed to end users, structured by the guidelines and requirements specified (which are now clearly cited).
Using the BBC Music Beta project as a baseline we attempted to follow a path from the same starting point (specified in the paper) to browse to related information. Because the tools do not necessarily return the same information we were unable to follow a fixed path, but attempt to make use of at least basic exploration functionality available in each tool. Any other restrictions we faced are clearly stated for each tool.
We also consider evaluation reported by tool owners in reporting our findings.
Tables 2 and 3 have also been updated to reflect the revision of relevant sections of the paper, predominantly the Requirements and the detail in the Findings sections.
The qualitative evaluation approach we use allows a fair assessment of the functionality available in the tools, and is easily replicated (see Carpendale 2008, Sharp et al., 2007, Shneiderman et al., 2009).
-----------------
Detail on visualisation techniques
Response: A new subsection, 3.1, has been included to discuss briefly a selection of visualisation techniques and the benefits of different approaches. Additional references have also been included in this and §3.2 on visual representation and analysis.
-----------------
-----------------
* Reviewer 2:
Concern about omitting the relevance of RDFa
Response: We acknowledge the importance of RDFa in presenting linked data in a human-readable format. We have therefore edited the section concerned to state that the use of RDFa is possible - however this has been defined as returning an XHTML response as this is what occurs when performing content negotiation. We however do not provide more detail here as it would divert from the main message (in the introduction). The relevance of RDFa is also mentioned in the Findings section where we state the basis on which tools are categorised as targeted to tech- or lay-users.
-----------------
Suggestion of additional tools to review
Response: We initially considered the 'FoaF Explorer' but didn't include it since we classed it as an RDF rather than a (specific) linked data browser. We have however included it in the list of other (notable) RDF and linked data browsers listed at the start of the Findings section.
We however excluded 'FOAF QDOS' and 'Nitelight' because, although they make use of linked data, we find that they don't focus on browsing and exploratory analysis of linked data, which our survey examines.
The last is essentially a mashup between 'URI Burner' and Microsoft's Pivot Viewer. We attempted to try it out as a good example of multiple tools working together, one of which we had already reviewed. However, it returned a security exception as the server would not allow access to an XML resource required to use the Pivot view.
-----------------
section 1, paragraph 6 states:
"Clear and coherent visualisation of linked data is essential if the Web of Data is to be used outside of the SW community."
Implication that "coherent visualisation" is relevant only for lay-users
Response: We have edited this section to remove the notion of visualisation being 'essential' for lay-users only, so that it makes clear that providing visualisation would 'enable accessibility' to the Web of Data (WoD) and uptake of Linked Data outside of (in addition to) the SemWeb community.
-----------------
Elicitation of design requirements and degree of coverage
(please see also response to reviewer 1)
Response: We have excluded the quote about the use of linked data in the public domain in §2, in line with reviewer 1's comments about the use of Wikipedia as a citation source.
We have included a new use case, in data.gov.uk, (the Research Funding Explorer), to provide a broader view. We have also expanded the BBC use case to mention the larger 'BBC Programmes initiative'.
Further, we renamed §2.2.2 to:
"Why Linked Data? A Public Data Consumption Perspective"
to explain more clearly the aim of this section.
We have also stated that the use cases provide a starting point from which to derive the requirements listed in the paper. The requirements section goes on to cite additional work from which the more complete set of key requirements is derived. Here "key" is the operative word: we acknowledge that we cannot provide an exhaustive list of requirements, so we try to keep them relatively high level as well as practical.
We have included more information in §3.2 (Design Guidelines) to support our argument/reasoning behind using the challenges defined in §2 to motivate our requirements. This allows us again to identify and discuss key requirements, rather than exhaustive coverage of potential requirements or uses with respect to linked data and its visualisation.
-----------------
§4.1.4, paragraph 1 states:
"Huynh et al. reported that presenting the information as a collection of items was more suitable for the information seeking tasks they support than would a graph representation."
Reviewer comment: "Given the importance of this point, it might be good to briefly mention why they've reached that conclusion."
Response: (see §4.1.4) - we have expanded the sentence to include the authors' reason behind their design decision: 'to provide a comprehensive view of data' - they felt that a list provided this while a graph would not.
-----------------
Concern about coverage of the conclusions
(please see also response to reviewer 1)
Response: We have revised the entire 'Findings' section to address comments by both reviewers. This includes the new table and clarification of the information presented in the original tables, in addition to a revision of the layout of the original tables.
-----------------
"Missing" snapshots
Response: We have located a set of RDF files for the BBC Music Beta pages and have (re)generated some of the images. We have however been unable to obtain useful snapshots for all the browsers.
-----------------
Use of acronym "MO"
The acronym "MO" is mentioned several times in the paper but no definition is provided. Presumably it is "Music Ontology", however it would be good to define it at least once.
Response: The acronym WAS defined at the first point of use - it DOES refer to the 'Music Ontology'.