Linked Open Data Visualization Revisited: A Survey

Tracking #: 937-2148

Oscar Peña
Unai Aguilera
Diego López-de-Ipiña

Responsible editor: 
Guest editors linked data visualization

Submission type: 
Survey Article
Mass adoption of the Semantic Web's vision will not become a reality unless the benefits provided by data published under the Linked Open Data principles are understood by the majority of users. As technical and implementation details are far from being interesting for lay users, the ability of machines and algorithms to understand what the data is about should provide smarter summarisations of the available data. Visualization of Linked Open Data proposes itself as a perfect strategy to ease the access to information by all users, in order to save time learning what the dataset is about and without requiring knowledge on semantics. This article collects previous studies from the Information Visualization and the Exploratory Data Analysis fields in order to apply the lessons learned to Linked Open Data visualization. Datatype analysis and visualization tasks proposed by Ben Shneiderman are also added in the research to cover different visualization features. Finally, an evaluation of the current approaches is performed based on the dimensions previously exposed. The article ends with some conclusions extracted from the research.
Solicited Reviews:
Review #1
By Heiko Paulheim submitted on 29/Jan/2015
Major Revision
Review Comment:

The paper gives a survey on Linked Open Data visualization tools. It discusses datatypes and which UI widget types are appropriate to visualize them, as well as typical information visualization tasks and target users. Furthermore, it compares nine different tools according to those criteria.

On the positive side, the paper is very well structured and understandable. The reader can follow the flow of analysis quite well, starting from the discussion of datatypes and visualization tasks to the comparison of actual tools.

What I miss most of all is a clear connection to the characteristics of Linked Open Data, in particular throughout section 2. That section is written on a fairly generic level, making only very few statements about the importance of different datatypes and information seeking tasks in the context of LOD, but mostly summarizing the state of the art of information visualization in general. This is not a bad thing to start with, but given the amount of space this section takes in the entire paper, the whole emphasis is too much on generic data visualization and too little on LOD visualization. This holds in particular for table 2, which represents general knowledge on data visualization, backed by the sources given in the paper.

Table 1 is a bit of a start in the right direction, but it only lists vocabularies. It would be much more interesting to know how commonly those are used, e.g., drawing from previous analyses in that direction [1-4]. This would give hints at which datatypes are actually the most common in LOD, and thus help shape the requirements for a useful LOD visualization approach. Along those lines, findings such as "it is noteworthy the lack of support of 3D data by all the analysed tools" could be questioned more critically - I don't think that there's a substantial amount of 3D data published as LOD, but backed with solid statistics about deployed vocabularies, it could be argued whether this is a real drawback or simply an adaptation to the data which is actually out there.

A second point that puzzles me is the choice of tools in section 3. For example, the authors mention previous surveys, such as Dadzie and Rowe (2010). That survey, in turn, lists eight additional LD browsers with visualization capabilities, none of which is mentioned in this survey. Furthermore, I made a quick Google Scholar search myself, discovering a few more relevant works [5-7]. For a survey, I would expect a more complete picture here.

As a last ingredient to the survey, I would have appreciated some clear conclusions: what is the state of linked open data visualization, which aspects have been improved in the last few years (following the older surveys cited in the paper), which are still to be developed? Which questions and problems of LOD visualizations should be put on a research agenda in the field most prominently? The proposal of "smart visualizations" in the conclusion section seems to go in that direction, but it could be more concrete.

In summary, I recommend that the authors rework the paper to put a stronger focus on the actual characteristics of LOD. Backed by empirical statements about the data which is actually out there -- be it drawn from previous research or from statistics compiled by the authors themselves -- this survey could grow into a really strong and interesting contribution.

Minor issues:
* Many of the screenshots are very small, they should be turned into two-column figures
* There's something going wrong with special characters on top of page 9, column 2

[1] Hogan et al. (2012): An empirical survey of Linked Data conformance
[2] Schmachtenberg et al. (2014): Adoption of the Linked Data Best Practices in Different Topical Domains
[5] Stuhr et al. (2011): LODWheel - JavaScript-based visualization of LOD data
[6] Mazumdar et al. (2009): Exploring User and System Requirements of Linked Data Visualization through a Visual Dashboard Approach
[7] Zembovicz et al. (2010): openChart: Charting Quantitative Properties in LOD

Review #2
Anonymous submitted on 14/Feb/2015
Review Comment:

The paper summarizes popular concepts and taxonomies from the field of information visualization (InfoVis) and relates them to the visualization of Linked Data (LD). It surveys available tools for Linked Data visualization and derives common characteristics and limitations. It provides tables that summarize which of the InfoVis concepts are implemented in the tools.

The paper is well-written and easy to read and follow. The chosen approach of structuring and evaluating LD visualizations with the help of classical InfoVis concepts and categories is promising. Although there is already an extensive survey on LD visualizations by Dadzie and Rowe (2011), an updated survey could indeed be relevant, as several new visualization approaches for LD have been introduced in the last couple of years.

However, I had mixed feelings when reviewing this manuscript: On the one hand, it provides a good InfoVis summary for the SW community; on the other hand, most of its content is already well-known and only little new value is added. While the latter lies to some extent in the nature of survey articles, I would usually expect more insight into the reviewed tools and identified challenges from a survey article. The current review of existing approaches is quite descriptive. Some of the approaches are related to the taxonomy of Shneiderman, but only few are discussed in more detail. The comparison of the approaches is rather high level and also the "evaluation" is quite limited in scope and content (it is rather a summary). The list of extracted features is valuable, but I miss more extensive insights and conclusions on the topic.

For a survey article, it is also not sufficiently complete, as several LD visualization tools are not included, such as LodLive, RelFinder, DBpedia Mobile, and other tools that have partly already been reviewed by Dadzie and Rowe (2011). It seems the authors limit their survey to web-based tools, as indicated in the conclusions. This restriction should be mentioned earlier and made more explicit: Are all web-based tools surveyed, including those based on Flash (e.g., RelFinder) and Silverlight (e.g., OOBIAN Insight), or is only a certain selection evaluated? Which were the selection criteria? Which method was used to identify and classify the tools? A survey article would need more context here. This also holds for the tables presented in the paper: It remains unclear how they have been created and who decided whether a tool implements a certain feature and under which conditions. Was this cross-checked in some way?

Furthermore, it could be made clearer how this article differs from related work. To what extent does it advance the survey of Dadzie and Rowe (2011)? What does it add to their work (apart from an updated summary of LD visualizations)? In Sec. 2.1, it is unclear which contents were taken from Shneiderman and which were added by the authors themselves. It seems that the first sentence of each datatype category has been copied from the text of Shneiderman. Quotes should be used here to clearly indicate which statements are actually by Shneiderman and which are added by the authors. Otherwise, this is not clear and could be considered plagiarism.

Finally, I see a problem in the argumentation, as the tools that the authors surveyed are mostly research prototypes. It is questionable if these tools really need to implement features like customization of the visualization or information about the exploration history. Research prototypes are usually not on the same level as mature industry tools in terms of stability and number of features, and this can usually not be expected. There is certainly a need to make SW developers more aware of InfoVis concepts and best practices, but features like a navigation history often have little impact on research and are therefore not of highest priority when it comes to implementation.

To sum up, I like the idea and approach taken by the authors, but I consider the current manuscript as too descriptive. It goes only little beyond what is already well-known in the InfoVis and SW communities. I would encourage the authors to carefully revise the paper and make it a more extensive survey that is tightly integrated with the summarized InfoVis categories, while considering the specifics of LD. The current paper is a perfect starting point for that. While the first part could be more condensed, the second needs extension and elaboration to be more compelling and to provide novel insight.


Additional comments on abstract, introduction, and conclusion:
Currently, 2/3 of the abstract is motivation, while the paper contents are only very briefly described, with a focus on the structure of the paper. The introduction starts very broad with a motivation frequently used in InfoVis (cave paintings). It then introduces the basics of Linked Data already well-known to the SW community. For the Semantic Web journal, this might be too broad and basic an introduction. The conclusions are also rather broad. I would recommend focusing here on visual aspects, the insights found, and their implications, instead of discussing linked data in general, such as in the fourth paragraph.

I would disagree with the following statements:
- "The least known network representation is usually the adjacency matrix." There are network representations that are less known. Matrices are comparatively popular, even for lay users, if we think of timetables and other schedules, etc.
- "Relate: Usually ignored by the LOD visualization tools". There are several examples that depict these relationships, i.e., the VizBoard tool included in the survey links different views, or RelFinder, which even explicitly depicts property relationships - just to mention two tools.
- "LODVizSuite rendering of a research co-authorship network using a force directed layout." This does not look like a force-directed layout to me. Are you sure it is one?
- "The tool works excellent with JSON (JavaScript Object Notation) formats, as the data sharing with visualization libraries is trivial (web-browser based visualization libraries are developed in JS, whose understanding of JSON is direct)." What do you mean by "trivial" and "direct" here? The argumentation is not clear to me, as there are very different JSON formats (for LD) that usually also require transformation before they can be visualized with JS libraries like D3.

Minor issues:
- Use "an" instead of "a" before vowels, i.e. not "a especial", "a equivalent", etc.
- Wrong word use: "for" instead of "four", "specially" instead of "especially", etc.
- The references are partly incomplete (missing page numbers, publisher, or even proceedings title) and inconsistent.

Review #3
By John Howse submitted on 20/Mar/2015
Major Revision
Review Comment:

This manuscript was submitted as 'Survey Article' and should be reviewed along the following dimensions: (1) Suitability as introductory text, targeted at researchers, PhD students, or practitioners, to get started on the covered topic. (2) How comprehensive and how balanced is the presentation and coverage. (3) Readability and clarity of the presentation. (4) Importance of the covered material to the broader Semantic Web community.

In this survey article the authors compare a number of tools for visualising Linked Open Data. The comparison considers a number of factors: the intended audience for the tools, the types of data each tool allows the user to visualise, and the tasks each tool allows the user to perform.

This article has the makings of a useful contribution to the journal but I think various sections need to be extended before it should be accepted. Thus, I recommend major revisions.

In particular, the evaluation of the tools lacks depth at the moment.
Section 4 summarises the capabilities of the tools with regard to Shneiderman's InfoVis mantra and the datatypes each tool supports, but without any overview of different approaches taken by the tools or of how effective or fit-for-purpose any of the tools are. This is essentially a "box ticking" exercise -- e.g., we are told that LDVizWiz supports "Details on Demand", but how effectively does it do that, and will users (techies, experts or lay users) find it usable?

The survey tells us what tools exist and what their capabilities are, but not whether they fulfil those capabilities effectively. There are diverse existing approaches to visualising the various kinds of information, some traditional and some more recent (e.g. a bar chart versus an area proportional Euler diagram overlaid on a geographical map). Users naturally find some of these visualisations easier to understand than others. Crucially, this depends on the context in which they are used. The background of the user is considered as part of that context, but not the actual task the user is attempting to perform at the time. I write "actual task" because, for this purpose, I think Shneiderman's original list of tasks is too broad. So, this part of the analysis should be supplemented by considering more up-to-date and detailed task taxonomies such as [1]. The conclusion refers to "the long trajectory of InfoVis research", but too few of the findings of this tradition are applied in the evaluation -- in a nutshell, I think this survey would be much more useful if the usability and fitness-for-purpose of the tools were considered critically and from several perspectives.

The paper should also be more self-contained. For instance, several visualisations which aren't that well known are referred to without any examples provided, such as sunbursts, dendrograms and IciclePartition layouts.

There are many mistakes in the writing that should be fixed (for example, non-sentences with missing words and other issues).

[1] J-W. Ahn, C. Plaisant, and B. Shneiderman. A task taxonomy for network evolution analysis. IEEE Transactions On Visualization And Computer Graphics, 20(3):365–376, 2014.