Survey of Tools for Linked Data Consumption

Tracking #: 1908-3121

Jakub Klimek
Petr Skoda
Martin Necasky

Responsible editor: 
Ruben Verborgh

Submission type: 
Survey Article
There is lots of data published as Linked (Open) Data (LOD/LD). At the same time, there is also a multitude of tools for publication of LD. However, potential LD consumers still have difficulty discovering, accessing and exploiting LD. This is because compared to consumption of traditional data formats such as XML and CSV files, there is a distinct lack of tools for consumption of LD. The promoters of LD use the well-known 5-star Open Data deployment scheme to suggest that consumption of LD is a better experience once the consumer knows RDF and related technologies. This suggestion, however, falls short when the consumers search for appropriate tooling support for LD consumption. In this paper we define an LD consumption process. Based on this process and current literature, we define a set of 34 requirements a hypothetical Linked Data Consumption Platform (LDCP) should ideally fulfill. We cover those requirements with a set of 94 evaluation criteria. We survey 110 tools identified as potential candidates for an LDCP, eliminating them in 3 rounds until 16 candidates remain. We evaluate the 16 candidates using our 94 criteria. Based on this evaluation we show which parts of the LD consumption process are covered by the 16 candidates. Finally, we identify 8 tools which satisfy our requirements on being an LDCP. We also show that there are important LD consumption steps which are not sufficiently covered by existing tools. The authors of LDCP implementations may use this survey to decide about directions of future development of their tools. LD experts may use it to see the level of support of the state-of-the-art technologies in existing tools. Non-LD experts may use it to choose a tool which supports their LD processing needs without requiring them to have expert knowledge of the technologies. The paper can also be used as an introductory text to LD consumption.

Solicited Reviews:
Review #1
By Elena Demidova submitted on 03/Jun/2018
Review Comment:

This manuscript was submitted as 'Survey Article' and should be reviewed along the following dimensions: (1) Suitability as introductory text, targeted at researchers, PhD students, or practitioners, to get started on the covered topic. (2) How comprehensive and how balanced is the presentation and coverage. (3) Readability and clarity of the presentation. (4) Importance of the covered material to the broader Semantic Web community.

In the current version, the authors have addressed my major concerns by providing clarifications regarding the target user group.
They also revised major parts of the requirement definitions pointed out in my review.
In my view, the paper has been improved and can be accepted.

Review #2
By Ruben Taelman submitted on 11/Jun/2018
Review Comment:

I only had a couple of minor comments during the last review round,
which all have been resolved by the authors.
I have no further comments on this article,
so I recommend an accept.

Review #3
By Daniel Garijo submitted on 01/Jul/2018
Review Comment:

The authors have successfully answered all my comments. I re-read the modified parts of the paper and have some final comments, but I don't think that another review from my end is necessary. This paper is backed up by an impressive amount of work, and I think it helps put all Linked Data efforts in perspective. I list my final comments below:

Some of the phrasing could still be improved. For example, "There is lots of data published as Linked (Open) Data (LOD/LD)" is not incorrect, but it's not formal either. Instead, something like "There is a large number of datasets published as Linked (Open) Data" reads better.

"We have identified the most relevant venues where publications about relevant tools could appear" -> In general I would avoid using the same adjective, "relevant", twice. Instead, you could say: "We have identified the most common venues/important venues..."

"We also included the workshops listed below, other workshops were not investigated". -> If you say that you are not including them, you should state why. I would just state that "the following workshops have been considered".

Table 3 has a caption that doesn't adjust to the width of the page.

"This information can be stored as an accompanying manifest using the PROV Ontology and may look like this (taken directly from the W3C Recommendation)" -> There are multiple PROV recommendations; please cite the one the example is referring to.

Some of the tables appear very far away from the position where they are referenced, which is confusing (e.g., Table 1).