UnifiedViews: An ETL Tool for RDF Data Management

Tracking #: 1490-2702

Tomas Knap
Peter Hanecak
Jakub Klimek
Christian Mader
Martin Necasky
Bert Van Nuffelen
Petr Skoda

Responsible editor: Krzysztof Janowicz

Submission type: Tool/System Report

Abstract:
We present UnifiedViews, an Extract-Transform-Load (ETL) framework that allows users to define, execute, monitor, debug, schedule, and share data processing tasks, which may employ custom plugins (data processing units) created by users. UnifiedViews natively supports processing of RDF data. In this paper, we: (1) introduce UnifiedViews' basic concepts and features, (2) demonstrate the maturity of the tool by presenting exemplary projects where UnifiedViews is successfully deployed, and (3) outline research projects and future directions in which UnifiedViews is exploited. Based on our practical experience with the tool, we found that UnifiedViews simplifies the creation and maintenance of Linked Data publication processes.

Solicited Reviews:
Review #1
By Tomi Kauppinen submitted on 24/Jan/2017
Minor Revision
Review Comment:

I checked the new version of the article together with the reviewers' and authors' comments on issues in the previous version. To me, it now looks like an almost mature tool paper, nearly ready for publication.

I propose a minor revision in which the authors *really* carefully check the paper for spelling issues. For instance:

- “Council of the Europian Union” -> “Council of the European Union”
- “oracle database” -> “Oracle database”
- “processing milions of triples” -> “processing millions of triples”

Btw, none of the vote visualisations (like http://www.semantic-web.at/council/map/) worked at the time I checked, since http://data.consilium.europa.eu/sparql seemed to be down.

Review #2
By Daniel Garijo submitted on 28/Jan/2017
Review Comment:

The authors have addressed my comments from the last review. Section 4 has now been summarized, the related work extended, and the limitations and future work clearly discussed.
I don't think another review is necessary, but I would suggest that the authors rename Section 4 to "Discussion and future work", because that is what it does, rather than summarizing lessons learned in general. Finally, I feel a little uncomfortable with some of the claims made by the authors in the conclusions. They claim that the interface is intuitive and that creating DPUs is simple. However, the paper provides no numbers behind these claims, beyond their being "confirmed by data wranglers".

I have also detected a couple of typos that the authors may wish to fix in the final version:
- Page 5, Sec. 2.2: "the first backend realizing that marks his identifier..." -> "the first backend realizing that marks its identifier..."
- Page 8: "The positive experience... let to the second phase" -> "led to"
- Page 11, Sec. 3.5.3: "Initial barrier we had" -> "An initial barrier"

Review #3
Anonymous submitted on 21/Feb/2017
Review Comment:

The paper presents a tool for extracting, transforming, and loading data (RDF data), dubbed UnifiedViews. Using the tool, it is possible to define, execute, monitor, schedule, and share pipelines for converting raw data to linked data. The main components (backend and frontend) and some use cases of applying the UnifiedViews tool are presented. The paper was submitted as a 'Tools and Systems Report'. It was reviewed along two dimensions: (1) quality, importance, and impact of the described tool or system; and (2) clarity, illustration, and readability of the describing paper, which shall convey to the reader both the capabilities and the limitations of the tool.
Compared with the original submission (http://semantic-web-journal.net/content/unifiedviews-etl-tool-rdf-data-m...), it is evident that the authors made a substantial effort to address the reviewers' comments. I consider my previous observations (Review #2) to have been answered. Therefore, I vote for accepting the article.

Review #4
Anonymous submitted on 24/Feb/2017
Review Comment:

I want to thank the authors for their detailed answers to the reviews and their efforts to address the raised issues. Although still quite short compared to Section 3, Section 2 has been improved. Table 1 and Section 4 improve the readability of the whole paper.

As indicated by the authors' answers, most of the issues were addressed. I want to respond to two remaining issues:

1. The change from "maintain" to "manage" was not applied consistently (for example, Section 2.2 still contains "maintenance").
2. Regarding the explanation of the Core DPUs and the link to https://github.com/UnifiedViews/Plugins/tree/develop: please include the link in the paper as well.

Apart from these, all my previous comments were addressed.

Minor issues:
- page 1: footnote \thanks is duplicated
- page 2: interact with THE tool
- page 3: the notation "n in N^0" is surprisingly formal here; "zero or more" seems more suitable. Also, the variables n, n', and m are not referenced later.
- page 3: We decided to distinguish THIS TYPE of DPUs
- page 4: obtaining DATA FROM external sources
- page 4: transforming DATA between various formats
- page 5: marks ITS identifier next to
- page 5: When such AN exception is thrown THE pipeline execution is stopped.
- page 7: We FOUND out that
- page 8: let -> led
- page 10: stepwize -> stepwise
- check the whole paper for missing articles
- inconsistent enumeration style: page 1/4 uses (1) (2), whereas page 2 uses 1) 2)
- there is a comma missing between footnotes 27 and 28