Interoperable read-write Linked Data application development with the LDP4j framework

Tracking #: 986-2197

Authors: 
Miguel Esteban-Gutierrez
Nandana Mihindukulasooriya
Raúl García-Castro
Asunción Gómez-Pérez

Responsible editor: 
Rinke Hoekstra

Submission type: 
Tool/System Report
Abstract: 
Enterprises are increasingly using a wide range of heterogeneous information systems for executing and governing their business activities. Even if the adoption of service orientation has improved loose coupling and re-usability, applications are still isolated data silos whose integration requires complex transformations and mediations. However, by leveraging Linked Data principles those data silos can now be seamlessly integrated, and this opens the door to new data-driven approaches for Enterprise Application Integration (EAI). In this paper we present LDP4j, an open source Java-based framework for the development of interoperable read-write Linked Data applications, based on the W3C Linked Data Platform (LDP) specification.
Tags: 
Reviewed

Decision/Status: 
Reject

Solicited Reviews:
Review #1
By Pierre-Antoine Champin submitted on 24/Mar/2015
Suggestion:
Minor Revision
Review Comment:

This paper presents LDP4j, an implementation of the W3C Linked Data Platform recommendation. In this implementation, developers only have to focus on the business logic, while the framework takes care of the specificities of the LDP protocol.

Overall, the paper is well written and quite clear, although section 5.2 is rather abstract when you don't know the framework. It would benefit from being illustrated by a running example (maybe one of the applications from section 6?).

Other minor corrections are listed below:

S1: "this approach requires having tools" -> remove "having"

S3 (end): "more more suited" -> remove one "more" ; also, I would rephrase this sentence in a more radical way: "modelling languages (RDF Schema and OWL) were designed for inference rather than for validation"

S5 (just before 5.1): "the application model proposed" -> "the proposed application model"

S5.2: "Despite" -> "Despite the fact that"

S7: "this would relief" -> "this would relieve"

S7 *Vocabulary support*: as mentioned in section 3, current vocabulary descriptions (using RDFS and OWL) do not express any *restriction*, only possible inferences. This should be explicitly (re)stated, as the title of this paragraph might be misleading (especially because many people misinterpret RDFS and OWL that way). For this issue to be addressed, something like the future recommendation of the Data Shapes WG should be used, or Hydra (http://hydra-cg.com/), but that goes beyond what is usually considered as a "vocabulary".

Review #2
By Antonis Loizou submitted on 14/Apr/2015
Suggestion:
Reject
Review Comment:

The paper describes the LDP4j framework, a Java implementation of W3C's LDP recommendation. The framework is presented as middleware to support the development of LDP applications, as opposed to a data access mechanism.

LDP4j handles all aspects of the LDP protocol thus allowing the application developer to focus on implementing the business logic required.

The framework is evaluated against the official LDP test suite, which verifies its compliance with the LDP specification with the exception of a very small number of features that are yet to be implemented.

The work described in this paper is certainly very interesting, and the paper provides a very clear, well written description of it.

That said, as a 'Tools and System Report', I found it very hard to assess the impact and importance of this work.

The authors argue that the usefulness of the framework is demonstrated by the two use cases presented in the paper, in which LDP4j was used to handle the LDP protocol communication. However, both applications have been developed by the authors of this paper, and no indication is given on their deployment status or their usage by third parties. Moreover, the authors claim that through using LDP4j the development time for these applications was reduced, but no evidence is given to support this claim.

In addition, section 3 discusses three requirements that are seen as central to the adoption of LDP-based Enterprise Application Integration: security requirements, transaction support, and data validation. However, security and data validation are never mentioned in the remainder of the paper, while transaction support is only discussed in the context of future work.

Finally, if my understanding of http://www.ldp4j.org/#/learn/start is correct, LDP4j is based on the second public working draft of the LDP specification (2013-03-07).

With the above considerations in mind, I find the closing statement of the paper, "This framework represents the next big step for realizing the vision of Linked Data-based Enterprise Application Integration", rather premature (at least based on the evidence presented in the paper).

Thus, my recommendation is for the authors to resubmit this work when its impact can be demonstrated, as well as the implementation updated to reflect the latest (recommendation status) version of the LDP specification.

Minor typos/edits
- Sec.2§2 : The two main concepts defined in LDP include -> The two main concepts defined in LDP are
- Sec.5.2§2 is unclear
- Sec.5.2§6 : ... via and endpoint -> via an endpoint
- Sec.5.2§6 : Items (a) and (b) are in italics while (c) is bold

Review #3
By Ruben Verborgh submitted on 28/Oct/2015
Suggestion:
Major Revision
Review Comment:

This article presents the LDP4j framework, which strives to support the development of interoperable read-write Linked Data applications. It does this by providing a stack that is compatible with the Linked Data Platform, the REST architectural style, and the HTTP protocol. While the article convinces me that LDP4j is a powerful framework, some basic questions are not answered clearly, and some claims are not backed up sufficiently. I could not follow some of the explanations and the structure is confusing at times. At this stage, I have to recommend a major rewrite of the article in order to make the information useful and actionable for the readers. As such, I'm labeling this as a major revision, with the explicit mention that the quality of the tool is certainly adequate and does not need improvement, but the contents and style of the article need work. The remainder of this review gives practical advice on how this could be conducted in three main points, and finally details specific remarks.

1. Purpose and functionality of LDP4j
-------------------------------------

The most important point is that the article should make clear what LDP4j exactly does. I realize this is a very basic question, but this was insufficiently clear to me—and if anything should be clear, it should be this. The text currently seems to assume a high degree of familiarity with LDP (which I personally have) and at some points even LDP4j (which I have not). However, the article should be made clear for a much broader audience: in general, everybody who reads just the introduction should know what LDP4j can do for them and what not. (Regardless of whether they know LDP, I would say.)

Concretely, in the abstract and title, LDP4j is labeled as "a framework for the development of interoperable read-write Linked Data applications". Section 1 mentions that LDP4j addresses the problem that other frameworks provide LDP support, but not as an application integration mechanism. However, I found it hard to grasp what exactly this means. Is LDP4j something that puts an LDP layer on top of my business logic (like HTTP frameworks put an HTTP layer)? Will LDP4j help me consume other data on the Web? Does it support the follow-your-nose approach mentioned in the introduction? Section 5 seems to indicate that LDP4j is a framework you need when building a server with an LDP interface. Is this the only use case for LDP4j?

In other words, I want this article to give a very concrete and tangible answer to the questions:
- What problems does LDP4j solve?
- How does LDP4j solve those problems?
- Why is LDP4j a good way to solve those problems?
- How exactly does LDP4j compare to other solutions?
- How can I get started with LDP4j to solve these problems?

With regard to these questions, I want to see a much clearer requirements engineering section. The current Section 4.1 tries to take this role, but I found it unclear and unconvincing. I would prefer to see a clear line of argument there, saying "this is what developers need and why existing tools don't provide that". A guiding example from the beginning onwards might help with this; the examples in 6.2 are helpful but late.

2. Backing up claims and setting expectations right
---------------------------------------------------

As written above, some of the claims about what LDP4j does are vague and unclear, which also makes them hard to verify. How exactly does LDP4j reach "interoperable Linked Data applications", what does that mean, and how do you know it does? As far as I understood from Section 6.2, what LDP4j really does is provide an LDP wrapper for other components. In 6.2.1, LDP4j exposes a relational database as LDP, and in 6.2.2, LDP4j provides an LDP wrapper on top of Bugzilla. So in other words, like there exist HTTP frameworks that expose Java objects over HTTP, this framework exposes (mapped) triples over LDP. Is this all of the functionality of LDP4j, or is there more?

Don't get me wrong, that by itself would make LDP4j very useful. But I think the description of it being a framework for "interoperable Linked Data applications" is too vague and perhaps oversold. After all, LDP is just one of the many steps for interoperability. If LDP4j is a framework to provide an LDP interface for read/write backends, just say so. Then readers know exactly what they can expect from this tool and what they cannot. The promise of Section 1, that LDP4j would provide "a novel approach of application integration" is not fulfilled for me. Likely, LDP4j is already useful without that promise, but I'd rather have no promise than an unfulfilled one.

In particular, some of the claims with regard to "novel paradigms" might be overstated. The tool exposes things as LDP, which is very useful, but not necessarily a novel paradigm. LDP is a useful and standardized type of API with RDF-based representations, but just an HTTP API nonetheless. Its main merit is in the fact that it is standardized and has several useful properties, but it is not a fundamentally different type of interaction compared to other REST APIs out there.

Also, in the conclusion there is a mention of LDP4j extending the LDP protocol "with features for exposing the domain knowledge that dictates how to interact with the application." I was not able to find this in the paper. What exactly are these features, how do I use them, how do they differ from what other tools do?

I would recommend to be very clear, precise, and tangible about what LDP4j does. Also, its limitations and their implications need to be explained more concretely.

3. Enabling readers to use LDP4j
--------------------------------

For me, a tools paper should be an enabler for people to use it. This includes describing what problems it tackles (as discussed in the previous points), but also giving people an indication of where to get started. Section 5 starts out well by explaining the main components, but it gets more messy after that. In particular, Figure 4 cannot be understood within the scope of the paper. I would recommend not to strive for completeness, but instead be instructive on the first steps a user has to take: this is how you get started with LDP4j, these are the first things you want to do, this is how an application with LDP4j takes shape.

Specific remarks on the current text
------------------------------------
- The abstract can be more concrete about use cases for which one would want to use LDP4j.

- Footnote 1 does not adequately back up the statement that precedes it.

- It is not clear where the first part of the introduction is going. It appears as if semantic heterogeneity is going to be the main focus, whereas it seems that LDP4j works on the protocol level. In that sense, I would rather argue that LDP focuses on structural heterogeneity, namely the structure of the API. The listed benefits of Linked Data are indeed true in theory, but they differ in practice, and I'm not sure how they contribute to the motivation for LDP4j.

- The final paragraphs of the introduction aim to differentiate existing LDP support and LDP4j, but are neither clear nor convincing enough.

- Section 2 explains LDP very clearly. I would perhaps not say that "LDP extends HTTP", but rather that "LDP builds on top of HTTP" (just like HTTP does not extend TCP/IP). If the three kinds of LDP containers are mentioned, their differences should at least be explained.

- Section 3 was rather unclear to me. What is the purpose of this section? Are you criticizing LDP, or defining additional needs? In the latter case, shouldn't this be merged with 4.1 then? Also, the Linked Data traversal part is independent of LDP, so I'm unsure why this is mentioned. Reference [5] should be briefly explained: what is the novel approach? It seems that the sentence containing "as a middleware provider LDP support is not enough" aims to state an important difference. However, I could not understand it; this needs elaboration.

- To what extent is LDP4j fulfilling the requirements you list such as security and consistency? It is not clear whether, and how, LDP4j does this. As an example, for data validation, solutions like RDFUnit exist. Is LDP4j competing with them, extending them, interfacing with them?

- I found Section 4.1 very confusing. I think the intention was to have some kind of requirements engineering here, but the argument is rather that not adopting LDP4j leads to trouble. I'd rather see a constructive argument here: this is what we need. Some of the implications are also overstated: I'm doubting whether an in-depth understanding of all listed RFCs is required to do anything. My suggestion would be to turn this section into a structured list of requirements.

- Is 4.2 intended as a related work section? If not, I didn't understand the purpose. If it is, it could be good to put this earlier in the paper. Then you can argue: "these are the tools that exist, but they don't go far enough". And "these and these are the additional requirements we have, so we built LDP4j". Alternatively, you can leave the structure as-is, but then 4.2 needs to make a stronger connection with the requirements, and why they are not met by existing tools.

- 5(.0) is mostly clear, I just wonder why HTTP-compliance is explicitly listed, given that many frameworks have this already (and I presume LDP4j doesn't implement this from scratch).

- 5.2 needs more clarity: what does "defining the semantics of the types of resources managed by an application" mean? This is an example of something I imagine an LDP4j user would confirm is correct, but is hard to understand for non-LDP4j users. Keep in mind that you are writing for people who do not know LDP4j at all. As a totally different example, any Linux user would understand "the `screen` tool helps me manage background jobs", whereas this phrase would be meaningless to anybody else. They instead need an argument such as "if you need a program to continue running while you work on something else, `screen` can do this for you". Try to apply this principle when explaining LDP4j as well. On a similar note, do define what exactly you mean by "endpoints", perhaps giving some examples.

- The relations in Figure 3 are not self-explanatory and require some details. For instance, what does it mean for a handler to "exchange" a dataset, and for a template to be "handled by" a handler?

- Where is the data manipulation API in Figure 2?

- What is meant by "the LDP4j application"? Is this the framework, or any application built with it?

- Why is an application session necessary (especially if it will be terminated immediately afterwards, as Fig. 5 and its discussion seem to suggest)?

- I would suggest naming Section 6 "Verification / validation", because it does not really evaluate. Rather, it verifies whether LDP4j conforms to the specification, and validates whether it is useful for 2 example applications.

- I would like to see additional details for the failing MUST and SHOULD tests. In particular, SHOULD is defined in RFC2119 as "there may exist valid reasons in particular circumstances to ignore a particular item". What are those reasons and circumstances?

- In 6.2.2, I would refrain from using the word "novel", as I'm not sure I agree that the LDP paradigm is so novel compared to other REST APIs (but novelty is not a prerequisite for usefulness).

- I would suggest to place footnote 21 in the text.

- In general, footnotes to URLs of specifications and the like are best turned into references, which provides additional details about authors etc. For instance, footnote 3 is the same as reference [4], but the latter gives more information.

- The conclusions disappoint; they are a mere summary. Also, as indicated above, I'm not sure how LDP4j "extends" the LDP protocol. I would like to see lessons learned, and concrete pointers on how to apply LDP4j in projects. Focus on enabling the reader here.