Decentralized Messaging for RDF Stream Processing on the Web

Tracking #: 1749-2961

Jean Paul Calbimonte

Responsible editor: 
Guest Editors Stream Reasoning 2017

Submission type: 
Tool/System Report
The presence of data streams on the Web is constantly increasing in terms of volume and relevance, for a large number of application domains and use-cases. As a consequence, there is a growing need for coherent data and processing models for streams, including data and metadata semantics that can help integrate, interpret, and reuse them. RDF Stream Processing (RSP) introduced theoretical foundations and concrete technologies to deal with these issues, ranging from continuous query processors to stream reasoners. However, most of these efforts lack support for communication and interaction at the Web level. This paper proposes a decentralized model and implementation, RSP actors, for enabling Web interactions among RSP engines, based on the actor paradigm. The RSP actors proposed in this work use a message-passing mechanism for asynchronous communication of RDF streams and metadata, and are designed to encapsulate the functionalities of existing RSP query engines, Complex Event Processors, or stream reasoners. Furthermore, we have used and extended the Linked Data Notifications recommendation of the W3C as a building block for a specific HTTP-based implementation of the model. The RSP actors code-base is open-sourced, and provides three concrete implementations of well-known RSP/stream reasoners developed by the RSP community, to show the feasibility of the approach.
Major Revision

Solicited Reviews:
Review #1
Anonymous submitted on 16/Jan/2018
Major Revision
Review Comment:

This tool/resource paper proposes a way of addressing the issue of communication of RDF streams in decentralized environments on the web. The paper outlines some use-cases and discusses some of the challenges within stream reasoning and RDF Stream Processing (RSP), before describing some of the progress that has been made so far. The problem of communicating streaming RDF data using web standards is described as one of the key areas where RSP is still lacking. I agree that this is indeed an important problem; developing more tools and methods to address it, or even agreeing on some standards, is essential, and any well-thought-out approach in this area has the potential to make a real impact on RDF stream processing and web stream processing in general. However, as will be discussed later in this review, the paper is at the moment not clear enough to establish whether the proposed approach fulfills this potential.

The proposal uses a combination of the Linked Data Notification (LDN) protocol and the actor model as a way of making RDF streams available online. LDN is here used as a way of communicating metadata about the streams, and the actor model is proposed as a way of supporting sending and receiving of RDF data streams asynchronously. The approach lets individual RSP actors detach from a single centralized system and allows them to communicate over HTTP. The proposal is demonstrated based on the Akka framework, which has been adopted for use with a subset of the currently available stream reasoning implementations. An open sourced API allows this to be extended further.
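The message-passing setup summarized above can be sketched in a few lines (a hypothetical, minimal in-process mailbox in Python; the actual RSP-actors implementation uses Scala/Akka and HTTP, and its names and signatures differ):

```python
import queue
import threading

class RspStreamReceiver:
    """Toy actor: consumes messages from an asynchronous mailbox.

    Hypothetical sketch of the pattern the review describes; streams
    are identified by IRI and populated by SendStreamItem messages.
    """

    def __init__(self):
        self.mailbox = queue.Queue()      # asynchronous message passing
        self.streams = {}                 # stream IRI -> list of elements
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def _run(self):
        while True:
            msg = self.mailbox.get()
            if msg is None:               # poison pill: stop the actor
                break
            kind, iri, payload = msg
            if kind == "SendStreamItem":  # a sender pushes an RDF graph
                self.streams.setdefault(iri, []).append(payload)

    def send(self, msg):
        self.mailbox.put(msg)             # never blocks the caller

    def stop(self):
        self.mailbox.put(None)
        self._worker.join()

receiver = RspStreamReceiver()
receiver.send(("SendStreamItem", "http://example.org/stream/1",
               {"s": ":sensor1", "p": ":hasValue", "o": "21.5"}))
receiver.stop()
print(len(receiver.streams["http://example.org/stream/1"]))  # 1
```

The point of the sketch is the decoupling: the sender only enqueues a message and returns immediately, while the receiver processes its mailbox at its own pace.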

While the paper reads fairly well overall, I have some reservations regarding the clarity of the descriptions in the paper and the experimental validation, which make it hard to determine potential impact, as well as how much is actually added by this API (novelty and contribution) compared to just using the already available technologies as such.

The target of the paper (at least in my understanding) is to address the issue of publishing RDF streams on the web, using the LDN protocol. LDN seems like a good candidate to be used as a starting point for publishing streams.

However, while the actor model is a good fit for communication in a loosely-coupled asynchronous system, the need for it in the context of supporting messaging on the web is less obvious, since consumers and producers are already, in a sense, decoupled. For example, what makes the actor model a better alternative than, e.g., publish-subscribe patterns and other push-based approaches? In the "Related Work" section, this is discussed in passing, but it would be good to provide a bit more motivation for this particular approach. Velocity and volume of streams are stated as a possible concern, but with messaging services like Apache Kafka offering high throughput and stability, there is some room for improvement in this section. The introduction of the proposed API also makes one wonder about its relationship with other related web APIs such as TripleWave, WeSP, and [31], and this is not elaborated in the paper.

The strongest point of the proposed approach to me seems to be the application of the LDN protocol for communicating metadata about streams, but perhaps also for consuming and publishing streaming data. LDN, or possibly an extension of it, could have considerable impact with respect to how RDF streams are published and consumed in the future. Unfortunately, this paper does not make a clear enough case regarding how exactly it should be used, as well as its limitations and drawbacks in practice, in order to judge if the proposed API is the tool that will indeed make this impact.

One main weakness of the paper is also the experiments section, which clearly establishes neither the feasibility of applying the API nor its limitations and drawbacks, and thus does not show the usefulness of the proposed tool. The experiments focus on throughput performance with the API implementation and an existing RSP engine. This seems more like a benchmark of Akka/CQELS performance than an evaluation of decentralized communication of RDF streams on the web. CQELS seems to be the obvious bottleneck in both experiment 1 and 2, making it difficult to assess the relevance of these results. The two experiments could be viewed as a case of populating a stream with multiple sensors, which are consumed by multiple receivers, but why is the RSP processing required here but not in the last experiment? The final experiments are a bit difficult to interpret and require some clarification with respect to the purpose and results. In general, the experiments should probably be more clearly oriented towards web-based communication of streams. For example, how long does it take to set up a connection with a stream using LDN and the proposed API, and what type of throughput and latencies would be expected in a system running on the web?

The description of the experiments is also a bit too vague to allow proper replication of the tests. For example, it is not clear whether there is an LDN service endpoint in place through which the communication between the senders and consumers is orchestrated, and whether the messages passed between sender and receiver travel via an intermediate web endpoint or whether they communicate directly. Is Akka using HTTP or TCP-based communication (i.e., running locally or remotely)? Are actors and CQELS instances running in the same JVM? What are the receivers/CQELS instances doing in the different experiments with respect to the streaming data?

Overall, this is an interesting proposal, but without further clarification and details, it is hard to judge both the feasibility of applying the proposed tool/method at web scale, as well as the impact that this may have in the future.

Additional remarks:
- Algorithm 1: "ackgetspostQuery(msg.body)" = "ack <- postQuery(msg.body)"
- The implementation uses the "consumeGraph" and decomposes the graph into triples but can the graph annotations from the default graph also be streamed and consumed?
- Registering of queries seems to only cover SELECT queries: is this a limitation of the API or implementation?
- In the case of CONSTRUCT, how would streamed "graphs" (as opposed to triples in CQELS/C-SPARQL) be generated to fit the named graph stream model?
- In Figure 3, 6 and 7: Is "mailbox" = "inbox"?
- Figure 6 and 7 need more in-depth explanations.
- In Listing 1 the window should be "[RANGE 2h]" for CQELS
- "Experimentation": In the line diagrams, the data points should be indicated.
- "Experimentation": The paper refers to the diagram stream rates incorrectly at some places.
- "Discussion and Potential Impact" could discuss more clearly the (potential) impact of the proposed approach.
- Check capitalization and formatting (e.g. title of proceedings) in the list of references.
- Some abbreviations are not explained or presented out of order (IoT, ODBA, CEP, ...)
- In the introduction, a reference is made to the “original challenges” but these are not explained.
- “Incremental materialization, truth maintenance systems, OBDA and other techniques [18, 28, 31]” unclear which references refer to what. Also, [31] seems out of place.
- “loose-coupled” = loosely-coupled
- In the final part of section 3, the requirements for the abstract model with respect to 2 and 7 are not explicitly mentioned (although they are discussed).
- In “Format & vocabularies” references to the mentioned vocabularies are missing.
- In “Pushing stream elements” you advocate the use of multiple inboxes for different protocols, but would not HTTP headers and content negotiation be able to serve the same purpose?
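The corrected pseudocode from the first remark ("ack <- postQuery(msg.body)") sits naturally in a message-dispatch loop of roughly this shape (a hypothetical Python sketch; the names mirror the paper's Algorithm 1 and interface, not a real API):

```python
def post_query(body):
    # Hypothetical stand-in for the engine's query registration.
    return {"status": "registered", "query": body}

def handle(msg, streams):
    """Dispatch one mailbox message, mirroring Algorithm 1's cases."""
    if msg["type"] == "SendStreamItem":
        streams.setdefault(msg["uri"], []).append(msg["body"])
        return None
    if msg["type"] == "RetrieveStream":
        return streams.get(msg["uri"], [])       # whole (windowed) stream
    if msg["type"] == "RetrieveStreamItem":
        items = streams.get(msg["uri"], [])
        return items[-1] if items else None      # most recent element
    if msg["type"] == "RegisterQuery":
        ack = post_query(msg["body"])            # ack <- postQuery(msg.body)
        return ack
    raise ValueError(f"unknown message type: {msg['type']}")

streams = {}
handle({"type": "SendStreamItem", "uri": "s1", "body": "g1"}, streams)
print(handle({"type": "RetrieveStreamItem", "uri": "s1"}, streams))  # g1
```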

Review #2
Anonymous submitted on 23/Mar/2018
Major Revision
Review Comment:

(0) Summary
All well-established W3C standardization efforts regarding the Semantic Web are so far concerned, more or less, with static RDF repositories. Recent efforts in the RDF Stream Processing (RSP) community, where the author of the paper under review has been one of the main contributors, have led to various proposals regarding query languages over RDF streams, their semantics, requirements on RDF stream engines, stream benchmarks, etc. The main motivation for the present paper is that, according to the author, there has been less research on a global architecture of the web that takes streams seriously. So, in filling the gap, the paper proposes a general architecture based on the actor paradigm and the Linked Data Notifications (LDN) protocol. The paper describes the basics of RSP, the actor paradigm, and its adaptation using LDN as a front-end. Furthermore, experiments on an implementation of the model are discussed that are meant to illustrate the scalability (feasibility) of the architecture.

I think the author has a good point in raising again the question of how to make the web a better web from the stream perspective. The main interesting aspect is that of considering streams as first-class objects that should be made available on the web. For this, the author proposes to exploit the actor paradigm, relying on the general benefits that the actor model has. In this adapted actor model, streams are mainly administered in what the author calls the Stream Receiver. The Stream Senders provide stream element inputs, and the Consumer can ask the Receiver to output stream elements from a stream specified by its IRI.

Though I am absolutely d'accord with the main motivation for this paper, I have problems accepting the paper in this form, mainly due to its hybrid status as a semi-research paper and semi-report on tools and systems, where I have a strong bias and suggestion to go rather for a research paper.

(1) Quality, importance, and impact of the described tool or system
As the paper is submitted as a "Reports on Tools and Systems" paper, one faces the following difficulty in reviewing it: There are actually two software-related artifacts in a wide sense, namely a general interface software framework for a Web architecture enabling the integration of RDF stream engines, and a concrete prototype implementation based on some set of RDF stream engines. The interface software framework (consisting of interfaces described in a concrete programming language) is not software in the more genuine sense of the word; rather, it is an abstract model. Its quality, importance, and impact can also be judged. And, indeed, the paper at hand argues for the quality of the abstract model: it describes experiments with a prototype implementation (the second artifact), which are meant to show the feasibility (or perhaps also the scalability) of the model. In his prototype implementation, the author considers different instances of the CQELS engine and takes query answering (with the queries and the data taken from SRBench) as the main service to be used. The experiments seem to suggest that the model is indeed feasible/scalable, and so one has a clue as to the good quality of the model w.r.t. feasibility/scalability.
As I understand it, the author really wants to stress the evaluation of this abstract model, which may not be completely new but nonetheless was not described in this form elsewhere in the literature. The fact that he needs more than half of the paper to describe the model also hints at the fact that the model (and not the prototype implementation) is the focus of the paper.

But then one would have to consider also different aspects of the quality, importance, impact, etc. of the model. The experiments alone do not show, e.g., how the web would benefit from the actor-based model, but can only be meant to show that the proposed actor model would do no harm to the web, regarding at least the feasibility/scalability aspect measured by relative throughput. Of course, as the model is relatively new, one cannot say much on its importance and impact (the use of it by other people). Moreover, as there is no other architecture yet, it is difficult to compare it to other models.

But there are other criteria one should discuss for this model, with sufficient detail and persuasive illustrations. For example, taking query answering as the main service, the question is what we gain with the actor model in answering queries. Can we answer queries that we could not answer before, due to the fact that we now can refer to streams that are "produced" elsewhere? Do we gain better precision or recall? Can queries be answered faster? As far as I can see, the quantitative experiments described in this paper show "only" that the paradigm does no harm to query answering. Moreover, in evaluating the model, I would have expected some general discussion about the lurking danger of redundant streams, about federation and provenance aspects, and about the discovery process, which seems to be of utmost importance in order to get an overview of the available streams that one can use for a specific query task. The metadata regarding the streams seems to be important in order to solve a specific query task. How does the proposed model facilitate the use of the metadata in order to solve a query task? Etc.

The model rests on the well-established actor model, but nonetheless its applicability and benefit for the web must be discussed in more detail. This concerns also some technicalities related to streams. For example, the author stresses the point that streams differ from static data in that they fade out. In the interface there is also some method for retrieving a specific stream element. I would have expected this to be the last, most recent element in the stream. If this is the case, I see no problem. Otherwise it would mean that the receiver must have some appropriate "window size" to fit the needs of different potential consumers. But what would be a good average window size? How is it specified? Heuristics?
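The window-size concern raised here can be made concrete with a bounded buffer: the receiver keeps only the last N elements per stream, so old items fade out, and retrieving a stream item returns the most recent element (a hypothetical Python sketch with an assumed `maxlen` parameter; the paper does not specify such a heuristic):

```python
from collections import deque

class WindowedStream:
    """Keeps only the most recent `maxlen` elements; older items fade out."""

    def __init__(self, maxlen=100):
        # deque with maxlen drops the oldest element automatically on append
        self.buffer = deque(maxlen=maxlen)

    def push(self, element):
        self.buffer.append(element)

    def latest(self):
        # "RetrieveStreamItem" read as: return the most recent element
        return self.buffer[-1] if self.buffer else None

s = WindowedStream(maxlen=3)
for i in range(5):
    s.push(i)
print(list(s.buffer), s.latest())  # [2, 3, 4] 4
```

Under this reading, the open question is precisely how `maxlen` (the window) would be chosen or negotiated between receiver and consumers.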

Another point is that there is a deviation from (or refinement of) the actor model in that there is a distinction between two types of inboxes, one as an input, the other as an output. The author notes that a stream can have both roles, that of input stream and output stream. Looking at the "definitions", this can be the case only when the stream has the role in different actors. Nonetheless, couldn't it be the case that one has some circular arrangement of actors leading to undesired recursion in a stream?

Summing up, considering the quality, importance, and impact of the first software artifact, which is actually an interface framework and hence an abstract model, it seems rather that the paper at hand is a "system-for-paper" type of paper and not a "paper-for-system" type of paper (see the comments below). Hence, the paper needs a major revision by addressing the points above to become an acceptable research paper.

Maybe I have misunderstood the emphasis of the author, and the focus of the paper is really on the second software artifact, namely the prototype implementation. Of course one can discuss the concrete implementation that is available online, and I am pretty sure that the author did a good job in implementing his model. But it is rather awkward to see this system as the system/tool to be judged for quality, importance, and impact. As an analogue consider, say, a description logic (DL) reasoner: Here one may develop the general principles of the reasoner (such as the type of inference rules, tableau rules) in a research paper, and then, in a report on systems and tools, one may "sell" a concrete DL reasoner software, showing how the inference rules were implemented (efficiently), describing possible optimization strategies, and evaluating it with benchmarks or by comparison with other DL reasoners. In the case of the paper at hand, the author does not want to "sell" his concrete implementation but the underlying RSP model.

(2) Clarity, illustration, and readability
All in all, the paper is quite readable. However, I did not get much gain from any of the figures outside the experiments section: the "contents" of the figures do not add much to what has been said in the text. So I would suggest deleting them. Rather, the author should spend some effort in introducing his vision with some running example, for which maybe also figures are given. A second point regarding presentation concerns the sections on software-related material, be it the interface material in Section 3 or the concrete implementations described in Section 5. Surely, this paper is intended as a software-description paper, and so one should expect some software-related descriptions and listings, but, in my honest opinion, a description of an algorithm as in Algorithm 1 is out of place. A third point regarding presentation is the terminology used by the author. I was confused about the use of a "stream receiver". The role of the stream receiver is rather that of a mediator or a channel via which "streams are exchanged". So maybe the author can give some thought to changing the terminology to stream sender, stream channel, and stream receiver (= the author's stream consumer).
A fourth point is the presentation of the experiments. Here the author should provide more details on the experimental configuration: For which queries do the experiments measure the relative throughput? Is it one specific query, or the average over all of them?
A fifth point regarding presentation is the conclusion: it is not a conclusion but rather a repetition of (some of) the contents from the introduction.

In the related work I would have expected at least a short hint on the whole aspect of web services.

(3) Suggestion
Due to the points above I suggest a major revision where the paper is submitted as a research paper and where the following points are addressed:

1) Detailed critical discussion of the pros and contras of using the actor model w.r.t. query answering - illustrating these with a concrete example.

2) Clarifying details regarding the configuration of the experiments

3) Improving the presentation w.r.t. the points mentioned above

4) Correction of typos (see below)

(4) Typos and minor suggestions for improvement
p. 1, abstract: opens-sourced => open-sourced
p. 1, l. col.: events processing => event processing
p. 1, l. col.: to analyze => in analyzing
p. 1, r. col.: Move “as a result” to the beginning of the sentence starting with “Processing”

p. 2, l. col., pa. 1: Is LARS really about RDF?
p. 2, l. col., pa. 1: Add spaces before citations “[11, 16]”, “[17]” and “[18]”
p. 2, l. col., pa. 2: Reformulate “and through Web standards”
p. 2, r. col., pa. 1: “as depicted in Figure 1” refers to the vision outlined in [20]. But does this vision really concern only the concrete engines given in the figure?
p. 2, r. col., pa. 2: implementation , => implementation,
p. 2, r. col., pa. 3: follows: we => follows: We
p. 2, r. col., pa. 3: in details => in detail
p. 2, r. col., pa. 3: related works =?=> related work

p. 3, l. col., pa. 2: syntax and semantics =?=> syntactical and semantical
p. 3, l. col., pa. 2: As a concrete example of these languages =?=> For illustration purposes
p. 3, l. col., pa. 3: Here and elsewhere prevent breaks in listings.
p. 3, r. col., pa. 1: How is “ontology” defined?
p. 3, r. col., pa. 1: consecutive => consecutive instances
p. 3, r. col., pa. 1: form => from
p. 3, r. col., pa. 1: “ABox” and “TBox” are not introduced/defined
p. 3, r. col., pa. 1: Paragraph break after “reasoners.”
p. 3, r. col., pa. 1: much considerations => much consideration

p. 4, l. col., pa. 1: et al. => and colleagues
p. 4, l. col., pa. 1: (See Figure 3) => (see Figure 3)
p. 4, l. col., pa. 1: Here and elsewhere be consistent in (not) using abbreviations for “Figure”.
p. 4, r. col., pa. 2: The notifications are by themselves not “Web resources that can be identified” but are references to Web resources
p. 4, r. col., pa. 4: Maybe instead of “data exchange” use “exchanging data”, as data exchange reminds one of the area dealt with in database theory.
p. 4, r. col., pa. 1: and scalability => , and scalability

p. 5, r. col., pa. 1: and RDF => an RDF
p. 5, r. col., pa. 2: relative => related
p. 5, r. col., pa. 2: To do, so => To do so,
p. 5, r. col., pa. 3: stored data => stored data (requirement 4)
p. 5, r. col., pa. 3: allow => allows

p. 6, l. col., pa. 2: expectations =?=> constraints
p. 6, l. col., pa. 2: Links or references to SSN, PROV
p. 6, l. col., pa. 3: actor model, the it => actor model, it
p. 6, r. col., pa. 1: case => cases
p. 6, r. col., pa. 1: write => write on(to)

p. 7, l. col., pa. 1, in item “RetrieveStreamItem”: is it really the case that a specific stream element is requested, or rather that the most recent element in a stream identified by the IRI is returned?
p. 7, l. col., pa. 1, in item “RetrieveStreamItem”: and element => an element
p. 7, l. col., pa. 4: exiting => existing
p. 7, l. col., pa. 6: “are essentially meant” is vague: do you mean the stricter “are allowed“
p. 7, r. col., pa. 1, in case RetrieveStream: send(getStream(msg.uri) => send(getStream(msg.uri))
p. 7, r. col., pa. 1, in case RetrieveStreamItem: What does the “r.” stand for?
p. 7, r. col., pa. 3: such stream => such a stream
p. 7, r. col., pa. 3: “RDF streams can also be available as both input and output streams”. Due to the intended use this can only be the case across different actors, not with the stream receiver, right? Does this concern different actors?
p. 7, r. col., pa. 5: metadata , => metadata,
p. 7, r. col., pa. 5: send and RDF => send an RDF

p. 8, l. col., pa. 3: can derived =?=> can deviate
p. 8, l. col., pa. 4: of certain stream => of a certain stream
p. 8, r. col., pa. 2: This IRI is =?=> This IRI refers to
p. 8, r. col., pa. 8: of the results => for the results

p. 9, l. col., pa. 4: and RDF => an RDF
p. 9, l. col., pa. 4: times-tamped => time-stamped

p. 10, l. col., pa. 1, first listing: Missing timestamps?
p. 10, l. col., pa.2: time-annotated => timestamped
p. 10, l. col., pa. 4: in WebSocket => in WebSocket)
p. 10, r. col., pa. 2: CQELS => following CQELS

p. 11, l. col., pa. 2: align “LDnStreamReceiver”
p. 11, r. col., pa. 6: align “registerSelect”

p. 12, r. col., pa. 4: How is the “maximum ideal number” given?

p. 13, l. col., pa. 2: number => numbers
p. 13, l. col., pa. 5: excepting => except
p. 13, r. col., pa. 1: number => numbers

p. 14, l. col., in Fig. 14 caption: N. => Number
p. 14, r. col., pa. 2: WEb => Web
p. 14, r. col., pa. 4: RDF dataset => RDF datasets

p. 15, l. col., pa. 2: RDF, into => RDF into
p. 15, l. col., pa. 2: continuoation => continuation
p. 15, l. col., pa. 4: takes => take

p. 16, l. col., pa. 3: citation for “SLD Revolution”
p. 16, l. col., pa. 4: provides => provide
p. 16, l. col., pa. 4: synchronous => asynchronous
p. 16, l. col., pa. 4: patterns => pattern
p. 16, l. col., pa. 4, in reference [5]: W3c => W3C
p. 17, in reference [29]: delete ISSN and doi

Review #3
By Daniel de Leng submitted on 26/Mar/2018
Minor Revision
Review Comment:

The article proposes a system/tool for decentralised messaging for RDF stream processing on the Web. Since this article was submitted as a "Reports on tools and systems" article, we need to consider the "quality, importance, and impact of the described tool or system", as well as the "clarity, illustration, and readability of the describing paper, which shall convey to the reader both the capabilities and the limitations of the tool." I conclude my review with an evaluation of the work as a whole, which questions the choice of article type given its content and suggests adjusting it to a full paper instead.

First, I consider the quality, importance and impact criteria. The article makes a compelling argument that despite the progress made towards RDF Stream Processing (RSP), many of the solutions thus far have not really considered decentralised messaging between RSP engines in the Web context. The article makes clear that this has been a known and accepted problem, and provides an actor-based proposal to support this Web-based decentralised messaging support. This shows the importance of the proposed system. The proposal has subsequently been implemented in an open-source framework which the article links to. Crucially, this framework supports a number of concrete pre-existing RSP engines as a proof-of-concept, and experimentally tests the framework in Section 6. It is my opinion that this shows the high quality of the proposed system through its maturity. The final criterion of the system's impact is problematic, however. The proposed system is still very new, and the author seems to acknowledge this by considering the *potential* impact of the work. I agree with the assessment made in Section 8, but the SWJ journal defines impact as "the *demonstrable* uptake of [the] work by the research community, industry, governments, or the general public" (FAQ #20). Neither the article (through references to applications of the system) nor the GitHub repository (through on-going development, forks, pull requests or issue tickets) shows *demonstrable* impact.

Second, I consider the clarity, illustration, and readability criteria. Overall, the article is well-written and easy to follow, with plenty of examples and clarifying figures to ease the reader into the topic. Consequently, I only have minor comments on parts of the article.

While a discussion of related work is provided, given the similarity between the actor-based approaches of this article and our DyKnow stream reasoning framework, I would have expected a reference as part of a discussion of related work. DyKnow has used an actor-based model for stream reasoning for at least a decade, and this model has been refined over the years, most recently in a SIMPAR-2016 paper titled "DyKnow: A Dynamically Reconfigurable Stream Reasoning Framework as an Extension to the Robot Operating System" and an IROS-2017 paper titled "Towards Adaptive Semantic Subscriptions for Stream Reasoning in the Robot Operating System". One can find earlier work on stream refinements with knowledge processes in a JIFS-2006 article titled "A Knowledge Processing Middleware Framework and its Relation to the JDL Data Fusion Model". Perhaps most closely related, however, is the FUSION-2014 paper titled "Towards On-Demand Semantic Event Processing for Stream Reasoning", which describes a wrapper for RSP engines into actors that can send streams to each other. The article presented here is a clear improvement over that paper and could be related as such. A key difference between the works is of course that the article is focused on RSP on the Web rather than the Robot Operating System.

In the experimentation section, Figures 11 and 12 looked a bit strange until I realised that the graphs/s metric is presented on a log scale. This could be clarified. However, since you only have four data points (i.e. for 1, 10, 100, and 1000 graphs/s), I would recommend using a bar chart here instead. This would also be consistent with Figures 13-15.

I found a number of small typos, briefly listed below:

places special importance to => places special importance on

Eahc actor => Each actor (caption for Fig 6)
In consequence => Consequently
the it follows => it follows
one or all => one or multiple of / any of

can derive in different strategies => (unclear)
consumer my also request ... => consumer may also request ...
... a specific stream item to the receiver => ... a specific stream item from the receiver
actors need to have first => actors need to first have
we propose to constraint => we propose to constrain
As s result => As a result

LdnStreamRecevier =? LdnStreamReceiver
ActorStremaReceived => ActorStreamReceiver (caption for Fig 10)

WEb interfaces => Web interfaces

continuoation => continuation

While this is, in my opinion, a good original research paper, the choice of "Reports on tools and systems" seems to be its biggest problem. Articles of this particular type require demonstrable impact, and this article fails on that criterion. Instead, the article would be much more suitable as a full paper after minor revisions to mitigate the minor problems pointed out in this review: related work, minor typos, and the presentation of some of the graphs. If the author wishes to keep this article as a tools and systems report, a major revision is likely required to show demonstrable impact.

Review #4
Anonymous submitted on 29/Mar/2018
Minor Revision
Review Comment:


The paper addresses the still-open challenge of publishing and exchanging streamed (linked) data. The streams could either be produced directly by a tool for triple transformation such as TripleWave or by a stream processing/reasoning engine. The author correctly identified that for static linked data there is good coverage of recommendations, best practices, etc. However, this is missing for RDF streaming data, due to its nature of (a) being time-dependent and fluctuating, and (b) having several forms of entailment, from lightweight RDF(S)-based to heavyweight Description Logics or LP-based.
The authors' starting point is the Actor Model, one possible implementation of concurrent systems, which is a lightweight model based on the idea of an actor (a computational entity) that acts (based on internal states) on received messages by sending messages to other actors, creating new actors, and/or defining a (computational) behavior for the next messages.
First, they lift the server-centric RDF stream processing (RSP) paradigm to the actor model by introducing Stream Receivers (SRE), Senders (SSD), Consumers (SCO), and introducing two types of messages: (a) RDF stream elements (RDF graphs) (b) RDF stream metadata.
For the messages, they define the format (RDF), suggest possible vocabularies like PROV, define message resolvability (messages are not resolvable), and highlight that message delivery should be push- or pull-based. Further, they correctly see that querying (including a query language) is a central element of RSP.
Then, they define three types of actors SRE, SSD, SCO and specify each of them. The SRE is the central point for handling messages such as "SendStreamItem" and "RetrieveStreamItem". The SSDs should support the publication of streams and their data by allowing operations such as "postStreamItem", and the SCOs allows the consumption of streaming data by "getStreamItem".
As a communication protocol, they chose Linked Data Notifications (LDN), which seems well suited for enabling decentralized messaging in a Linked Data/RDF setting. They transfer the concepts of LDN to RSP actors (e.g., IRIs for streams), and adapt the definition of an "LDN inbox" into input/output inboxes representing input/output streams. They further extend LDN with pushing of stream elements (pulling can be performed via HTTP GET). They also suggest possible protocols (e.g., Server-Sent Events), but omit a comparison of the different protocols that might be suited for pushing data on the Web.
To show the applicability of their model, they implement the RSP actor model using Scala and the Akka library. Besides providing the different interfaces (called traits) of the SRE, they have also developed stream receivers for the CQELS, C-SPARQL, and TrOWL engines, as well as a CQELS client to pose SPARQL queries.
As a bonus, they test their implementation and provide some experiments, which show that a set of parallel-running RSP engines can handle large numbers of RSP senders (up to 16,000) while keeping a good efficiency rate.

The main contributions of the paper are:
- We believe that this will be the first push towards a unified RDF stream messaging standard/protocol;
- The author identifies the need for an approach for streamed RDF, which has so far been neglected by the "linked data" community;
- Based on the Actor Model, they design a message-based approach in which stream engines and stream consumers can interact with each other in a unified manner;
- They define and work out the concept of an RDF stream receiver, sender, and consumer, which only communicate via messages channeled through the receiver;
- As an initial but promising communication protocol, they choose LDN, and adapt it to the stream setting;
- They have already provided implementations for CQELS, C-SPARQL, and TrOWL, and provide initial experiments with CQELS. It is also nice that all the code is open source and available on GitHub.

The topic of the paper and the main ideas are appealing and important for the RSP community. We believe that this paper is well suited for the journal, if the minor points below could be addressed in the final version:

(1) The author does not address the case of message ordering and composition. Messages might have to be processed in a certain order (as in a pipeline), where additional content/information is added based on the previous step. Maybe the SRE can be extended to allow such a composition, maybe some kind of transaction would be needed, or does this violate the Actor Model principle?

(2) Algorithm 1 seems not to be an algorithm, but merely a switch-case (though this is disputable). We believe an algorithm handling the stream receiver should also include aspects of caching, order of message handling (FIFO, LIFO, priorities, etc.), and error handling.

(3) Section 3.1: The description of the vocabulary for messages is very vague, maybe the base messages could be defined clearer, which would still leave room to extend them if needed.

(4) Section 4: The protocol should have an option to deal with fading data, either by notifying the consumers or having a fading parameter in the stream metadata.

(5) The author should describe the connection to the Web of Things recommendation: can elements (like the Thing Description) be incorporated into the SRE messages?


The originality is given, since the author combines the Actor Model with LDN and RSP, a combination that has not been considered previously.


This work seems very relevant for the RSP community, as it suggests a novel approach to decentralized messaging between RSP tools. Further, by providing an implementation, they pave the way for tool developers to write their own actors using the provided interfaces.


The paper is well written, clearly structured, and straightforward to read. The author provides enough examples and figures, so the reader has good support in understanding the ideas. There are only a few typos (listed below).


- "Stream Receiver" is a bit confusing; maybe "Stream Dispatcher" would be a better name
- Algorithm 1: Line 1, bracket is missing
- Related Work: WEb interface -> Web interface
- Related Work: Link 12 is not working