Sequential Linked Data: the state of affairs

Tracking #: 2471-3685

Authors: 
Enrico Daga
Albert Meroño-Peñuela

Responsible editor: 
Guest Editors Web of Data 2020

Submission type: 
Full Paper
Abstract: 
A sequence is a useful representation of many real-world phenomena, such as co-authors, recipes, timelines, and media. Consequently, sequences are among the most important data structures in computer science. In the Semantic Web, however, little attention has been given to Sequential Linked Data. In previous work, we have shown which data models Knowledge Graphs commonly use for representing sequences, that these models have an impact on query performance, and that this impact is invariant to specific triplestore implementations. However, the specific list operations that the management of Sequential Linked Data requires, beyond the simple retrieval of an entire list or a range of its elements, remain unclear. Moreover, the impact of the different models on data management operations remains unexplored. Consequently, there is a knowledge gap on how to implement a real Semantic Web list Application Programming Interface (API) that enables standard list manipulation and generalizes beyond specific data models. To address these challenges, here we build on our previous work and propose a set of read-write Semantic Web list operations in SPARQL, towards the realization of such an API. We identify five classic list-based computer science sequential data structures (linked list, double linked list, stack, queue, and array), from which we derive nine atomic read-write operations for Semantic Web lists. We propose a SPARQL implementation of these operations with five typical RDF data models and compare their performance by executing them against six increasing dataset sizes and four different triplestores. In light of our results, we discuss the feasibility of our devised API and reflect on the state of affairs of Sequential Linked Data.
Tags: 
Reviewed

Decision/Status: 
Major Revision

Solicited Reviews:
Review #1
By David Chaves-Fraga submitted on 05/Jun/2020
Suggestion:
Major Revision
Review Comment:

In this paper, the authors present a set of improvements and new contributions based on two previous works about sequential linked data management. More in detail, they define a set of atomic read-write operations for Semantic Web lists over five common data structures (linked/double linked list, stack, queue and array), with extensions to a previously proposed benchmark for this kind of data and a corresponding evaluation over four different state-of-the-art triple stores. As it is submitted as a full paper for the special issue "Storing, Querying, and Benchmarking the Web of Data" (where it fits perfectly), I will evaluate it against these requirements, together with its strong and weak points.

(originality)
Although the proposal is simple and based on basic operations already proposed many years ago for managing other data structures, to the best of my knowledge it has not been tackled before in the Semantic Web and it covers a gap in the state of the art. I definitely agree with the motivation of the work: lists should be one of the most important data structures for RDF, as it has been demonstrated that they help practitioners and developers to improve and standardize data management. However, the related work is very focused on Semantic Web proposals; I was expecting to read more about how these data structures were proposed in other fields and how they were evaluated, together with the main differences that arise when the data source is in RDF and SPARQL is the query language. It is also not clear to the reader what the main contribution of the paper is: the abstract suggests it is the atomic operations, but then, reading the paper, one has the feeling that the authors wanted to tackle two different problems: operations and data structures for lists in RDF, and an evaluation benchmark for them. Neither is an easy task, and in my opinion they should be described as separate contributions (as the authors did in the past) rather than mixed, because their objectives are totally different.

(significance of the results)
I will detail my comments based on the contributions defined in the introduction.

1- Atomic Operations (Sec 4): The section starts very well with the definitions of the functions for the (double) linked list and a good explanation of their meaning. Some examples would be really useful for the reader to understand the potential of the functions and how they work. In the stack section, what does it mean that the first operation is "often" available? What are the main differences between list L and list S? That should be formalised. I liked the description of it as a LIFO list; it improves the understandability. The same questions appear for the queue: what is the difference between the L/S/Q lists? What is the final result of first(append(v, append(w, Q))) = first(append(w, Q)) when Q = [] and when Q is a non-empty list? The function remove is neither defined nor explained and is new in this section. Finally, for array, why is L used here? Does it have the same features as L in the (double) linked list? More detail in the explanation of the functions is needed and, in general, more examples across the whole proposal would help.
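(For what it is worth, under the usual textbook FIFO axioms — assuming these are what the authors intend, which should be stated explicitly — the equation can be worked out as follows:

    first(append(v, []))  = v
    first(append(v, Q))   = first(Q)            if Q is non-empty

    first(append(v, append(w, [])))  = first(append(w, []))  = w
    first(append(v, append(w, Q)))   = first(append(w, Q))   = first(Q)    for non-empty Q

An explicit worked example of this kind in the paper would remove the ambiguity.)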

2- Catalogue of RDF lists: As I will mention below in more detail, I do not see where the contribution is here, and I understood this more as a preliminary section rather than a contribution. Additionally, I do not think that what the authors call URI-based lists should be considered a model for RDF lists. Although it has been used in some proposals, it requires a lot of domain knowledge, and the potential users who write the queries need to understand how the URIs were created. Using implicit knowledge from the URI to define a list should be understood as a poor modelling decision and not a recommendation, as the authors themselves say at the end of section 8.
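To make the concern concrete, here is a minimal sketch of what such a URI-based list looks like (hypothetical IRIs, not the exact scheme of the paper), expressed as a SPARQL update:

    PREFIX ex: <http://example.org/>
    INSERT DATA {
      # the order event0000 < event0001 < event0002 lives only inside the IRI strings;
      # no triple states the position explicitly
      ex:track1 ex:hasEvent ex:track1_event0000 ,
                            ex:track1_event0001 ,
                            ex:track1_event0002 .
    }

Any consumer that needs the order must already know this naming convention, which is exactly the kind of implicit domain knowledge I object to.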

3- Updated benchmark: My main concern with this contribution is that the section is not self-contained and the reader must go to the previous work of the authors to understand the description. The usage section can be moved to the GitHub repository, as it is not a scientific contribution. More detail on how the queries were defined, and on how they cover the relevant parameters of each RDF list model that affect the behaviour of the triple stores, would be recommendable. Maybe moving the queries from section 8 to section 6, with a detailed explanation, would help the reader to understand the benchmark proposal.

4- Experimental evaluation & discussion: I think this is a very good contribution and the paper reflects all the work that the authors put into evaluating the proposal. The discussion is interesting and definitely demonstrates that this is only the first seed for the future of sequential data structures in RDF. I do not totally agree with the conclusion that recommends against the use of rdf:List and SOP due to their performance problems. In my opinion, features such as expressivity, being an "elegant" solution, etc., should be taken into account when making this type of recommendation. I would recommend that the authors be a bit more ambitious in their conclusions: could now be the time to start thinking about incorporating these operations into the SPARQL specification? This would open other lines of research in terms of performance and optimization processes, for example, the possibility of pushing the operations down to the underlying data structures of each triple store, while also avoiding the complexity of the current SPARQL queries shown in the paper.

(quality of writing)
Regarding the structure of the paper, this is one of its main weak points. The paper is a bit difficult to follow, with many independent sections that do not follow a clear story. For example, it is not clear to me what exactly the main objective of section 5 is; I understood it as a description of the main list models that we can find in RDF, but is this not part of the related work/state of the art, or at least preliminaries? The first paragraph of 5.1 is basically the same as the second paragraph of the related work section. I missed clear examples of how the operations proposed in section 4 would be executed over the different RDF models. Additionally, as I mentioned before, section 6 describes the benchmark providing information that is not relevant for a scientific paper (e.g., how to run it); I would expect to see the main differences between the previous proposal and the current one. The paper should also be self-contained: for example, what is the meaning of the prop_number property now, and what was it before? Tables such as Tables 2 and 3 should be improved to provide more useful information. I understand that information about the machine where the experiments were run is important to ensure reproducibility, but inserting a big "figure" with information that is already in the text is not very advisable. About the presentation of the experiments, what is the aim of presenting the same results in three different forms (two plots and one table)? The use of the same colours in the two charts (performance/scalability) to represent different features can confuse the reader. What are the variables you took into account to generate Tables 4 and 5? What is the meaning of the cells in Table 5 (Y/N/M)?

In summary, I think the paper presents really nice contributions and clearly a lot of work has been done, but its presentation should be considerably improved.

Review #2
By Johannes Frey submitted on 11/Jun/2020
Suggestion:
Major Revision
Review Comment:

In the full paper with the title "Sequential Linked Data: the state of affairs" the authors carry out a performance study for 5 list/sequential data models in RDF using 4 different triple stores. The topic is a good fit for the special issue. The language is very good and comprehensible. The paper is structured well and follows a clear methodology. The methodology, the model survey, the related work section and the LIST.MID benchmark are reused from previous works, but as a consequence the paper is largely self-contained. The novel contributions are a formalisation of 9 atomic read or write operations for sequences (e.g. first, rest), the definition of new queries in the benchmark w.r.t. these operations, and the performance study (varying along 4 dimensions: operations, 5 sequence sizes, database system, sequence model). The authors present detailed performance results (average response time for 10 runs) for every combination and summarize them in a comprehensive matrix along the dimensions operations & sequence models using 5 Likert categories.

However, the observed behaviour and differences of the individual combinations represented in the individual plots and tables are not described at a sufficient level of depth for a journal paper, especially considering the fact that there are 15 pages of text versus 18 pages of plots and tables. Huge parts of the analysis and comparison of the individual plots are left to the reader.

Unfortunately, the design & realisation of the benchmark have flaws, and the setup & execution of the experiments raise questions. The experiment "datasets" contain only one sequence at a time, which does not seem a realistic benchmarking scenario to compare different sequence models when having a sequential API in mind. The queries are fixed (no templating with regard to different sequences, and a single fixed index position for get, set and remove was chosen in a questionable way). The experiments are performed on the same database instance for the different sequence models.

Although it seems a-priori reasonable to drive the experiment design and queries by requirements from the mentioned sequential SPARQL-backed API use case, it is questionable whether this is a relevant and meaningful use case in practice. SPARQL was designed as a query language to traverse and analyze graph structures. A pure emulation of a sequential data structure with SPARQL seems synthetic, and not a pragmatic and efficient way to go from an engineering standpoint. It would be natural to store the individual RDF entities (or pointers to them) as text in a database optimized for this type of request and use that as the backend for the API instead. The power and potential of SPARQL in the context of RDF sequences is its expressiveness, which allows combining access and manipulation of actual graph data and sequential data in one query (e.g. retrieve all albums from 2020 where the first track is shorter than a minute). While the presented results give an indication of how the sole atomic sequential data structure operations perform, it is uncertain whether the findings also hold for these combined queries and operation chains, and whether they can be transferred to real-life workloads. These types of queries should be addressed (ideally with different levels of computational complexity) in the study.
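To make this concrete, here is a sketch of the kind of combined query I have in mind (the vocabulary is hypothetical: ex:year, ex:tracks and ex:durationSeconds are made up, and the track list is assumed to be modelled as an rdf:List):

    PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
    PREFIX ex:  <http://example.org/>
    # all albums from 2020 whose first track is shorter than a minute
    SELECT ?album WHERE {
      ?album ex:year 2020 ;
             ex:tracks/rdf:first ?firstTrack .
      ?firstTrack ex:durationSeconds ?duration .
      FILTER(?duration < 60)
    }

Queries of this shape, with varying computational complexity, would in my view be the more relevant workload to benchmark.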

The mentioned flaws, questions and further (minor) issues are discussed in detail below.

I encourage the authors to revise the work with extended benchmark datasets and queries and an improved setup and analysis.

I suggest a major revision since the work in general is addressing an interesting research gap.

=================================================

Further Comments:

p4l25b (E x L)

p5l36b e_i should be E, or the arrow should be = or the other "maps to" operator; what are L_{i-1} and E_n? This formalisation seems to raise more questions than a textual explanation would.

p6 Table 1. I do not understand why operations are classified with regard to "relevancy", considering that all operations have been evaluated for all models; also, Table 1 does not seem to be referenced.

Figures 1-5: it is not really clear how the lists / events are linked to the track. _x seems to be the list entity in Fig. 1-2 and the track entity in Fig. 3-4, and in Fig. 5 it is missing completely; this should be consistent.

p7l47a Section ??

p8l11-24b and 36-49b and command line details should be removed

p9l14a not “from the current one” but starting with the second one

p9l36a The decision to pick the second half of the list introduces bias and seems inconvenient. It is obvious that there is more overhead in traversing to an element of the second half in a list, whereas for an index-based structure there is less overhead if an element is removed in the second half; it is the other way around when the first half is chosen. Moreover, I don't understand the idea of a single fixed index position for a benchmark. This is usually handled via templates and a set of random numbers (assuming a uniform distribution). These decisions, without further justification, weaken the trust in the measured results.
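As an illustration of what I would expect instead (a sketch with hypothetical property names; {{INDEX}} is a template placeholder instantiated with a freshly drawn uniform random position for every run):

    PREFIX ex: <http://example.org/>
    # template for get(i, L) in a number-indexed model
    SELECT ?event WHERE {
      ex:track1 ex:hasEvent ?event .
      ?event ex:position {{INDEX}} .
    }

Averaging over many random positions would remove the bias introduced by a single fixed index.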

p9l17b Nevertheless, the proper setup is one database instance per combination (or at least per model, if scalability is of minor interest) to ensure that there is no interference (e.g. all graphs can share the same index structures and dictionaries); this also has the advantage that it is impossible to forget to restart the database when proceeding with the next combination, which raises the question of whether such restarts were performed by the authors.

p10 Table 3 shows a major weakness or design flaw of the benchmark: the "datasets" contain only one sequence each and are really small, easily fitting into main memory. I am convinced that the average datasets hosted in SPARQL endpoints in real life are much larger (many sequences, millions of triples). Moreover, there is a typo at 500(k) files.

p10l41a-15b should be removed; the VMware Hypervisor is an indicator of a virtual server. If other virtualized systems are running on the physical hardware (e.g. cloud hosting), this could affect the benchmark results and would therefore not be suitable for benchmarking. This should be clarified or fixed.

p11l15a I do not understand the use of the "a midi:track" triple pattern since the entity id is provided; one could argue that this is not a "minimal query".
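For clarity, this is the contrast I have in mind (a sketch with hypothetical namespaces, not the exact benchmark query):

    PREFIX ex:   <http://example.org/>
    PREFIX midi: <http://example.org/midi#>   # hypothetical namespace, for illustration only
    SELECT ?event WHERE {
      ex:track1 ex:hasEvent ?event .
      # ex:track1 a midi:track .    # the extra type pattern I would drop, since the track IRI is already given
    }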

p11l6b The query is confusing to me since the SPARQL result set is not of the same list type (an RDF sequence in this case); I would not consider this a function of L -> L as actually defined. Chaining of this function would not be possible, although this is the primary use case of the rest function.

p11l11b an "OFFSET 1" expression would be more intuitive than the filter statement, potentially more efficient, and more conforming to the "minimal/pure query" requirement. Bearing in mind that performance may vary depending on the query realisation, it could be seen as a methodological problem of the experiment that the effect of different query realisations on performance has not been studied.
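Concretely, something along these lines (a sketch assuming a number-indexed model with a hypothetical ex:position property):

    PREFIX ex: <http://example.org/>
    # rest(L): all elements except the head, via OFFSET rather than a FILTER on the index
    SELECT ?event WHERE {
      ex:track1 ex:hasEvent ?event .
      ?event ex:position ?i .
    }
    ORDER BY ?i
    OFFSET 1

Whether an engine optimises OFFSET better or worse than the FILTER is precisely the kind of query-realisation effect that should be measured.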

p11l18b This is counterintuitive to me; I would expect the rest function to be very efficient for List, which offers the rest structure already at the storage level. Having read the query, I assume that the goal is not to evaluate how the rest function performs on the different RDF list data models, but how efficient the materialization of "the rest" of the elements from the model into a SPARQL result set is? If that is the case, this does not seem to be explained clearly ("The operation returns the content of the sequence").

p12l26a I did not understand this given the information in the paper so far; according to Figure 5, SOP does not have a dedicated sequence entity to which its members are directly connected.

p12l43a I do not agree. Unless negative index numbering is allowed, which would be surprising, append_front should lead to a huge workload due to rewriting all index numbers. For uri-numbering this should have dramatic consequences, since all entity ids change, leading to updates of triples beyond the sequence-management triples.
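As a rough sketch of what I mean for a number-indexed model (hypothetical ex:position property; not the paper's actual query), an append_front has to shift every existing index before the new head can be added:

    PREFIX ex: <http://example.org/>
    # shift every existing index up by one ...
    DELETE { ?event ex:position ?i }
    INSERT { ?event ex:position ?j }
    WHERE  {
      ex:track1 ex:hasEvent ?event .
      ?event ex:position ?i .
      BIND(?i + 1 AS ?j)
    } ;
    # ... then attach the new head at position 0
    INSERT DATA {
      ex:track1 ex:hasEvent ex:newHeadEvent .
      ex:newHeadEvent ex:position 0 .
    }

For the uri-numbering model the situation is even worse, since the IRIs themselves encode the positions and every triple mentioning the renamed entities would have to be rewritten.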

p12l14 I do not understand the difficulties. A zero-padding policy should not have any impact when removing an element (only when adding one! This should be discussed for append.). Moreover, it would be possible to evaluate with and without padding restrictions. Nevertheless, all index numbers have to be rewritten (decremented) anyway.

p12l19b It would make sense to introduce "surrogate" IRIs (linked via owl:sameAs to the original resources (events)) that serve only as alternative ids for participation in a sequence. This should eliminate the impact of entity-specific triples. Both uri-numbering approaches (with and without surrogates) could be compared to study the impact. This would also allow an entity to participate in multiple sequences at different positions; not being able to do so seems a major drawback of the uri-numbering model that should be emphasized. Without surrogates this could be seen as comparing apples and pears.
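A minimal sketch of the surrogate idea (hypothetical namespaces; seq: stands in for the sequence ontology pattern vocabulary):

    PREFIX ex:  <http://example.org/>
    PREFIX seq: <http://example.org/sequence#>   # hypothetical stand-in for the pattern's namespace
    PREFIX owl: <http://www.w3.org/2002/07/owl#>
    INSERT DATA {
      # surrogates carry the sequence structure ...
      ex:track1_pos3 seq:precedes ex:track1_pos4 .
      # ... and only point to the actual events, which remain untouched
      ex:track1_pos3 owl:sameAs ex:event42 .
      ex:track1_pos4 owl:sameAs ex:event43 .
    }

With surrogates, ex:event42 could participate in several sequences at different positions without any change to its own triples.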

p12l11-40 I was expecting optimal performance for "list" (because it seems easy to find and replace the end of a list), or at least performance similar to SOP. After having a look at the query (involving variable-length property paths) the performance results are not surprising anymore. Maybe it is worth mentioning to the reader that an RDF list has no direct link to the end of the list. Nevertheless, this raises the question of whether the comparison between SOP and List is fair or meaningful, given the fact that within the SOP query the "hasEvent" link to every event is leveraged, while this is not possible for the RDF list. Additionally, it is worth mentioning that the SOP pattern actually recommends using subproperties of sequence:precedes/follows (otherwise one entity could only be a member of one SOP sequence). Moreover, the individual linking of each event to its track does not seem to be proposed in the referenced sequence ontology design pattern. This adds up to a bias in favor of SOP.

p13l14 The query is tailored to one very specific use case and is not representative of the SOP model in general, because the substring index is hardcoded. While this is possible for rdf:_* properties, it is not applicable to subject IRIs, which can be of various (prefix) forms and lengths. The only requirement is to follow a specific pattern which helps to identify/extract the index number in a reliable and interoperable (between datasets and sequences) way. Considering the mentioned sequence API use case on top of a SPARQL endpoint, I wonder how such an API would look if it needs to know the prefix length of the URIs in the collection, or how the queries would look if different types of entities, naturally having different prefix lengths, are contained in the same sequence.
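To illustrate the point, the only generic alternative I can see is something like the following sketch, which assumes a hypothetical naming convention where the position follows a fixed marker ("_event") in the IRI; it avoids a hardcoded prefix length but still bakes a naming convention into the query:

    PREFIX ex:  <http://example.org/>
    PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
    SELECT ?event WHERE {
      ex:track1 ex:hasEvent ?event .
      # recover the position from the IRI string itself
      BIND(xsd:integer(STRAFTER(STR(?event), "_event")) AS ?i)
    }
    ORDER BY ?i

An API that has to know such conventions per dataset (or per entity type within one sequence) does not generalize.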

p13l47a I assume the wrong table is referenced; the caption of Table 5 mentions a different purpose. What do Y, M and N represent?

All bar charts: I think using box plots (additionally including the average) or error-bar plots (since 10 runs were performed for each) would be better for studying the significance and variance of the results between different sequence lengths.

p18 Fig. 9 The fact that the 2k and 3k sizes perform much better than 500 or 1k can be an indicator that there is a problem with the experiment setup. This might be caused by caching and warmup of the database if it was not restarted after every single run.

The colors for sop and list in the scalability view are hard to distinguish.

There is not much added value in the tables compared to the plots; the tables could be removed and presented on an experiment website (e.g. markdown in the github repo).

p30 caption should be “popoff”

Was a check for the correctness of the query results across different models and databases performed?

Review #3
By Luis-Daniel Ibáñez submitted on 13/Jun/2020
Suggestion:
Major Revision
Review Comment:

This paper is about defining a core-set of read-write atomic operations for lists, in order to advance towards the creation of a SPARQL-based API. The operations are implemented as SPARQL queries and benchmarked on the different RDF list data models on 4 different triple-store engines.
The paper is very similar in structure and methodology to the previous work of the same authors referenced in [13], but it has a reasonable amount of new content.

I have no general concern about motivation and related work.

Methodology:

In the requirements step, the authors claim to "systematically analyse classic computer science data structures", but there is no indication of how many and which textbooks or sources they reviewed to get the data types listed in section 4, nor of how they resolved any differences in definitions, if any, or chose between overlapping ones. I am unsure whether the authors instead mean a "systematic review", which has a different meaning and appears to be what they need: getting the core atomic list operations from different definitions. A systematic review has its own methodology, which is not followed here.

In the survey step (and by extension, section 5), it seems that everything is taken from [13]. However, "an updated survey and catalogue in RDF list model" is claimed as a contribution. I think this claim needs to be removed.

In the evaluation step, it is clear that the main metric is overall response time (though it should be defined here: what is the difference with "response time"?).
However, it is very unclear what the authors refer to with the paragraph "database engines may perform differently, or experiments results differ depending on the nature of the modelling solutions, showing a trend independently from the actual implementation". What is a "trend"? How is it defined and measured? Is this connected with the "triple-store invariance" hypothesis mentioned in the introduction?

It is also mentioned "Here we focus on the more complex relationship between models, operations, dataset sizes and triplestores". More complex than what?

For section 4, I already mentioned that there is no indication of how the methodology was applied to arrive at this list of data structures and the subsequent operations. Given that this is the part that answers the main research question of the paper as per the introduction, it would need to be revised. I would also encourage the authors to add a RQ related to how the identified operations perform on current RDF list models, as most of the hard work of the paper seems to be in the benchmarking.

For section 5, I already mentioned that this is exactly the same as in [13] (even minus the "timestamped-based lists"), so I don't see any contribution here.

Experimentation

- In [13] the track sizes are 1k, 30k, 60k, 90k, and 120k, while in this paper they are 500, 1k, 2k, 3k, 5k and 10k. Why the change from the previous work?
- It is unclear whether the prepared datasets overlap or not, which might be relevant since for each triplestore they are stored in the same instance under different named graphs.
- I am not convinced by the argument for concluding that "reported averages are significant and represent well the response" based on an SD of less than 20% on a 10-run sample. Where did the authors get this method from? Or perhaps they could elaborate on the statistical argument used (see also the short sketch after this list)? Side note: I consider the inclusion of something like this very positive; not many papers that benchmark response times think about this.
- The claim that "Results are coherent for all the queries and datasets and demonstrate clear trends across different database engines" is somewhat vague. I follow up on my previous comment on what a "trend" exactly means: is it that they all correlate with a certain type of function? Is it that you also observe the triple-store invariance hypothesis on these operations? If so, how is this verified?
- Figures 9 onwards are missing the measurement units on the axes.
- The observation that there is a "negative performance of Seq data model in combination with Virtuoso" seems to go against the triple-store invariance hypothesis; any explanation for this? This also appears to happen with prop_number on Blazegraph in Figure 15.
- Figures 14 and 15 have the same caption.
- The description of Tables 4 and 5 in the text does not correspond with the captions; they seem to be reversed.
- It is not clear how the correspondence between results and the Likert scale presented in Table 4 was done. I would expect a range of response time values mapping to a 1-5 number.
- Same as above for Table 5 and the Y, M, N values. I guess this is what is meant by "fitness for use", but there is no definition of this term nor of why the scale is Y, M, N.
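(On the statistical argument mentioned above: one possible line of reasoning, sketched here only as an illustration and not as the authors' actual argument, goes via the standard error of the mean. With n = 10 runs,

    SEM = SD / sqrt(n) = SD / sqrt(10) ≈ 0.32 * SD

so an SD below 20% of the mean corresponds to a standard error of roughly 6% of the mean. If this is the intended justification, it should be stated explicitly.)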