The Entity Registry System: Collaborative Editing of Entity Data in Poorly Connected Environments

Tracking #: 747-1957

Authors: 
Christophe Guéret
Philippe Cudre-Mauroux

Responsible editor: 
Guest Editors EKAW 2014 Schlobach Janowicz

Submission type: 
Conference Style
Abstract: 
There are about 4.5 billion people in the world who have no or limited Internet access. They are deprived of entity-driven applications that assume data repositories and entity resolution services are always available. In this paper, we discuss the need for a new architecture for entity registries. We take the concrete case of sharing, in an ad-hoc context, privacy-sensitive data stored in the educational software "Sugar". We propose and evaluate a new general-purpose Entity Registry System (ERS) supporting collaborative editing and deployment in poorly-connected or ad-hoc environments. The reference open-source implementation is evaluated for scalability and data-sharing capabilities.
Tags: 
Reviewed

Decision/Status: 
[EKAW] reject

Solicited Reviews:
Review #1
Anonymous submitted on 26/Aug/2014
Suggestion:
[EKAW] reject
Review Comment:

Overall evaluation
Select your choice from the options below and write its number below.

== 3 strong accept
== 2 accept
== 1 weak accept
== 0 borderline paper
-1 weak reject
== -2 reject
== -3 strong reject

Reviewer's confidence
Select your choice from the options below and write its number below.

== 5 (expert)
4 (high)
== 3 (medium)
== 2 (low)
== 1 (none)

Interest to the Knowledge Engineering and Knowledge Management Community
Select your choice from the options below and write its number below.

== 5 excellent
4 good
== 3 fair
== 2 poor
== 1 very poor

Novelty
Select your choice from the options below and write its number below.

== 5 excellent
== 4 good
3 fair
== 2 poor
== 1 very poor

Technical quality
Select your choice from the options below and write its number below.
== 5 excellent
== 4 good
3 fair
== 2 poor
== 1 very poor

Evaluation
Select your choice from the options below and write its number below.
== 5 excellent
== 4 good
3 fair
== 2 poor
== 1 not present

Clarity and presentation
Select your choice from the options below and write its number below.
== 5 excellent
4 good
== 3 fair
== 2 poor
== 1 very poor

Review
Please provide your textual review here.

The paper proposes an approach for managing and sharing privacy-sensitive data in poorly connected environments with the Entity Registry System.
The paper is well written and clearly structured. The ERS is an interesting, technically sound solution for managing entity descriptions in a decentralized manner. It is however not the original contribution of this paper and has been published before. The contribution of the paper is its application to managing journal data in education software.
Here, my impression is that the main concepts (ERS, privacy-sensitive data, poorly connected environments) simply do not fit together properly.
- The motivation for managing journal data in an entity registry system is not convincing.
- It is not properly shown why and how an ERS based on the described architecture would be better suited for poorly connected environments than, say, a centralized approach where clients connect to and sync with a server whenever they are online.

Overall, the entire approach appears to be artificially constructed following the “hammer looking for a nail” paradigm. The ERS as such is a valid contribution, but its application is not convincing.

Review #2
Anonymous submitted on 02/Sep/2014
Suggestion:
[EKAW] reject
Review Comment:

Overall evaluation
Select your choice from the options below and write its number below.

== -2 reject

Reviewer's confidence
Select your choice from the options below and write its number below.

== 4 (high)

Interest to the Knowledge Engineering and Knowledge Management Community
Select your choice from the options below and write its number below.

== 2 poor

Novelty
Select your choice from the options below and write its number below.

== 3 fair

Technical quality
Select your choice from the options below and write its number below.

== 2 poor

Evaluation
Select your choice from the options below and write its number below.
== 2 poor

Clarity and presentation
Select your choice from the options below and write its number below.
== 2 poor

Review

The authors present a distributed data management infrastructure for storing entity information. The work is an extension of the authors' previous work on building a centralized entity registry system.

The paper suffers from two major issues. The first and overriding issue is the motivation and relevance of this work. The system presented is in no way connected to the (Semantic) Web; it is a private data infrastructure. It is also unclear to what extent, if at all, the system exploits the semantics of the data. The overall design of the system indicates that entire documents of entity descriptions are exchanged between nodes, and hence the system might as well be conceived as a distributed document store. The management of information as triples only seems to be relevant for the so-called Aggregator nodes, where information is managed at the triple level. However, it remains unclear what the connection is between the two data models. For example, Sec. 4.2 says that there can be no two triples that share the same predicate and object, but this directly contradicts Sec. 3.1, where it is stated that "potential conflicting statements are found in different containers". If the system as a whole can have conflicting statements, how does the Aggregator make sure that it remains consistent? Further, Aggregator nodes don't seem to participate in the system other than by consuming information.

The second major issue is the lack of proper evaluation. The authors fail to discuss and compare to any alternatives for distributed document storage. Lacking any baselines, the scale-up experiments are meaningless because they are only a function of available hardware. From the description it also seems that the distributed system was not tested as such, only as individual components (Contributors/Bridges/Aggregators). The scaling of these individual components doesn't mean that the system as a whole would scale. There is also a large disconnect between the motivation ("4.5 billion people have no or limited internet access" or even stronger "deprived" from internet) and the experiments, which in no way prove that the system would be more robust under unreliable or inexistent (?) network connectivity than any alternative.

Other questions/comments:

-- Linked (Open) Data is depicted as a centralized system which is very strange: anyone on the Web can make any statement about any entity without a need for a centralized registry.
-- The name Entity Registry System seems to be confusing. What is the registry: the system as a whole? What does it mean to register an entity?
-- What is the practical purpose of the Aggregators? I understand that they are there to gather information, but how does one query information from an Aggregator? Is there a SPARQL query interface? Keyword search? Does it support streaming queries?

Review #3
Anonymous submitted on 11/Sep/2014
Suggestion:
[EKAW] conference only accept
Review Comment:

Overall evaluation
Select your choice from the options below and write its number below.

== 3 strong accept
== 2 accept
== 1 weak accept
XX == 0 borderline paper
== -1 weak reject
== -2 reject
== -3 strong reject

Reviewer's confidence
Select your choice from the options below and write its number below.

== 5 (expert)
== 4 (high)
XX == 3 (medium)
== 2 (low)
== 1 (none)

Interest to the Knowledge Engineering and Knowledge Management Community
Select your choice from the options below and write its number below.

== 5 excellent
== 4 good
XX == 3 fair
== 2 poor
== 1 very poor

Novelty
Select your choice from the options below and write its number below.

== 5 excellent
== 4 good
XX == 3 fair
== 2 poor
== 1 very poor

Technical quality
Select your choice from the options below and write its number below.
== 5 excellent
== 4 good
XX == 3 fair
== 2 poor
== 1 very poor

Evaluation
Select your choice from the options below and write its number below.
== 5 excellent
== 4 good
XX == 3 fair
== 2 poor
== 1 not present

Clarity and presentation
Select your choice from the options below and write its number below.
== 5 excellent
XX == 4 good
== 3 fair
== 2 poor
== 1 very poor

Review

The paper presents the use of an Entity Registry System to help get user-centric statistics out of the Sugar e-learning platform, with a focus on low-connectivity environments. Overall, the problem as stated, namely how to transmit this data so that only authorized people can read it, is already solved using basic encryption, and PETS (Privacy-Enhancing Technologies) already has work (see Danezis et al.) on how to do statistics in a privacy-preserving manner. It is also really unclear how the Entity Registry System would do anything to help user privacy.

However, the authors do build a Linked Data "entity registry" regardless. The constraints on the RDF part of their solution forbid the use of real RDF, and the use of URNs rather than HTTP URIs makes the work non-compliant with Linked Data. While work has been done on the evaluation data set, the data consists of a 73-item data set repeated over 100 times, and thus is not realistic. There are some interesting points made about scalability and registries, but they are made within such a contrived context that it is difficult to tell what they are.