Semantics and Provenance for Accountable Smart City Applications

Tracking #: 927-2138

Authors: 
Heather Packer
Dimitris Diochnos
Michael Rovatsos
Ya’akov Gal
Luc Moreau

Responsible editor: 
Guest Editors Smart Cities 2014

Submission type: 
Full Paper
Abstract: 
The recent media focus on Smart City services, particularly ride sharing, that provide ordinary users with the ability to advertise their resources has highlighted society's need for transparent and accountable systems. Current systems offer little transparency behind the processes that claim to provide accountability to and for their users. To address this concern, some applications provide a static, textual description of the automated algorithms used, with a view to promoting transparency. However, this is not sufficient to inform users exactly how information is derived. These descriptions can be enhanced by explaining the actual execution of the algorithm, the data it operated on, and the parameters it was configured with. Such descriptions of a system's execution and its information flow can be expressed using PROV, a standardised provenance data model. However, given its generic and domain-agnostic nature, PROV only provides limited information about the relationship between provenance elements. Combined with semantic information, a PROV instance becomes a rich resource, which can be exploited to provide users with understandable accounts of automated processes, thereby promoting transparency and accountability. Thus, this paper contributes a vocabulary for Smart City resource sharing applications, an architecture for accountable systems, and a set of use cases that demonstrate and quantify how the semantics enrich an account in a ride share scenario.
Tags: 
Reviewed

Decision/Status: 
Reject

Solicited Reviews:
Review #1
By Paolo Missier submitted on 14/Dec/2014
Suggestion:
Major Revision
Review Comment:

The paper presents a framework to support transparency and accountability for "smart sharing" services, which are becoming one of the features of so-called "smart cities" (think airbnb, carshare, etc.).
The idea of leveraging the provenance of the sharing services to support their accountability (which the authors call "accountability as a service") is interesting. In terms of modelling, the authors embrace a prevalent view of provenance as a structured and domain-agnostic form of metadata that facilitates the organization of further domain (semantic) annotations, i.e., by provenance markup. So the premises are there for an interesting set of contributions. These include

- a domain vocabulary to describe the provenance of smart sharing services
- a framework to support a "reputation service" and an "explanation service", with examples.

These are indeed described in good detail, however their nature appears to be quite straightforward and there is limited technical depth in the actual contributions.

Regarding the vocabulary, it is undeniably going to be useful, albeit only to the specific community for which it was designed. Its design is a straightforward exercise, not necessarily a bad thing; however, terms are defined matter-of-factly (in sec. 3), with little insight into how they were chosen. Is it based on a corpus of use cases? Where do these come from? How many? What is their coverage of the domain? And has the vocabulary been tested on any such use case? How? Do you expect it will need extending?

Regarding the services, there are a few of them.
The reputation service is quite straightforward: it just aggregates over "feedback reports". The examples suggest that these are well-formatted and well-behaved, but it is not clear where this format comes from. I could not find a reference to a standard exchange format, which suggests that these may be homegrown, "domesticated" examples, raising the question of whether the framework has been tested "in the wild" on any of the popular services listed in the introduction. See below for more on the issue of validation.
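The aggregation step the review refers to can be sketched in a few lines. The sketch below is purely illustrative: the report fields (`subject`, `rating`) and the use of a mean are assumptions, since the paper does not reference a standard exchange format for feedback reports.

```python
import json
from collections import defaultdict
from statistics import mean

# Hypothetical feedback reports; the field names are assumptions,
# not taken from any standard exchange format.
feedback_reports = [
    {"subject": "user:alice", "rating": 5},
    {"subject": "user:alice", "rating": 3},
    {"subject": "user:bob", "rating": 4},
]

def reputation(reports):
    """Aggregate feedback reports into per-subject reputation scores."""
    ratings = defaultdict(list)
    for report in reports:
        ratings[report["subject"]].append(report["rating"])
    return {subject: mean(rs) for subject, rs in ratings.items()}

print(json.dumps(reputation(feedback_reports), indent=2))
```

Even a sketch this small makes the reviewer's point concrete: without a documented exchange format, any consumer must guess the field names and aggregation semantics.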

Recommendation: address integration issues with third party formats and services.

The explanation service is more interesting. Here a technique based on matching narrative templates to provenance fragments is described in detail; in fact, all templates are fully laid out in tables. However, the matching is fairly simple, and I am not sure there are any research challenges associated with this idea. This may be fair enough, but then are these narratives useful? The main case study (RideShare, sec. 5) elucidates the use of this technique in detail (incidentally, in architectural terms I think it's a case study and not a use case -- it does not seem to contribute requirements or help with their validation). Narrative generation is clear. The point is one of validation. Since these narratives are naturally targeted at users (in this case users of the RideShare service, presumably), one expects a user validation of their explanatory function. Are these actually useful? At the right level of abstraction? Or do they require further manipulation (more in the realm of NLP and generation, and thus presumably more challenging) to be brought to fruition?
It is odd to present an accountability meta-service with no indication of whether users find that it helps make the underlying services more accountable.
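The template-matching technique described in the review can be illustrated with a minimal sketch. Everything in it is hypothetical: the template wording, the record fields, and the identifiers are illustrations of the general idea, not the paper's actual templates.

```python
# Minimal sketch of matching sentence templates to provenance fragments.
# Template wording, record fields, and identifiers are hypothetical.
templates = {
    "wasGeneratedBy": "The resource {entity} was generated by the activity {activity}.",
    "wasAssociatedWith": "The activity {activity} was carried out by {agent}.",
}

provenance = [
    {"relation": "wasGeneratedBy", "entity": "ridePlan1", "activity": "planning1"},
    {"relation": "wasAssociatedWith", "activity": "planning1", "agent": "driver:bob"},
]

def narrate(records):
    """Render each provenance record using the template for its relation."""
    return [
        templates[r["relation"]].format(**{k: v for k, v in r.items() if k != "relation"})
        for r in records
        if r["relation"] in templates
    ]

for sentence in narrate(provenance):
    print(sentence)
```

As the sketch suggests, the matching itself is a dictionary lookup plus string substitution; the open question raised above is whether the resulting sentences are actually useful to end users.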

Bottom line: add evaluation.

On architectural grounds, it would also be interesting to get a feel for the effort required to apply the framework to services like RideShare. In this example, everything "works" because the domain services produce complete provenance and are also fully annotated (Table 10 does show the narratives in the case the RideShare vocabulary is not used, however one must still assume that full provenance graphs are generated). Who "owns" RideShare and how much cooperation is required to instrument its services?

Bottom line: address provenance instrumentation, semantic annotation issues "in the wild".

Regarding presentation, I have two main recommendations.
Firstly, separate the presentation of the concepts, services, and techniques from the much more extensional information found in the many tables, which takes up plenty of space. The paper is long and reads as a hybrid between a research contribution and a specification document. I think it would benefit from having a core "researchy" narrative with a few examples and the rest put in an appendix.

Secondly, before submitting a paper please proofread it. Trivial as it sounds, the many, many typos, grammar issues and a few broken sentences throughout the paper become irritating very quickly and point to a rather careless preparation process (ex.: "Furthermore, given that the sentence template vari- ables are placeholders for resources to be extracted from the provenance, and quantify the number of re- sources that each narrative exposes.")

Minor:
- the graphs in figs. 5 and 6 are pretty much unreadable when the manuscript is printed out.
- ref [20] incomplete and probably subsumed by [21]? refs [22], [27] venue missing

Review #2
By Daniel Garijo submitted on 15/Dec/2014
Suggestion:
Minor Revision
Review Comment:

This paper defines an extension of the PROV standard for modeling the provenance of smart city applications, making them accountable. The paper exploits the usage of the proposed vocabulary extension through a) a REST API that implements a reputation service; and b) an explanation service which aims at providing human-readable explanations for the terms captured by the vocabulary. Both functionalities are explained with examples.

The paper is well written and very relevant to the call (having semantics, ontologies, semantics for citizens and provenance). I have found some typos though, which I highlight at the end of my review. I also consider the work novel, as it is the first work (that I am aware of) that creates human readable representations from provenance statements. However, I have some concerns with the paper, which I list below. If the authors address my issues, I'll be happy to accept the paper as part of this special issue.

1) The paper makes claims at certain points that are not true, as there is no evaluation. For example "this paper proposes... data elements in a narrative form so it can be EASILY DIGESTED by users", "...state of the art frameworks tend to focus on logging this information, but do not present in an EASY TO UNDERSTAND FORMAT TO USERS" (this last one is an implicit claim). If a user evaluation showing that the proposed narratives produce explanations that are more useful for users than plain statements is not provided, then any statement similar to the ones I have highlighted should be modified in the manuscript.

That said, I think that such an evaluation would make this paper very strong. And I am particularly curious whether users would prefer this kind of representation versus a table with a set of selected property values, which is sometimes easier to navigate than a full paragraph.

2) Where are the requirements for the narrative? I found some general requirements in Section 4, but I don't understand why a particular narrative was chosen, or who validated it. Was it validated just by the authors or by someone else? For example, in the PROV narrative there are statements that seem unnecessary (e.g., stating that a resource is an entity may not add much value for some users). Maybe it would be enough to say that a resource was derived from another, etc. Different users might have different needs at different granularities. Please discuss this in the paper.

3) There is a missing discussion on how the vocabulary and templates can be extended (it is only briefly mentioned in the conclusions). How difficult would it be for anyone trying to adapt this approach to reuse it? Has the PROV narrative been used in other contexts?

4) The definition and semantics of the vocabulary are confusing. The authors refer to the model used as a vocabulary. Is this an RDF vocabulary/ontology? I have found barely any descriptions at all at http://smartsociety-project.github.io/cas/. It looks like ongoing documentation (incomplete at the moment).

There seem to be some inconsistencies in the naming: a plan is a prov:Entity but not a prov:Plan? sending_request is an activity, but sending_negotation_response is an entity? Some other activities have "_activity" at the end. Please use a consistent naming scheme.

I am also a bit confused about the "roles" in the definition of the ride share vocabulary. PROV introduces the notion of roles, which are not types of agents. Are the authors mixing these notions? If a driver is not a role played during an activity, then please avoid using that word in its definition, as it can be confused with the notion of role in PROV.
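The distinction the reviewer draws can be illustrated with a minimal PROV-JSON-style fragment, built here as a plain Python dict with hypothetical identifiers: a type is asserted on the agent itself, whereas a role is attached to the agent's association with a particular activity via prov:role.

```python
import json

# Sketch of the distinction in PROV-JSON style; identifiers are hypothetical.
# "prov:Person" is a type of the agent itself; "ex:driver" is a role the
# agent plays only during one activity.
doc = {
    "agent": {
        "ex:bob": {"prov:type": "prov:Person"}  # type of the agent itself
    },
    "activity": {
        "ex:drive1": {}
    },
    "wasAssociatedWith": {
        "_:assoc1": {
            "prov:activity": "ex:drive1",
            "prov:agent": "ex:bob",
            "prov:role": "ex:driver",  # role played only during this activity
        }
    },
}

print(json.dumps(doc, indent=2))
```

Under this reading, "driver" would not appear as a class of agent in the vocabulary at all; it would only ever qualify an association between an agent and an activity.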

5) In the related work the authors say that some related work influences the Smart City vocabulary. How does this happen? I haven't seen any reference when they introduce the approach. Also, the last paragraph just enumerates other approaches but doesn't differentiate them from the current work. Why aren't any of these reused? A model can extend PROV and other existing ontologies as well...

6) I haven't been able to see figures 5, 6 and 7. The printed copy is not enough (they are too small), and the PROV Store returned a 404 for all three URLs.

Minor concerns:
1) Some resources are not available: the images I mentioned above, the REST API (for at least testing the proposed approach) and the documentation of the model.

2) Examples are often not explained or referenced in the text. If an example is added please explain it briefly in the text.

3) The tables require a subject, but the sentences assume that all the provenance information is available. What would happen if the provenance information were partially missing? Would the sentence only appear in part?

4) Some of the produced paragraphs are quite verbose. Maybe the verbosity would be reduced if the labels of the resources were shown instead of their ids. I wonder what would happen if the ids were URLs; the text would probably become very difficult to read. Another related question is why not have identifiers/URIs that are resolvable, instead of having to create an additional GET operation for the ids in the REST API? It would be simpler and compliant with Semantic Web principles.

5) I would not use the wikipedia definition for "smart city". Haven't the authors found a better one?

6) Some of the URLs used in the footnotes are very long. I suggest shortening them with https://goo.gl/ (for example).

7) I found the second paragraph of section 3 quite confusing and verbose. I think it would benefit from some rewording by the authors.

8) The term "accessible account" is never introduced before section 4. I think that I understand what the authors mean, but I would appreciate it if they defined it first.

Typos:
I have found these typos/suggestions. I recommend the authors do a proofread before resubmitting the paper for the next round of reviews.
1 Introduction:
"they mediate access to real people" -> wouldn't "human users" or something like that be more appropriate?
"for unlicensed drivers.". -> extra full stop
"reputation systems"-> missing full stop.
References 19 and 20 are used for defining "provenance". Use provenance[19,20] or remove one.
"Prov is a W3C standard [...], which can be a PROV entity, activity or agent. However, [...]" -> I don't understand the use of "however" here.
2 Background work:
"It is well suited to describing provenance data"-> describe (or for describing).
". While, the work presented in..." -> isn't Meanwhile more appropriate here?
3 Smart City Vocabulary
"The type information allows..." -> The type of the information allows...
"facilitating the leverage the information in the provenance..." -> facilitating leveraging
"a Smart City application needs to be able to describes..." ->describe
4 Accountability as a service
"It is important that order to support this" -> in order to support this.
5.1 Feedback and reputation reports
"There will be an investigate" ->there will be an investigation
5.2 Ride Share Vocabulary
"when a feedback is submit" -> when a feedback is submitted.
6 Ride Share Accountability Use Cases
"the explanation peer to generate two narratives" -> the explanation peer generates two narratives.
Table 10: The first sentence goes into the second column
7 Conclusions
"board class of applications" -> broad class
"Firth, investigate..." -> First, investigate.
"json-DL" ->JSON-LD.

Review #3
By Alejandro Llaves submitted on 08/Jul/2015
Suggestion:
Reject
Review Comment:

The paper describes a framework to provide provenance metadata about accountability in Smart City applications. A ride sharing scenario is used to demonstrate the application of the paper contributions. Such contributions include a provenance vocabulary to support accountability and a stack of Web services that implement "Accountability as a Service".

ORIGINALITY
The authors claim that this work is the first that uses provenance, semantics and narrative explanations together to describe how data is handled by users and services. I have seen similar mechanisms to generate class and property descriptions (experimental paraphrase) automatically for the Semantic Sensor Network ontology - http://www.w3.org/2005/Incubator/ssn/ssnx/ssn. Bringing this concept to report on the provenance of accounts of actions performed by applications in a Smart City context is an original idea and has a lot of potential.

SIGNIFICANCE OF THE RESULTS
Although the idea of narrative explanations for automated processes is good, I do not clearly see the role of semantics in this work. The presented Smart City vocabulary, with documentation available at http://smartsociety-project.github.io/cas/, is a collection of terms that describe agents, activities, and entities using text. However, I could not find the vocabulary specification at the link http://purl.org/cas/ns#. Therefore, the URIs for the concepts of the Smart City vocabulary are not dereferenceable, e.g. http://purl.org/cas/ns#Activity or http://purl.org/cas/ns#Agent.

The Accountability as a Service stack proposed in figure 1 makes sense to me. Yet, I do not see how the use of semantics helps in this case:
- The Reputation Peer generates reputation reports from feedback reports, which follow a JSON structure (figures 2 and 3). Where is the feedback and reputation model taken from? What are the semantics of the model? I would encourage the use of JSON-LD for these reports in order to have a semantic grounding, e.g. to describe what a subject is. Maybe these reports could be enriched using the Smart City and PROV vocabularies.
- The Explanation Peer uses sentence templates to describe the provenance data depending on the type of element being described. The resulting descriptions are human-readable, but what about machine-to-machine interoperability?
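The JSON-LD grounding suggested for the feedback reports could look roughly like the fragment below. The property names and the mapping into the vocabulary namespace are assumptions for illustration; only the namespace URI http://purl.org/cas/ns# comes from the paper, and the review notes it does not currently resolve.

```python
import json

# Hypothetical JSON-LD version of a feedback report. The property names
# and term mappings are assumptions, not taken from the paper; the "cas"
# namespace URI is the one the paper uses for its vocabulary.
feedback_report = {
    "@context": {
        "cas": "http://purl.org/cas/ns#",
        "subject": "cas:subject",
        "rating": "cas:rating",
    },
    "@type": "cas:FeedbackReport",
    "subject": {"@id": "http://example.org/users/alice"},
    "rating": 4,
}

print(json.dumps(feedback_report, indent=2))
```

With a context like this, the same JSON document remains readable by the existing REST clients while also carrying machine-interpretable semantics, which would address both bullet points above.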

Sections 5 and 6 present a ride sharing scenario to demonstrate the benefits of the paper contributions. In this case, feedback reports are about riders and drivers, and such reports are used to generate user reputation reports. Accountability for ride plans and the user negotiation process complete the three use cases derived from the ride sharing scenario. Here, the authors propose a new vocabulary that describes activities related to ride sharing (again as a collection of terms with no defined properties), extending the previous Smart City vocabulary, but I could not find a link in the paper pointing to where it is published. New sentence templates that include terms of the Ride Share vocabulary are also presented. The main point of the authors' analysis to show the benefits of their work is that the narratives enriched with the Ride Share vocabulary terms are more descriptive than those using only the PROV vocabulary. Indeed, the former use application-specific sentence templates whereas the latter do not, so I would be very surprised if the PROV narratives were more descriptive.

The paper does not include an evaluation to compare the presented framework with similar ones. IMO, the presented results are not significant enough to be published in this special issue covering the role of semantics in Smart City applications.

QUALITY OF WRITING
There are parts of the paper that are well written, but others include some typos and grammar errors, e.g. the first two paragraphs of section 3 or the conclusion. There are also missing periods and misused semicolons in listings. Additionally, the writing style is not homogeneous. A deep revision of the text is highly suggested.

IMO, the paper does not require eleven tables. For the narrative templates (tables 2 and 7), a few examples could be described, with a link pointing to the full list of templates. Table 6 does not help to show the hierarchy of the Ride Share vocabulary. In section 6.1, two narratives are written out and then repeated sentence by sentence in table 8. From table 11, only the number of exploited resources is relevant.

OTHER REMARKS
- The title is too ambiguous.
- There should be a better source for the definition of Smart City than the wikipedia one.
- Too many footnotes including links in page 2.
- Section 2, 3rd paragraph: when a quote from another source is used, the citation should be attached next to it.
- The description of the terms in the vocabularies is sometimes ambiguous, e.g. "An agent is anything that can perform an activity; alternatively, anything that has capabilities" or "An activity is the condition in which things are happening or being done".
- The authors often write about submitting provenance to ProvStore, when they actually mean submitting provenance (meta)data, i.e., data about provenance.
- Figures 5 and 6 are not readable in the printed version of the paper, and figure 7 barely is. The caption of figure 7 exceeds the page limits.
- Some references do not have publication details, such as [20] and [22].