Reasoning with Data Flows and Policy Propagation Rules

Tracking #: 1441-2653

Enrico Daga
Aldo Gangemi
Enrico Motta

Responsible editor: 
Guest Editors Linked Data Security Privacy Policy

Submission type: 
Full Paper
Data-oriented systems and applications are at the centre of current developments of the World Wide Web. In these scenarios, assessing what policies propagate from the licenses of data sources to the output of a given data-intensive system is an important problem. Both policies and data flows can be described with Semantic Web languages. Although it is possible to define Policy Propagation Rules (PPR) by associating policies with data flow steps, this activity results in a huge number of rules to be stored and managed. In a recent paper, we introduced strategies for reducing the size of a PPR database by using an ontology of the possible relations between data objects, the Datanode ontology, and applying the (A)AAAA methodology, a knowledge engineering approach that exploits Formal Concept Analysis (FCA). In this article, we investigate whether this reasoning is feasible and how it can be performed. For this purpose, we study the impact of compressing a rule base associated with an inference mechanism on the performance of the reasoning process. Moreover, we report on an extension of the (A)AAAA methodology that includes a coherency check algorithm, which makes this reasoning possible. We show how this compression, in addition to being beneficial to the management of the rule base, also has a positive impact on the performance and resource requirements of the reasoning process for policy propagation.

Minor Revision

Solicited Reviews:
Review #1
By Simon Steyskal submitted on 16/Sep/2016
Minor Revision
Review Comment:

The paper (especially Section 5) has been significantly revised and, together with the authors' response accompanying the resubmission, most of my remarks and questions were addressed. As a result, the quality of the paper has improved significantly; however, there are still a couple of open issues that need to be taken care of.

Again, detailed comments for each section are listed below:

0) General
0.1) s/policies propagation/policy propagation/

1) Introduction
1.1) "Therefore, they need to know what are ... need support in order to .." -> "Hence, they need to be aware of any usage constraints attached to data sources they want to exploit, and they need support in publishing.."
1.2) "..relying on standards like the W3C PROV model to describe process executions.." -> The PROV vocabulary and data model are focused on expressing actions and resource states in a provenance chain (taken from [1]). However, to me "to describe process executions" sounds like one should use PROV for modeling, e.g., business processes in RDF rather than provenance information/exchange in data flows.
1.3) "ODRL, which can be exploited for policies formalisation and validation." -> You don't need to "exploit" ODRL for expressing policies.. ODRL was literally designed to serve that very purpose. rephrase! also -> "for policy formalisation"
1.4) "In [9], we studied how it is possible.." -> "In [9], we studied how PPR databases can be compressed by.."
1.5) "of reasoning with (compressed) PPR bases, which was missing [9]:" -> "was missing in [9]"; PPR base/PPRs database/kb of PPRs/rule base/set consisting of PPRs/..?
1.6) "extension of .. with coherency check" -> "with a cc"/"by adding an additional cc step to"/..
1.7) "Section 2 traverses the relevant literature." -> you traverse a graph, but review literature
1.8) "notion of Policy Propagation Rule (PPR)" -> "concept of Policy Propagation Rules (PPR)"
1.9) "We also evaluate the impact of this evolved methodology on the compression factor of the rule base." -> "this evolved meth." being the one proposed in [9] extended with a coherency check? to what extent are you "evaluating the impact"? In order to check whether using proposed "evolved meth." positively affects the CF of the rule base, you would have to compare respective CF against the one obtained by using the "standard meth." of [9].
1.10) "To this aim, we compare" -> "For this purpose, we compare"

2) RW
2.1) "systems are city Data Hubs" -> I would use either "City Data Hubs" or "city data hubs"
2.2) "we concentrate on the problem of reasoning with propagating policies" -> policies that are propagating? how do they relate to the rest of the paragraph, i.e., what's the transition from policy negotiation to policy propagation?
2.3) add ref. to POE [2,3]
2.4) "ODRL semantics have been" -> "has been"
2.5) "Datanode .. designed to express wide range of rel. between data art., and not only the ones derivable from actions" -> Agreed, just wanted to mention that the overarching concept is very similar. Eventually, you also end up with a set of relations and their respective dependencies among each other with the latter influencing attached policies.
2.6) "The RDF Licenses Database [26]" -> s/Database/dataset/
2.7) "reasoning on policies" -> "reasoning over policies"
2.8) "a ODRL" -> "an ODRL"
2.9) ", in conjunction with the Datanode ontology." -> how does this relate to the rest of the sentence?
2.10) "More recently, .. [12]." -> 8 years ago.. but yes, that's more recent than [25,30]
2.11) "problem of compression of propositional knowledge bases has been deeply studied" -> "compressing prop." "has been extensively studied in the past"
2.12) "process to boost rule execution" -> coll.; "boosting" in what sense? improving the performance? rephrase

3) Reasoning on pp
3.1) "we describe the approach for" -> "our approach for"
3.2) "Next, a discussion of each one of the above elements follows." -> remove or rephrase
3.3) s/dn:,/dn:/
3.4) "an overview of the top of the property hierarchy." -> rephrase, i.e. merge with its succeeding sentence.
3.5) s/fundamental dimensions/main dimensions/
3.6) maybe consider using \begin{description} environment?
3.7) "between something and its metadata" -> something? specify!
3.8) data node vs. datanode
3.9) s/spreasheet/spreadsheet/
3.10) s/have often/often have/
3.11) "We refer to [8] for a discussion on the genesis of Datanode." -> replace "genesis" with a less dramatic/biblical sounding word (i.e., development)
3.12) "and r a Datanode relation between the two" -> the two what? X&Y? data objects and policy? s/the two/X and Y/
3.13) "as the other parts can be derived from" -> what other parts? rephrase/clarify
3.14) Fig. 2 -> I would suggest using a different kind of arrow head (arrows with hollow arrow head are commonly used for representing inheritance relationships, see e.g. [4])
3.15) "We now introduce a guide use case." -> is "guide use case" actually a thing? rephrase (e.g., using "motivating example" instead of "guide uc")
3.16) "The following namespace prefixes will be used in the description: " -> description of what? rephrase and/or merge with previous sentence.
3.17) "and associate it with media objects" -> associates
3.18) "we need all the elements described above" -> too vague; what "elements" that were "described above" are you referring to? rephrase/clarify or simply remove that part, i.e. -> "In order to associate ... data, a description of .. database are needed."
3.19) Listing 1 ->
3.19.1) ex:FlickrTC -> just out of curiosity, what does TC stand for? terms and conditions?
3.19.2) odrl:Agreement is an rdfs:subClassOf odrl:Policy
3.19.3) If a policy is of type odrl:Agreement, it must contain information about all parties involved (see [5]). I suggest changing the policy to an odrl:Offer with ex:Flickr being the odrl:Assigner that proposes/offers respective terms of use.
3.19.4) "odrl:target ex:EventMedia" -> No, that's wrong! I'm fairly certain Flickr isn't specifying any usage policies for the asset ex:EventMedia in its terms of use. It does, however, specify ones for its API (hence the name). replace ex:EventMedia with ex:Flickr!
3.19.5) odrl:license is deprecated[6] and should be replaced by odrl:grantUse [7].
3.19.6) there are missing whitespaces between odrl:action odrl:license and cc:CommercialUse
3.19.7) multiple prohibitions defined for the same asset can be merged into one single prohibition (i.e., odrl:action odrl:sell, cc:CommercialUse, ... ];)
3.20) Listing 2 -> :Eventful, :LastFM, and :Upcoming are missing prefixes (ex: ?).
3.21) Table 1 -> FWIW, it seems as if Upcoming was restricting the use of its APIs to NC use only (see [8])
3.22) "The data flow .. can be exploited by a reasoner" -> exploited sounds too harsh, maybe use utilized/leveraged/.. instead?
3.23) "ODRL policies of the inputs and the PPRs" -> ODRL policies of PPRs? "the inputs, and the PPRs"
3.24) Listing 3 ->
3.24.1) "Listing 3: Example of policies " -> Actually, it's only one policy containing prohibitions/permissions.
3.24.2) How does this listing relate to listing 1? why was odrl:license replaced by odrl:modify?
3.24.3) s/cc:commercialUse/cc:CommercialUse/
3.24.4) you omitted the type triple for prohibitions, but kept it for the permission. why?
3.24.5) consider using a specific policy type (e.g., odrl:Set)
3.25) "We studied to what extent it is possible to reduce the number of rules without loss of information" -> how did you assess that and where can I find the results of said "study"?
3.26) fix/rephrase footnote 27; s/out work/our work/; s/og/of/

4.1) "to extract .. between data objects, and combined with" -> "and combined them"
4.2) "common behaviors of relations" -> "common behavior of relations"
4.3) "In our case, each concept maps a group of relations propagating a group of policies." -> maps to?
4.4) "clusters of policies that propagate with the same set of relations." -> policies propagate (active) vs. policies are propagated (passive)
4.5) "identify quasi matches that could be boosted to become a full match" -> "boosted" in the sense of ..? (cf. remark 2.12)
4.6) "performing changes in" -> "by performing changes in"
4.7) "performing changes in the rule base or the ontology" -> performing changes to the ontology too? I kind of understand the issue you outline on p.11 regarding Datanode's suitability for representing a common behavior of relations wrt. PP. But how would you refine the ontology? What would you change? remove/change property hierarchies?
4.8) super concept vs. superconcept
4.9) "shows the result of the algorithm for a concept" -> "shows the results obtained by applying the algo. to Concept 71."
4.10) Table 2 -> hasCopy and isCopyOf have the same values for all measures because they are equivalent?
4.11) "so we only need to check whether 2 relations in the hasDerivation branch might also propagate the policies in concept 77" -> any 2 relations? clarify
4.12) "However, when this does not happen we can try to improve the approximation." -> what approximation? approximation == quasi matches? improve how? improve approx. by adapting the rule base? clarify
4.13) "With a Fill operation, " -> s/a Fill/the Fill/ ?

5) Evaluation
5.1) "but also that it might positively affect our ability to apply reasoning for policy propagation." -> ability to apply reasoning? to what extent would a "non-reduced" set of PPRs prevent you from reasoning over them? Taking succeeding sentence into account, I guess you actually meant positively affecting the performance of reasoning.
5.2) s/full set/uncompressed set/
5.3) P^i -> i = ?
5.4) "Uncompressed or a Compressed rule base" -> why suddenly \emph{} and capitalized?
5.5) "To take into account these two.." -> "In order to appropriately address both of those reason. strateg., we.."
5.6) "However, the information about the policies of the input was added. The table illustrates" -> "However, information about .. was added. Table 3 illustrates"
5.7) Figure 8/9 "boost" -> see remark 2.12
5.8) "and how it affects reasoning on propagating policies" -> see remark 4.4

== References ==

Review #2
By David Corsar submitted on 16/Sep/2016
Major Revision
Review Comment:

This manuscript was submitted as 'full paper' and should be reviewed along the usual dimensions for research contributions which include (1) originality, (2) significance of the results, and (3) quality of writing.

The authors have largely addressed my comments in this resubmission; the material relating to the (A)AAAA methodology has been reduced and replaced with an improved discussion of the datanode ontology and examples. The text in the introduction and related work still features a high degree of overlap with previous work. Overall the quality of writing has been improved from the previous version.

The revised paper makes it clearer where the contributions of this work really are: the addition of the coherency check mechanism to the (A)AAAA methodology, and a performance evaluation of reasoners for policy propagation. The description of the coherency check is reasonably clear; as with the previous version, the evaluation focuses on the performance of reasoners in terms of time taken, and memory and processor use. Given this, I have reservations as to whether these provide sufficient contributions for publication in this journal, and defer this decision to the editors.

Given that the novel extension to research here is the coherency check, I feel the extent of the evaluation is key when considering the paper’s contributions. While the evaluation demonstrates the feasibility and run-time benefits of the approach based on two alternative implementations, it is limited to just that; there is no evaluation of the (A)AAAA methodology as a knowledge engineering approach, or discussion regarding who are the intended users, how many times was the (A)AAAA loop applied in the example use case, why, and how long this took, the degree of effort users will have to invest when applying the methodology and how this compares to related approaches, what is the effect of errors (which the authors state on page 11 may be introduced) and are there any ways to minimise them, have users (that are not the authors) used it and how did they fare? While the provided performance metrics are useful, the evaluation could also be strengthened by consideration of other factors – for example, how correct were the results of the reasoners. Further, there is no evidence provided that illustrates the benefit of the coherency check mechanism – previously (and as stated on page 2) there was a table illustrating the impact of the coherency check on the rule base compression factor, and so potential benefits to management of the rule base, however this has now been removed, so it is unclear how useful or not it is.

Other minor typos, etc.
Pg 5, definition of derivation is missing commas after dn:isSelection and dn:remodelledFrom
Pg 5, definition of interpretation: spreasheet -> spreadsheet
Pg 7, para 3; it is unclear who extracted the policies from the Flickr API
Pg 8, listing 1: odrl:actionodrl:license -> odrl:action odrl:license
Pg 7/8, listing 1: the prohibition on EventMedia to license – could it be clarified what this means?
Pg 8, Listing 3: it is unclear where the prohibition on modify output comes from, while the rest clearly come from listing 1, but the prohibition to licence from listing 1 does not feature in listing 3
Pg 8, footnote 27: out work -> our work; also og -> eg
Pg 13, listing 7: features “Optimised” in title, should this be removed?
Pg 15, para 2: Figures 6d and 6e -> Figures 7d and 7e
Pg 15, para 2: Discussion of SPIN reasoner states Figure 7 shows an increase in space consumption, but Figure 7e shows a decrease

Review #3
By Ernesto Damiani submitted on 30/Sep/2016
Review Comment:

The authors extensively revised their previous submission, clarifying where their original contribution lies with respect to their previous work. Also, it is now clearer what they achieved, i.e. compressing policy propagation rules (seen as an external artifact w.r.t. the rule base).
I agree that the methodology is not affected when policies of assets are modified, assuming that the corresponding atomic policy is already present in the rule base.
The authors also made explicit which changes to the rule base (the introduction of an entirely new policy) will require setting up new propagation rules. The paper is interesting and well-written.