Review Comment:
Thank you for your interesting paper which describes a method for ontological representation of GDPR regulations, and architectures for applying these to blockchain data processing regulation. The work is largely original, and the paper is for the most part well-written and clear. I believe there are multiple strands of interesting work here, and I'd like to see each of these developed. I find it very difficult to assess significance, however, as the evaluation methodologies are not well-justified and seem to rest on some major assumptions which are not backed up or, in some cases, even stated. In particular, there seems to be an unspoken assumption that ontology development processes (not just ontologies themselves) need to be tailored for blockchain use, and that to do so requires automation (in some unspecified way) - it is not at all clear to me why either of these would hold. It is also unclear how the proposed approach would deal with misbehaving stakeholders - it seems as if a malicious data controller wouldn't be regulated by it, and even a cooperative controller might nonetheless leak personal data unintentionally via metadata. I'd really like to see these assumptions made explicit and questioned or justified more, and evaluation methodologies (e.g., the proposed automation index) justified in comparison with approaches from the literature.
I can't find a long-term URL for resources and so also cannot assess future replicability potential.
More specific comments:
Section 4.2.2: the claim that personal data is not leaked because controller-policy authority communication is restricted to metadata rests on a strong assumption that is not well justified. It's well-known that it's often possible to infer personal data from metadata. The usual example is in the domain of Web browsing: metadata might include the IP addresses of sites that I visit, without the content, but reverse DNS queries can reveal the nature of those sites and potentially reveal data relating to protected characteristics, e.g., private health data, if I visit sites relating to particular conditions, gender, sexuality, religion, etc. *If* the content of metadata is sufficiently limited, it may indeed not leak any personal data, but that needs to be demonstrated for each case rather than assumed.
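To make the concern concrete, here is a minimal sketch of the kind of inference I have in mind. Everything in it is hypothetical: the IP addresses come from documentation ranges, the reverse-DNS table stands in for real lookups, and the keyword list is invented purely for illustration. It is not a model of your architecture, only of the general attack.

```python
# Hypothetical illustration: inferring sensitive categories from metadata only.
# All addresses, hostnames, and keywords below are invented for this sketch.

# Stand-in for reverse DNS lookups (a real observer would query DNS instead).
REVERSE_DNS = {
    "203.0.113.10": "support.diabetes-forum.example",
    "203.0.113.20": "news.example",
    "198.51.100.7": "parish-community.example",
}

# Assumed mapping from hostname keywords to protected-characteristic categories.
SENSITIVE_KEYWORDS = {
    "diabetes": "health",
    "parish": "religion",
}


def infer_sensitive_categories(visited_ips):
    """Return the sensitive categories an observer could guess from IPs alone."""
    inferred = set()
    for ip in visited_ips:
        hostname = REVERSE_DNS.get(ip)
        if hostname is None:
            continue  # no reverse record, nothing to infer for this address
        for keyword, category in SENSITIVE_KEYWORDS.items():
            if keyword in hostname:
                inferred.add(category)
    return inferred


# The log contains no page content, only destination addresses.
metadata_only_log = ["203.0.113.10", "203.0.113.20", "198.51.100.7", "192.0.2.1"]
print(sorted(infer_sensitive_categories(metadata_only_log)))
```

The point is only that "no content" does not imply "no personal data": the metadata fields exchanged in Section 4.2.2 would need a comparable check, field by field.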
Section 4.2.5: the argument about comparative business suitability (p11, lines 37-40) seems to contradict its own conclusion. Business requirements can vary depending on the specific business; as you say, if the requirements are throughput and maintenance, perhaps TTP is better. If the requirement is for data security, the consensus architecture may be better. It seems as if you're arguing (correctly) for context sensitivity when it comes to meeting business requirements, but you then conclude something context-insensitive (that TTP=high business suitability, consensus=low).
Definition 1 (p14): A conceptual block is just a set of terms; it's not clear to me how a tuple of sets of terms constitutes a "path". And without constraints between elements of these tuples, is it possible to have legally incoherent tuples? I assume yes. Is there a reason why that wouldn't matter? It feels like definitions 1 and 2 would be easier to follow and assess with specific motivated examples integrated into the explanations rather than in an appendix.
Compliance (p17): defining compliance as being the rule set including the controller metadata is hard to relate to actual legal compliance. How is the controller metadata generated? Is there anything to prevent a malicious data controller from lying about the metadata, e.g., purpose? Or requesting data storage legitimately, but performing illegitimate processing offline on the stored data?
Section 6.3: What does it mean for an ontology development methodology to be "suitable for blockchains"? As far as I can tell, ontology development is entirely orthogonal to any use of blockchains (your developed ontology, for example, could potentially also be applied in scenarios with no blockchain involved at all). Your evaluation implies that you think "more automated" means "more suitable for blockchains". Why? Your metric for "automation index" is, as far as I can tell, novel - you don't refer to any literature on measuring automation, and the metric itself is explained but not justified. What makes this a good metric? Difficulty and complexity of algorithmic steps are both well-studied phenomena with a range of objective metrics for different situations. Even if the assessment has to remain subjective, perhaps metrics incorporating those factors might be more meaningful than metrics without?
To summarise, why does ontology development need to be different for blockchain use? If it does, why is automation relevant? And if it is relevant, why is your proposed metric a good one?
p25, lines 41-43 are circular: "To answer whether X happens while maintaining Y, the proposed methodology does X while maintaining Y". It's not until the end of that whole paragraph that you say that you have evaluated this empirically with a set of known evaluation results. Your conclusion from this experiment (p27, lines 37-38) is stronger than the results justify. You can conclude that the ontology version handles the evaluation cases correctly, but that does not imply it is legally identical - perhaps there are other cases where human courts will make a decision which the ontology version would not.
On comparative levels of privacy protection, you compare "doing something to protect privacy" to "doing nothing to protect privacy" - of course there is an improvement. It would be much more meaningful to compare your approach with other approaches for improving privacy protection on blockchains.
Section 6.4: "How work in the real blockchain"->"How it works on a real blockchain". I'm not sure, however, that this section is necessary - it seems just to repeat things that you said earlier to compare TTP to consensus architectures.
Proofreading comments:
p2, line 37: "propose"->"proposes"
p3, lines 14-15: "of 10 min now" I can't parse this. Average time of 10 minutes would make sense, but then I don't know how the word "now" connects to that. "10 minutes from now"? What moment is "now" referring to here (if it is)?
p3, line 23: "blockc- hains"->"blockchains"
p3, line 25: "supplements"->"supplement". Although I'm also not sure what "supplement the problems" means - are you saying that permissioned blockchains *address* the problems of public blockchains, or that they make those problems worse?
p3, line 42: I believe that a data *subject* in the GDPR can only be a natural person; I don't think a corporation can be a data subject.
p3, line 46: "assesses"->"assess"
p4, line 20: "partaticipants"->"participants"
p4, line 29: an RDF URI may be a web address, or not. It's common for RDF URIs to also be dereferenceable via HTTP(S), though not required.
p4, line 31: "readibility"->"readability"
p5, lines 1-10: I think specific examples of at least one of the "sub" relations might make this clearer for a reader not already familiar with RDFS.
p5, line 12: it's a bit of a subtle distinction, but strictly speaking OWL isn't quite a direct *extension* of RDF(S) (although RDF(S) can be embedded in OWL). Rather than "adds to", it might be more accurate to say that OWL has significantly greater expressive power than RDF(S).
p5, line 50: "studied on"->"studied"
p6, line 5: "process"->"processes"
p6, line 33: "model"->"models"
p6, line 38: "every"->"all" (this avoids the issue that English is a bit confused about whether "data" is plural or singular)
p8, line 45: what does "honest but curious" mean here?
p9, line 36: "the The"->"the"
p15, line 23: rather than saying "vast", give the number of rules/triples/OWL statements (whichever makes the most sense to give)
p24, line 44: "AcceeData"->"AccessData"
p26, lines 21-25: the quoted text includes "legitimate basis". The rest of the paragraph uses the phrase "legal basis". These aren't the same.
p30, line 10: "discussion"->"Discussion"
p30, line 12: "maden"->"made"
p30, line 39: "conclusion and future work"->"Conclusion and future work"
p30, line 41: "This study have"->"This study has"
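Regarding the comment on p5, lines 1-10: as an indication of the kind of example I mean, here is a minimal sketch of what a subclass relation (rdfs:subClassOf) lets a reasoner conclude. The class and individual names are hypothetical and not taken from your ontology, and the code covers only the transitive subclass case, not the full set of RDFS entailment rules.

```python
# Hypothetical illustration of rdfs:subClassOf entailment.
# Class and individual names are invented; they are not from the paper.

# (subclass, superclass) pairs, read as "subclass rdfs:subClassOf superclass".
SUBCLASS_OF = {
    ("DataController", "LegalActor"),
    ("LegalActor", "Agent"),
}

# (individual, class) pairs, read as "individual rdf:type class".
TYPE_ASSERTIONS = {("acme", "DataController")}


def superclasses(cls, subclass_of):
    """Collect every class reachable from cls by following subclass links."""
    found = set()
    frontier = [cls]
    while frontier:
        current = frontier.pop()
        for sub, sup in subclass_of:
            if sub == current and sup not in found:
                found.add(sup)
                frontier.append(sup)
    return found


def entailed_types(individual, type_assertions, subclass_of):
    """Return asserted types plus those implied by the subclass hierarchy."""
    direct = {c for (i, c) in type_assertions if i == individual}
    result = set(direct)
    for c in direct:
        result |= superclasses(c, subclass_of)
    return result


print(sorted(entailed_types("acme", TYPE_ASSERTIONS, SUBCLASS_OF)))
```

A one-sentence version of this in the text ("every DataController is also a LegalActor, so anything stated about LegalActor applies to it") would probably be enough for readers new to RDFS.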