Self-regulatory Framework for Blockchain Compliance with the General Data Protection Regulation

Tracking #: 3466-4680

Sejin Han
Sooyong Park

Responsible editor: Sabrina Kirrane

Submission type: Full Paper

Since the General Data Protection Regulation (GDPR) came into force, blockchain stakeholders worldwide have faced regulatory compliance issues. Unfortunately, few studies have comprehensively addressed this issue. Additionally, the inherent immutability and transparency of blockchains present challenges from a GDPR perspective. Thus, this study proposes a self-regulatory framework for blockchain compliance with the GDPR. The proposed framework is a regulatory governance model that makes blockchains recognize the GDPR's regulatory principles and regulates data processing activities based on these principles. Compared to previous models, the proposed framework offers considerable improvements in regulatory autonomy and preservation: (1) informal legal knowledge is automatically transformed into a formalized ontology model and integrated into a blockchain system in six phases with minimal intervention from centralized elements, and (2) the proposed framework has high regulatory preservation, meaning that it preserves the original legal intent of the GDPR even across phase transitions, so that legal principles can be accommodated in blockchains without any loss of meaning. Moreover, the proposed framework was implemented as a pilot on the Hyperledger Fabric test network, and the feasibility of the implementation was demonstrated using scenario-based tests. This study thus demonstrates an early self-regulatory framework for blockchains, which currently sit in a regulatory blind spot.


Solicited Reviews:
Review #1
By Víctor Rodríguez-Doncel submitted on 10/Jul/2023
Major Revision
Review Comment:

Overall appraisal

MY SUMMARY: This paper describes two procedures for making data transfers GDPR-compliant using blockchains. In the first procedure, a trusted third party (TTP) stores encrypted data usage contracts (DUCs) in a blockchain and evaluates data transfer requests against policies signed by the different parties (data subject, controller and TTP). The policy evaluation made by the TTP is also stored in the blockchain. In the second procedure, a consensus mechanism (practical Byzantine fault tolerance) is chosen and highly reputed nodes are selected as endorsing parties. Both alternatives are compared and tested using the Hyperledger Fabric test network. An ontology models some of the GDPR concepts and norms, and the algorithm for evaluating DUCs is also described. A quantitative comparison with other solutions in the state of the art is also offered, using performance metrics (latency and throughput). A series of appendices provides further details: the ontology axioms, the GDPR ontology encoding, a comparison of the features of other papers (Ranto, Davari and Mahindrakar), and abbreviations.

GENERAL APPRAISAL: The authors have done extensive work encoding the GDPR principles in OWL, describe an implemented system and follow a very practical approach. The system is thoroughly described and the main goals are achieved. Their view, as non-EU researchers partially working in industry, is also much appreciated.

I observe, however, two major problems in this work: one of form and one of substance.
The formal problem is that the authors use rather liberally some terms that have very precise meanings for the Semantic Web community, and their explanations should be pitched at the well-informed SWJ readership. The substantial problem is the lack of novelty or advance over the state of the art; or perhaps there is an advance, and it only needs to be better evidenced.

Formal comments:
* The term "ontology" is quite broad and its use in the paper cannot be categorically rejected. However, its use throughout the document to refer both to a terminological box (T-Box) and to the data generated at runtime (A-Box) leads to constant confusion. The distinction between a data model and the data is fundamental, and it is repeatedly blurred throughout the paper.
* Sentences such as "The informal legal knowledge is automatically transformed into a formalized ontology model" give the false impression that there is an NLP task behind it; there is nothing of the sort.
* Section 2.3 can be dropped, since SWJ readers usually know this material; this applies particularly to Fig. 1 and Fig. 2.
* Page 23, line 15: the HermiT logs are not relevant and can be deleted.

Substantial comments:
* There is no obvious progress over the state of the art, because the claimed advantages are not supported by evidence.
* The use of METHONTOLOGY is correct, although providing some "competency questions" would have been more modern/accurate.
* The state of the art is correct, but some relevant works are omitted. Other papers could have been considered [3-5]. Citing these papers will NOT improve the paper's quality, but perhaps the authors would like to read about the approaches in [1][2] too. Finally, work carried out in the W3C DPV CG could be reused: some of the elements in Fig. 10 could be mapped to DPV terms.
* Evaluation: I think the authors could make explicit all the possible attacks the system may suffer (and why they can be neutralised). The algorithm needs to be evaluated not only in terms of performance times, but also in terms of possible attacks.
* Important drawback: there is no online material supporting this work; I would have expected a GitHub repository where everything could be checked. This is a fundamental problem!

English comments:

I don't think bad English should deter the publication of a good research paper, but improving readability doesn't hurt either. I therefore recommend that the authors correct the following issues.
* Page 32, "persinal" -> "personal"
* Headings are not capitalised. Examples are on page 19, line 13; page 20, line 32; and page 30, line 38.
* Page 28, line 47: "How work in the real blockchain" should be rephrased.

Other points are not errors but could be made more accurate:
* Page 1, line 27: "blockchain stakeholders worldwide have been facing regulatory compliance issues" --> well, only some of them.
* The concept of "regulatory preservation" intrigues me. What is its definition? I would recommend that the authors refer to the idea of "legal isomorphism" as presented by Trevor Bench-Capon, if this is their point.
* The concept of "regulatory path" can be better defined: is it just a data structure with purpose, context, etc.?

[1] Zichichi, M., Ferretti, S., D'Angelo, G., & Rodríguez-Doncel, V. (2022). Data governance through a multi-DLT architecture in view of the GDPR. Cluster Computing.
[2] Zichichi, M., Ferretti, S., D'Angelo, G., & Rodríguez-Doncel, V. (2021). Personal data access control through distributed authorization. In IEEE 19th International Symposium on Network Computing and Applications (NCA) (pp. 1-4). IEEE. ISSN 2643-7929.
[3] Palmirani, M., Martoni, M., Rossi, A., Bartolini, C., & Robaldo, L. (2018). Legal ontology for modelling GDPR concepts and norms. Frontiers in Artificial Intelligence and Applications, 313, 91-100.
[4] Robaldo, L., Bartolini, C., & Lenzini, G. (2020, May). The DAPRECO knowledge base: representing the GDPR in LegalRuleML. In Proceedings of the Twelfth Language Resources and Evaluation Conference (pp. 5688-5697).
[5] Bartolini, C., Lenzini, G., & Santos, C. (2018). A legal validation of a formal representation of GDPR articles. In Proceedings of the 2nd JURIX Workshop on Technologies for Regulatory Compliance (Terecom).

Originality: not really original, but the topic is very relevant and admits multiple approaches.
Significance: the impact of the implementation has not been disclosed. Will this work have a high impact? We cannot ascertain that. Perhaps the authors could improve this point by explaining the outreach of the solution.
Quality of writing: not very good, but this is the least important point.

Review #2
By Aisling Third submitted on 11/Aug/2023
Major Revision
Review Comment:

Thank you for your interesting paper, which describes a method for ontological representation of GDPR regulations, and architectures for applying these to blockchain data processing regulation. The work is largely original, and the paper is for the most part well-written and clear. I believe there are multiple strands of interesting work here, and I'd like to see each of these developed. I find it very difficult to assess significance, however, as the evaluation methodologies are not well-justified and seem to rest on some major assumptions which are not backed up or, in some cases, even stated. In particular, there seems to be an unspoken assumption that ontology development processes (not just ontologies themselves) need to be tailored for blockchain use, and that doing so requires automation (in some unspecified way); it is not at all clear to me why either of these would hold. It is also unclear how the proposed approach would deal with misbehaving stakeholders: it seems as if a malicious data controller wouldn't be regulated by it, and even a cooperative controller might nonetheless leak personal data unintentionally via metadata. I'd really like to see these assumptions made explicit and questioned or justified more, and the evaluation methodologies (e.g., the proposed automation index) justified in comparison with approaches from the literature.

I can't find a long-term URL for resources and so also cannot assess future replicability potential.

More specific comments:

Section 4.2.2: the assumption that personal data is not leaked because controller-policy authority communication is restricted to metadata is a big assumption, and not well justified. It is well known that personal data can often be inferred from metadata. The usual example is in the domain of Web browsing: metadata might include the IP addresses of sites that I visit, without the content, but reverse DNS queries can reveal the nature of those sites and potentially reveal data relating to protected characteristics, e.g., private health data, if I visit sites relating to particular conditions, gender, sexuality, religion, etc. *If* the content of metadata is sufficiently limited, it may indeed not leak any personal data, but that needs to be demonstrated for each case rather than assumed.

Section 4.2.5: the argument about comparative business suitability (p11, lines 37-40) seems to contradict its own conclusion. Business requirements can vary depending on the specific business; as you say, if the requirements are throughput and maintenance, perhaps TTP is better. If the requirement is data security, the consensus architecture may be better. It seems as if you're arguing (correctly) for context sensitivity when it comes to meeting business requirements, but you then draw a context-insensitive conclusion (TTP = high business suitability, consensus = low).

Definition 1 (p14): A conceptual block is just a set of terms; it's not clear to me how a tuple of sets of terms constitutes a "path". And without constraints between the elements of these tuples, is it possible to have legally incoherent tuples? I assume yes. Is there a reason why that wouldn't matter? I think Definitions 1 and 2 would be easier to follow and assess with specific motivated examples integrated into the explanations rather than placed in an appendix.

Compliance (p17): defining compliance as being the rule set including the controller metadata is hard to relate to actual legal compliance. How is the controller metadata generated? Is there anything to prevent a malicious data controller from lying about the metadata, e.g., purpose? Or requesting data storage legitimately, but performing illegitimate processing offline on the stored data?

Section 6.3: What does it mean for an ontology development methodology to be "suitable for blockchains"? As far as I can tell, ontology development is entirely orthogonal to any use of blockchains (your developed ontology, for example, could potentially also be applied in scenarios with no blockchain involved at all). Your evaluation implies that you think "more automated" means "more suitable for blockchains". Why? Your metric for "automation index" is, as far as I can tell, novel - you don't refer to any literature on measuring automation, and the metric itself is explained but not justified. What makes this a good metric? Difficulty and complexity of algorithmic steps are both well-studied phenomena with a range of objective metrics for different situations, but even if subjective - perhaps metrics incorporating those factors might be more meaningful than metrics without?

To summarise, why does ontology development need to be different for blockchain use? If it does, why is automation relevant? And if it is relevant, why is your proposed metric a good one?

p25, lines 41-43 are circular: "To answer whether X happens while maintaining Y, the proposed methodology does X while maintaining Y". It's not until the end of that whole paragraph that you say that you have evaluated this empirically with a set of known evaluation results. Your conclusion from this experiment (p27, lines 37-38) is stronger than the results justify. You can conclude that the ontology version handles the evaluation cases correctly, but that does not imply it is legally identical - perhaps there are other cases where human courts will make a decision which the ontology version would not.

On comparative levels of privacy protection, you compare "doing something to protect privacy" to "doing nothing to protect privacy" - of course there is an improvement. It would be much more meaningful to compare your approach with other approaches for improving privacy protection on blockchains.

Section 6.4: "How work in the real blockchain"->"How it works on a real blockchain" I'm not sure, however, that this section is necessary - it seems just to repeat things that you said earlier to compare TTP to consensus architectures.

Proofreading comments:

p2, line 37: "propose"->"proposes"

p3, lines 14-15: "of 10 min now" I can't parse this. Average time of 10 minutes would make sense, but then I don't know how the word "now" connects to that. "10 minutes from now"? What moment is "now" referring to here (if it is)?

p3, line 23: "blockc- hains"->"blockchains"

p3, line 25: "supplements"->"supplement". Although I'm also not sure what "supplement the problems" means - are you saying that permissioned blockchains *address* the problems of public blockchains, or that they make those problems worse?

p3, line 42: I believe that a data *subject* in the GDPR can only be a natural person; I don't think a corporation can be a data subject.

p3, line 46: "assesses"->"assess"

p4, line 20: "partaticipants"->"participants"

p4, line 29: an RDF URI may be a web address, or not. It's common for RDF URIs to also be dereferenceable via HTTP(S), though not required.

p4, line 31: "readibility"->"readability"

p5, lines 1-10: I think specific examples of at least one of the "sub" relations might make this clearer for a reader not already familiar with RDFS.

p5, line 12: it's a bit of a subtle distinction, but strictly speaking OWL isn't quite a direct *extension* of RDF(S) (although RDF(S) can be embedded in OWL). Rather than "adds to", it might be more accurate to say that OWL has significantly greater expressive power than RDF(S).

p5, line 50: "studied on"->"studied"

p6, line 5: "process"->"processes"

p6, line 33: "model"->"models"

p6, line 38: "every"->"all" (this avoids the issue that English is a bit confused about whether "data" is plural or singular)

p8, line 45: what does "honest but curious" mean here?

p9, line 36: "the The"->"the"

p15, line 23: rather than saying "vast", give the number of rules/triples/OWL statements (whichever makes the most sense to give)

p24, line 44: "AcceeData"->"AccessData"

p26, lines 21-25: the quoted text includes "legitimate basis". The rest of the paragraph uses the phrase "legal basis". These aren't the same.

p30, line 10: "discussion"->"Discussion"

p30, line 12: "maden"->"made"

p30, line 39: "conclusion and future work"->"Conclusion and future work"

p30, line 41: "This study have"->"This study has"

Review #3
By Luis-Daniel Ibáñez submitted on 16/Aug/2023
Review Comment:

The problem to be solved is not precisely defined. From the introduction, it seems to be "make blockchains recognize GDPR regulatory principles" and "modelling legal principles in a way that blockchains understand". Neither of these is a clear definition. If anything, they would point only to the ontology as a contribution.

The authors describe the approach as "self-regulatory", but there is no definition of what "self-regulatory" means.

The authors repeatedly claim in the introduction that their framework has higher "regulatory preservation", but the metrics presented in Section 6.3 have no foundation. It is not clear how a unit is judged "automated" or not, and who judges that. There is no interpretation of the scores reported in Table 5.
The method for measuring regulatory intent is even more confusing: I simply can't assess whether the two representations are "legally identical", and I think only lawyers can say that.

A DUC (Data Usage Contract) is defined as an ontology (p15: "hereinafter, this ontology will be referred to as DUC"), yet the connection between a DUC and a regulatory graph, which is the one thing that is more precisely defined, is unclear. This is also very confusing with respect to an earlier definition of a DUC as something that "defines the data processing restrictions of the service". I believe there is a confusion between an ontology for defining DUCs and instances of a DUC. That is a very deep issue.
Furthermore, the DUC is said to be "stored in the Blockchain" (Sec. 4.3.6). How does the system deal with blockchain space issues if the DUC is also a "vast knowledge base"? It is also unclear how something that is stored in the blockchain can be queried with SPARQL.

The system architecture refers to a Chaincode (Sec. 4.2.1) and endorsements, which are Hyperledger specific constructs. This means this approach is only applicable to Hyperledger, and which

The system concept does not include anything blockchain-specific; it could be applied to any database. The approach does use a blockchain as a tool for auditing and for storing DUCs, but I don't see how this makes the approach specific to blockchain compliance, and I'm missing a positioning against approaches that use a blockchain for compliance on top of any type of storage, or as a notary.

The role of "reputation" seems central to an important part of the approach. It is not clear, from a legal standpoint, what happens in the corner case where a reputed evaluator decides to . The paper also does not define how the reputation score is calculated. For example, if one of the conditions is "must be no record of non-compliance", then the reputation score must always be 100%.

The architecture mentions that a controller submits a "Transaction" to a blockchain. It is not clear whether this means a transaction in the cryptocurrency sense or in the sense of adding a piece of data to the blockchain. In any case, a transaction is a single unit of processing, and consent or restrictions on being "allowed to store data in a Blockchain" would have been agreed beforehand, so there is no need for any policy evaluation at the moment a transaction is submitted. Yet again, this approach seems to use a blockchain to manage the regulatory evaluation of a "processing transaction", which by no means makes a blockchain compliant with the GDPR as advertised.

I can see some merit on the ontology development and potential use, but I really don't see how the proposed framework is self-regulatory or helps blockchains to be GDPR compliant.