Consent Through the Lens of Semantics: State of the Art Survey and Best Practices

Tracking #: 2632-3846

Authors: 
Anelia Kurteva
Tek Raj Chhetri
Harshvardhan J. Pandit
Anna Fensel

Responsible editor: 
Guest Editors ST 4 Data and Algorithmic Governance 2020

Submission type: 
Survey Article
Abstract: 
Our paper presents a literature survey of existing solutions that use semantic technology for implementing consent. The main focus is on ontologies, how they are used for consent representation and for consent management in combination with other technologies such as blockchain. We also focus on visualisation solutions aimed at improving individuals’ consent comprehension. Finally, based on the overviewed state of the art we propose best practices for consent implementation.
Tags: 
Reviewed

Decision/Status: 
Major Revision

Solicited Reviews:
Review #1
Anonymous submitted on 29/Dec/2020
Suggestion:
Major Revision
Review Comment:

This manuscript was submitted as 'Survey Article' and should be reviewed along the following dimensions: 
(1) Suitability as introductory text, targeted at researchers, PhD students, or practitioners, to get started on the covered topic. 

This survey focuses on ontologies to treat informed consent applicable to data processing, according to GDPR. It also analyses solutions for consent visualization and consent management.

This article's subject is very important, and even if it has been treated in different domains (healthcare, IoT), a survey oriented to semantic web technologies was missing. 

This work could interest a large audience of PhD students, researchers, and practitioners.

(2) How comprehensive and how balanced is the presentation and coverage. 

For this survey, the authors follow a comprehensive methodology. 
The article is organized into six sections. 
• Sections 1 and 6 are the Introduction and Conclusions. 
• Section 2 explains the chosen methodology. In particular, this section introduces a « model of consent life-cycle » consisting of four states associated with consent, namely, Request, Comprehension, Decision, and Use.
• Section 3 is the most substantial one, where all analyzed works are overviewed. It contains three subsections devoted to different types of works: (1) Semantic Models for Consent, (2) Consent Visualisation, and (3) Consent Management. Each subsection ends with a general summary supported by a comparison table.
• Section 4 is about Standardization Initiatives and Efforts. Even if this section is short compared to Sections 3 and 5, its content is a welcome addition.
• Section 5 proposes Best Practices and Recommendations. This section is based on the surveyed literature. It consists of 4 subsections; each one focuses on a state of the model of consent life-cycle introduced in Section 2. Each subsection is organized in 3 parts based on analyzed works as introduced in Section 3.

Sections 3 and 5 are the most important in terms of content. Maybe Section 4 should be moved after Section 5 to avoid a small section in the middle of the paper's two most important sections.

(3) Readability and clarity of the presentation. 

Even if I tend to agree with this survey's organization, I think its clarity and presentation should be improved.

In my opinion, an illustrative example is missing about how consent is managed currently and how it could be improved based on semantic web technologies. The management of the pandemic through IT tools to limit the contamination across the world can give authors an opportunity to analyze recent implementations of consent management. Even if semantic web technologies are not used, or consent is not well managed, an example based on such tools could make the survey less abstract, clearer, and more impactful. 
In general, the added value of using semantic web technologies to manage informed consent should be clearly explained. 

The difficulty of a survey paper lies in harmonizing the vocabulary used and in reaching the same level of explanation for each work overviewed. Section 3, which overviews existing literature, can be improved in that sense if a common thread is used. One idea would be to use Table 1 (which is not well exploited in the paper) to make the comparison of works clearer. Strengths and limitations should be clear and comparable. Table 1 is very interesting and very helpful to identify the relevant GDPR clauses depending on the appropriate concept/question. Such concepts/questions could help to position existing solutions. This table could be used throughout the paper. It is currently hard for the reader to form a personal opinion about the analyzed works because their overviews are very general and frequently incomparable. Summary tables give a very broad idea about works' scopes, but I would expect a more pertinent and detailed comparison in a survey.

This survey's main focus is on ontologies about consent, but unfortunately, classes are only mentioned for one model in Section 3.1.5. As with consent visualization, semantic models for consent could benefit from some illustrative figures. Each model should be presented with concrete concepts, classes, properties, etc. 

Section 5, about Best Practices and Recommendations, is hard to understand and to concretely assimilate from a practical perspective. This section is very confusing. Recommendations here should result from an analysis of overviewed works in Sections 3 and 4, and the reader should be in the position of making her/his own analysis. Often, recommendations inside each subsection are general enough to be applied to all parts of this section. Indeed, several recommendations are so general that they should not be mentioned (see page 17, 2nd column, line 43). Maybe the organization of this section can be inverted, i.e., to give recommendations about the three themes of Section 3, and inside them to give particular recommendations, if necessary, for each of the four states of the model of consent life-cycle.

Concerning blockchains, you mention that there are important limitations due to immutability, in particular, for implementing withdrawal of consent. Can you say something about what the consequences for blockchains' guarantees could be if immutability is bypassed?

According to the overview in Section 3.3, Table 4, lines devoted to ADvoCATE and Davari et al., should mention blockchains.

Paragraphs of page 16, 1st column, lines 1 to 23, should be introduced early, in Section 1 or 2.

On page 18, line 5, you say that some types of storage could violate GDPR, and you mention relational and graph databases (in addition to blockchains). I do not see how these two storage types can violate GDPR. Can you justify this, please?

Finally, it could be interesting to include a discussion about the difficulty of implementing informed consent depending on the application context. The difficulty is not the same for commercial apps as it is for traceability of the COVID-19 pandemic or blockchain-based systems such as cryptocurrencies. 

(4) Importance of the covered material to the broader Semantic Web community.

This survey can be important for the Semantic Web community interested in privacy in general and in GDPR compliance in particular.

This paper is easy to read and well-written; however, there are several typos, for instance:
• page 2, 2nd column, line 45: The final point is missing.
• page 5, 1st column, line 14: « This … »: this paragraph is repeated at line 25.
• page 5, 2nd column, line 1: the verb should be are instead of is.
• page 8, 2nd column, line 40: missing blank space.
• page 9, 2nd column, line 28: missing blank space.
• page 9, 2nd column, line 33: missing closing parenthesis.
• page 12, 1st column, line 35: missing blank space.
• page 12, 1st column, line 33: verify « blockcahin ».
• page 13, 2nd column, line 22: missing blank space.
• page 16, 1st column, line 21: suppress one « be ».
• page 18, 1st column, line 33: It should be Section 5.1 instead of 5.3?
• page 18, 2nd column, line 34-35: missing blank spaces.
• page 18, 2nd column, line 45: suppress one « have ».
 

Review #2
Anonymous submitted on 18/Jan/2021
Suggestion:
Minor Revision
Review Comment:

The article provides an in-depth dive into several models of GDPR using ‘semantic’ technologies. The reviews focus on how each of these models handles the issue of consent to a particular privacy policy. They also review the user interfaces and visualization tools provided by each of these. The authors also briefly review the different efforts at standardization.

In the second half of the paper, the authors identify best practices and make a set of recommendations.

Overall, the article will serve well as an introductory text and is comprehensive in covering the major approaches. The article is well written and easy to understand.

It would have been nice to have a deeper discussion of the following:
1. A comparative analysis of the different approaches and the tradeoffs made. Some are clearly more concise and others more expressive. It would be good to understand specific use cases which can be covered by one and not the other.
2. Though GDPR is the most widely known, California’s recent CCPA is also having a huge impact on the market. It would be good to have at least a short discussion on how these models may or may not be able to cover CCPA.

Finally, given that few, if any, of the big web companies use semantic technologies for implementing GDPR, it would be good to have a discussion on how these approaches might help the companies and, more importantly, users in handling the requirements imposed by GDPR.

Review #3
By Allan Third submitted on 23/Jan/2021
Suggestion:
Minor Revision
Review Comment:

This paper is a survey of semantic technological approaches to the handling of consent - specifically, consent for data processing in the context of the GDPR - covering representation, management, and visualisation of consent processes.

Thank you for the survey; it's a very clear introduction to the topic, which is important and timely, and likely to be useful and interesting to a wide audience, in the Semantic Web community and hopefully beyond. It appears to be comprehensive and balanced, with a few caveats that I think could be improved in terms of clarity, presentation, and consistency.

The point that jumped out at me most is the presence of the sections on incentivisation. These are in stark contrast to the rest of the paper and go rather against the primary topic. It's very jarring to read, on the one hand, a sensible design principle about avoiding 'dark patterns' intended to draw or manipulate users into making choices the designers want, and then on the other hand to read a section on the use of incentivisation "to change one’s mind regarding an action, and to make one perform an action we want" presented uncritically. The latter is clearly precisely the kind of dark pattern the authors have recommended against. I could understand a discussion of the use of incentivisation to encourage users to engage fully with the informed consent process - in a sense, to delay the giving of consent until they have fully explored the implications - but that doesn't seem to be the topic. I'm not sure, as is, how this topic contributes to the discussion of consent and I suggest removing it.

I also take some issue with the claim made (relatively late in the paper: p21, second column) that there's no need to visualise data processing for end users. What's done in data processing is an important part of a user understanding the consequences of data sharing; dismissing users as a potential audience for visualisations of processing could get in the way of this. It's also quite a stretch to imply that data processors and controllers have legal experience that an end-user doesn't - pretty much anyone can legally be a data processor or controller, there's no expectation of any particular legal experience. Again, a little odd to see in a review that otherwise takes the interests of data subjects very seriously.

The competency questions in Table 1 play an unclear role in the paper. How were they derived? It looks as if they're based on requirements in the GDPR itself but if they're important for the paper, it would be good to know for sure, and have an idea of coverage/comprehensiveness with regard to their source. It's not, however, clear how important they are to the paper. Perhaps I misunderstood, but I don't know what role they play in the survey or how they have been used. It would be good to clarify their purpose and use explicitly. To some extent, the same goes for the simplified model of consent presented in Figure 1. I like the idea of having it as a framing structure for the survey, but it didn't clearly come across as being used in the rest of the paper - perhaps this could be made clearer?

I'd welcome a little more depth in the discussion in some places - for example, quite a lot of 3.3 reads as very uncertain about what these systems do and how with regard to consent itself - we could live without knowing that D3.js is used for visualisation and MongoDB for data storage if the space were instead used to say what's unique or interesting about each when it comes to the actual tasks of consent management and user/institution experiences of them, or some insight into how consent can be imposed using cryptography technology.

I enjoyed reading the paper, and I would find it useful as a source - I don't necessarily think it would need significant revisions to address these points. Thank you for writing it.

Some more specific comments below:

p2 c2 l9-10: "technology"->"technologies" (referred to as "them" later in the sentence)

p2 c2 l24: Just "Research" not "Another research" ("research" is a mass noun)

p2 c2 l42-44: "the" before "development and ontology engineering processes" and "relationship of"

p3 c1 l4: "investigations into whether"

p3 c2 l33: Simplified model of the consent lifecycle - what is it simplified from? (And might this model itself incorporate presuppositions about the semantics of consent? If it does, is it a problem?)

p3 Figure 1. Withdraw, Expire, Invalidate, and Refuse should surely either go to Request, or should exit the cycle?

p5 c1 l25-36: duplication of previous paragraph

p6 c2 l39: Presumably this is "for achieving compliance with the GDPR"?

p7 c1 l35: "ConsentAssertion" (not "ConsentAssertation") is the SPL term

p7 c2 l2: "ApplicationFunctioning" (ColPri)

p7 c2: Given their relationship, do SPL and DPV share any structure/contents? DPV is also not mentioned in the summary in 3.1.9.

p9 c1 l46: "user's"->"users"

p10 c1 l17: I'd either separate CoRe and CURE into different entries, if their differences are as big as it sounds, or, if keeping them in one entry, make it explicit in the first paragraph of 3.2.3 that you're looking at two things, one an evolution of the other. Otherwise, the jump to talking about CURE was a bit confusing.

p10 c1 l26: Missing text for footnote 23 (AngularJS), and missing verb phrase after "PostgreSQL".

p11 c1: I'm not sure I really understand what EnCoRe is and does. For example, this reads as if EnCoRe keeps a central record of user data on consent preferences - that seems unlikely but problematic if it does.

p11 c2: It's a bit surprising to read about a consent management platform which uses blockchain without seeing any discussion of how it manages the link between any blockchain records and any personal data - if there's a way to recover or link personal data from a blockchain record, this is widely agreed to be incompatible with the GDPR. There are of course safe ways that platforms like this can operate, but I'd have expected to see it addressed, given in particular how central the GDPR is to this survey.

p12 c1 l3-17: Maybe it's not avoidable, but there are 10 all-caps instances of the word "SPECIAL" in a 15 line paragraph - any way to rephrase to reduce these? As I say, maybe it's not doable, it just stands out a lot.

p12-13: Section 3.3.6 Same question about blockchains and personal data as for ADvoCATE.

p13 c2 l48: Even with only hashes on the blockchain, there can still be a GDPR risk (e.g., can a user be identified by the pattern of blockchain transactions belonging to an individual account, which could open an attack vector on hashes?)

p15 c2 l36-37: Maybe worth making explicit that the EU examples like Austria and Germany aren't automatically identical to the GDPR when it comes to data protection laws - there is scope for member state variation, largely, I believe, to allow member states potentially to have stronger restrictions.

p16 c1 l1: I might qualify this with "brought to light the concept of "informed consent" for data protection" - "informed consent" has thankfully been very high on the agenda in other spheres for a long time now. Not as long as it should have been, but a lot longer than for data protection.

p16 c2 l27: Strictly speaking, 'Model consent according to the GDPR' is too specific as a 'recommendation for modelling consent' - maybe 'according to the relevant legal context(s)'? (Of which the GDPR is obviously a very prominent example) Also, a semantic model for consent might want to have a wider net than data processing (clinical consent, for example).

p17 c1 l29-34: I don't understand the relevance of the example to the preceding sentence. Promoting feelings of trust and integrity isn't obviously relevant to a design being clean, simple, and colour-blind appropriate. (If anything, using colour to invoke feelings of trust sounds quite like a 'dark pattern', as described in the column directly adjacent.)

p18 c2 l39-44: Is there evidence you can cite for this claim?

p22 c1 l34-38: Common misconception that blockchain inherently means high computational costs - Bitcoin and (current) Ethereum do, certainly, but there are low computation forms of blockchain too.

Review #4
Anonymous submitted on 26/Jan/2021
Suggestion:
Major Revision
Review Comment:

This manuscript reviews the ontologies that purport to support consent for the European General Data Protection Regulation (GDPR). It also reviews a number of systems that use these ontologies to manage consent and provides some best practices for implementing GDPR consent using ontologies.

I think the paper needs more discussion of the modeling approaches and needs to be more conservative about the best practices it recommends to focus on the ones that are supported by current research.

(1) Suitability as introductory text, targeted at researchers, PhD students, or practitioners, to get started on the covered topic.

I feel that this is a suitable text, although the topic of 8 ontologies seems very narrow, especially since the paper does not really introduce the representations used for each or what approaches they use. I was left wondering which ones would actually best fit the modeling approach I prefer (realist ontologies), but I will need to go to the individual papers to determine that.

Additionally, the best practices presentation was confused and confusing (see readability and clarity), and it's unclear how much of what was presented would be considered introductory, or is really settled research.

(2) How comprehensive and how balanced is the presentation and coverage.

The coverage is comprehensive and balanced in terms of the ontologies and systems presented.

(3) Readability and clarity of the presentation.

The sub-subsections in section 5, because they repeat, are very confusing, especially if the reader attempts to jump around in the text (as I tried to do). Many of the best practices are repeats across subsections, which made me feel like skipping them.

The "Color Theory" chart [1] suggested is very culture-specific, so tread carefully when recommending it. For instance, red in China has a very different (more positive) meaning than in Europe. I'm not sure this is an appropriate recommendation for a scientific paper.

(4) Importance of the covered material to the broader Semantic Web community.

GDPR consent has become a requirement for any web site that serves content in the EU, so nearly all web sites will need to address this issue. Since most sites simply provide a checkbox or acknowledgement of tracking, it isn't exactly clear how widespread the more detailed requirements discussed in the paper are. However, the availability of reusable tools and models may reduce the burden of compliance, and could help improve the application of GDPR consent approaches.

[1] https://graf1x.com/color-psychology-emotion-meaning-poster/