Review Comment:
This manuscript was submitted as 'Survey Article' and should be reviewed along the following dimensions: (1) Suitability as introductory text, targeted at researchers, PhD students, or practitioners, to get started on the covered topic. (2) How comprehensive and how balanced is the presentation and coverage. (3) Readability and clarity of the presentation. (4) Importance of the covered material to the broader Semantic Web community.
Section 1.
The introduction is fine as far as it goes, but given that this is a survey paper it needs to say more than that the contribution is a survey. First, there needs to be a motivation (or motivations) for the survey; some of this is implied in earlier parts of the intro, but it needs bringing out and developing. Second, there needs to be a clear statement of what the outcomes of the survey are. A third aspect is whether this is a partial or a complete survey, which may be tied back to the motivational aspect.
In the case of a survey paper, the reader needs to know the authors' methodology up front. This could be rolled into the introduction, but would probably be better as a stand-alone section; perhaps promote 3.1 to here. It also needs expansion: for example, what criteria were used to include or exclude work? Some of this is covered at the start of 3.2. Provide some reflection on the process too.
Section 2.
The comprehensive identification of subject rights (2.1, 2.2) and of the rights and obligations of controllers and processors (2.3) is extremely useful, but the section needs to start by explaining why it is being done and how it fits into the argument of the paper. In addition, the reader needs to be told what the origins are, or what the derivation process is, for these rights and information flows. Likewise, the mapping of rights to information items (Table 2) and of rights and obligations of controllers and processors to information items (Table 3) refer respectively to Chapter III and Chapter IV of the GDPR, but lack a reference to the analysis that underpins the mappings.
Section 3.
The opening of this section is about other surveys, but the survey the authors are presenting comprises sections 3.2 and 3.3. I feel these should be separate sections at the top level, not part of related work. It then makes sense to have an intro to the section on privacy-related policy languages that maps out what is to come, why it is in the order it is, etc., and that flags up the outcomes, such as the comparison tables, in advance.
The content provides plenty of description, but not much analysis, or at least variable amounts of analysis. It would be reasonable to omit obsolete languages (provided this is stated explicitly): 3.2 identifies some obsolete languages that are left out (sic), but others are left in, which needs explanation. For the obsolete ones that remain, what are the takeaways? How have they changed the landscape, and what have they influenced? Would an influence/dependency graph be possible, to capture the flow of research ideas? It is scholarly artefacts like this that make survey papers genuinely useful.
Is there any particular reason for the order? It might be useful to make it broadly chronological, particularly since the plain citation style is unhelpful in this respect. At present it is just a collection of names without any apparent connections, and that is what needs addressing.
3.2.2
ODRL constraints, however, are at the same level as everything else, making their specificity (which permission, obligation, etc. they attach to) unclear, when there may need to be different, possibly conflicting constraints for different rules, or for the same rules at different times. I am positive in general about ODRL, but as it stands ODRL 2.2 does have its limitations and, arguably, its confusions.
3.2.3
"XPref resorts to XPath ... making the preferences formulation more user-friendly and less error prone." It is a little surprising to find XPath being described as user friendly and less error prone. Is there a user study whose results support this view that can be cited here?
3.2.4
precising -> making precise
How does the S4P data disclosure protocol for third parties work? What happens to data subsequently is a particularly tricky matter that, I think, is not mentioned in any of the other sections.
3.2.5
"These annotations are then incorporated by the AIR reasoner in its justifications and can be used to hide PIIs present in the rule set." This is intriguing, but needs illustration to make the point: unless the reader knows the material, they cannot imagine how this works.
"Also, the rules graph format allows for the nesting of rules within the same rule set, thus providing a way to segment the conditions stated by the rule in order to only expose part of them in the justifications." This explanation works for someone who knows about it, but is opaque, at least to this reader.
3.2.6
to module distinct -> to model distinct
3.2.7
whom has access to what -> who has access to what
restriction abilities should apply -> restriction abilities apply (?); can't see a reason for should
3.2.10
Given its title, should 3.2.10 be in 3.3 rather than 3.2?
PROV-O needs citation (it's in the bibliography)
3.2.11
The DPF rule engine sounds quite complicated and needs some technological grounding: how does it work (e.g., on what logic is it based)?
3.2.12
Heavy use of bold here; the different presentational style distracts, compared to other sections where there is little or no bold.
3.2 general points
What is the descriptive coverage of each language? What makes one better than another for a given task? What are their formal underpinnings? For example, there seems to be limited consideration of the reasoning aspects.
3.3
The current intro is rather brief and needs expansion to complement that of 3.2. As per the suggestion above, this would become a top-level section and, as with 3.2, should map out what is to come, why it is in the order it is, etc., as well as flag up the outcomes, such as the comparison tables, in advance.
Contrast is also missing here, which is especially apparent with the GDPR-related ontologies.
The categories for the table also need motivating; say more about the methodology and about how the GDPR informs the process.
In general there is a sense that some entries get more insightful coverage than others. Review for balance.
In the end, which ones are the right ones to use? How to choose between them?
Section 4.
The section is blank: there is no text between the heading and 4.1.
Section 4.1
The content is quite dense and needs signposting/structure; it consists mostly of observations about the table: what are the conclusions?
Is there a reason for the order in Table 4? Perhaps order by number of asterisks, then alphabetically? Likewise for Table 5.
It feels like it ought to be a matter of concern that "Most of the ontologies and vocabularies presented are obsolete or without new developments in recent years, with BPR4GDPR's IMO, GDPRov, GConsent, DPV and GDPRtEXT being the only ones that continue to be improved." Will the same fate befall those currently being maintained?
I could not follow what was intended by "Moreover, only DPKO, IMO and PrOnto do not have open and accessible resources." Perhaps "of these" instead of "only"?
I am not convinced by Listing 1. It feels like part of another paper and I am not sure it contributes much here.
The tables appear too late in the paper: Table 7 even falls into the appendices. Since there appears to be a fairly clear break on line 11 of p. 18 between the languages part and the vocabularies part, the structure of the paper could be improved by putting the discussion (and the tables) for each survey part at the end of what are currently 3.2 and 3.3 respectively, so that the reader can more easily look at the tables and the discussion together.
Conclusion
Proposing to combine three languages/ontologies is not very informative and lacks a clear justification beyond maturity. What are the principles underpinning each component, and how will they satisfy the goals that the analysis/survey has brought into focus? Again, more reflection is needed.
Bibliography
Entries 7, 37, 58, 71 need attention or are incomplete in some way.
Online docs need access dates.