Privacy in Ontology-based Information Systems: A Pending Matter
Review 1 by Michel Dumontier:
This well-written article addresses the issue of privacy in ontology-based systems. After a brief introduction illustrating use cases in the medical domain, it identifies two key challenges: 1) to develop privacy theory in ontology-based information systems and 2) to understand how access restrictions can guarantee privacy.
Although systems that use access policies are widespread, it was very interesting to see that policy violations may arise not only from a single query against restricted knowledge, but also from a sequence of seemingly innocuous queries whose answers, taken together, violate the policy. Formalizing (and keeping track of) this background knowledge in such a way that policy violations can be detected is certainly an interesting research problem.
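To make the composition problem concrete, here is a minimal sketch, not taken from the paper. It assumes a single hypothetical inference rule (a patient treated by an oncologist has cancer) and hypothetical individuals `john` and `andrew`. The `violates` helper and the `Monitor` class are illustrative names only. The sketch shows two answers that are harmless in isolation but entail the restricted fact once combined, and a monitor that tracks what has already been released.

```python
# Hypothetical restricted fact; all names below are illustrative assumptions.
SECRET = ("hasCondition", "john", "cancer")


def closure(facts):
    """Forward-chain one assumed rule until no new facts are derived:
    treatedBy(p, d) and isA(d, oncologist) => hasCondition(p, cancer)."""
    derived = set(facts)
    changed = True
    while changed:
        changed = False
        for predicate, subject, obj in list(derived):
            if predicate == "treatedBy" and ("isA", obj, "oncologist") in derived:
                new_fact = ("hasCondition", subject, "cancer")
                if new_fact not in derived:
                    derived.add(new_fact)
                    changed = True
    return derived


def violates(answers):
    """True if the given answers, closed under the rule, entail the secret."""
    return SECRET in closure(answers)


class Monitor:
    """Tracks released answers and withholds any answer that would let the
    user derive the secret from everything released so far."""

    def __init__(self):
        self.released = set()

    def ask(self, answer):
        if violates(self.released | answer):
            return None  # withheld: the combination would entail the secret
        self.released |= answer
        return answer


answer_1 = {("treatedBy", "john", "andrew")}
answer_2 = {("isA", "andrew", "oncologist")}

print(violates(answer_1))             # each answer alone is harmless
print(violates(answer_2))
print(violates(answer_1 | answer_2))  # together they entail the secret
```

A monitor of this kind only covers deduction under rules it knows about; background knowledge the system cannot anticipate is outside its reach.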
My only critique of the paper is that it states that "... they have also left many open problems and further research is needed before they can be incorporated in practical systems." with the justification of "Due to lack of space". I really believe that these open problems need to be clearly articulated, if only briefly. Early components (1, 2) of the paper can be reduced, or the technical details in 4 omitted, to provide the space required.
Review 2 by Paulo Pinheiro da Silva:
The short paper is very interesting, although the content needs to be presented more rigorously and should probably be reviewed further by experts in privacy. The motivation lacks a more extensive review of the privacy literature (as opposed to the semantic web literature). The proposed solution for ontology-based privacy is based on assumptions that may not be easily accepted by privacy researchers. These issues are elaborated below.
In the Background section, the author mentions that users can indirectly retrieve information via logical inference. This notion of indirect retrieval of information needs to be clearly defined, since there are many possible ways of combining information retrieval and information derivation, and many of these combinations are not discussed in this paper. Later in the Background section, the author starts to elaborate on this with the John example. However, the example makes this notion of indirect information retrieval even more obscure.
The term 'certain answers' needs to be defined because it appears that the author assumes answers to be certain even if they are based on uncertain data (e.g., most clinical data).
In the general challenges, the author uses the term ontology as a synonym for knowledge base. This is fine for me but certainly not okay for other readers. In a more restrictive interpretation, ontologies would not contain most of the statements supporting facts such as that Bob, John and Dr. Andrew are of type person. With that in mind, phrases such as 'data of the ontology' would not make sense.
A more relevant comment for this section is the fact that the author is exposing privacy issues related to knowledge derived by deduction. This is the natural first step for a privacy study in the semantic web context. However, in the privacy community, the real issue is the exposure of knowledge derived by induction and the literature in this field is extensive. This is exactly the point where this paper lacks a stronger connection with state-of-the-art work on privacy.
- (throughout the paper) Extensive unnecessary use of the article 'the'. For example, in the abstract, we could have "preventing unauthorized access to data and knowledge in ontologies" instead of "(…) to the data and the knowledge in the ontology";
- (section 1) "OWL ontologies can be used to process data" - I am not sure ontologies can process data;
- (section 1) "could be disastrous" – how?
- (section 2) "secret information" – replace by "restricted information"?
- (section 3) "the design a privacy preserving" – add 'of'
- (section 3) Consider related work on partial import, both in the OWL specification and in linked data