The Yin and Yang of Privacy and Recommendation

Tracking #: 1361-2573

Nuno Bettencourt
Nuno Alexandre Pinto da Silva
João Barroso

Responsible editor: 
Guest Editors Linked Data Security Privacy Policy

Submission type: 
Full Paper
Recent statistics show that privacy is vaguely understood by people. Yet, users' perception is that social media is the least secure method for sharing private information. Keeping user resources private should always be a user option. Nevertheless, when users do not share any information their expectancies of having better search results or recommendation are lowered. Furthermore, is might suppress or remove the serendipity of results thus not allowing users to collect or gain more knowledge on areas related to those they are interested in, but are not directly related to or easily searchable (known-unknowns or unknown-unknowns). This work addresses several issues between privacy and information recommendation and how to establish the yin and yang of both. It proposes an architecture that provides means and support for publishing resources in a private manner, hereby making websites behave like meshes of dereferenced resources from different web domains, yet complying with the established access policies. Users resources are kept private, yet reachable and discoverable until full authorised disclosure is requested to the user.

Solicited Reviews:
Review #1
Anonymous submitted on 05/May/2016
Major Revision
Review Comment:

This paper really has two parts. The first, in which the authors do a fairly good job, is more like a review paper of the current state of the art and practices for access control of web resources.

The second part is where they propose a new way of doing access control (or cross domain sharing).

They 'evaluate' their proposal on a recommendation task, which is a bit odd. The real tests of whether this system will work are whether it is easy to deploy, whether users understand it and how vulnerable it is to different kinds of attacks. Before the paper can be accepted, such a discussion/analysis must be included.

It would also have been really useful to have at least one fully worked out example illustrating their proposal. Without that, it is very difficult to follow that section of the paper. I strongly recommend that if the paper is accepted, the authors should be asked to augment the paper with such an example.

Review #2
By Marie Joan Kristine Gloria submitted on 11/May/2016
Minor Revision
Review Comment:

The paper, "The Yin and Yang of Privacy and Recommendation," offers an ambitious project which attempts to reconcile several issues plaguing published web resources and privacy. The authors clearly outline their motivation and pending issues, from the need for dereferencing, decentralization, context-preservation, etc. Moreover, I appreciate the authors' discussion on the technical challenges re: cross-domain resource sharing as well as the breakdown of AAA architecture. One point of clarification needed is whether a user can overwrite access policies through PAP even if it may violate his/her own privacy. The authors may also want to revisit section 3.3 re: the "location awareness vs. knowledge awareness" in the context of its placement within the paper itself: is it really necessary to include it in order to justify the need for recommendation systems? In general, there remain two larger questions that I would appreciate comments on: 1) in implementing action sensors, what are the authors' thoughts on the tension between the need for more data for better recommendation systems versus privacy; 2) are there thoughts on how to "correctly" choose an ontology for deriving implicit semantics, as done with the LastFM dataset, for instances that may be less domain specific. Lastly, there are a few grammatical errors and run-on sentences (e.g. 2nd paragraph of abstract; "Furthermore, is. . ") that need to be reviewed and corrected.

Review #3
Anonymous submitted on 18/May/2016
Review Comment:

The article presents a framework for decentralised social resource sharing as well as an experiment using a recommender system.

Quality of writing:
The article has a very weak presentation, as neither the research question nor the research contributions are clearly described and enumerated. The experiment and evaluation of the paper are disconnected from the proposed architecture.

The originality is severely lacking, as the presented approach is not grounded in established terminology or existing publications. The authors do not compare their approach to existing industry standards such as OAuth 2. The first two sections (Introduction and Context) only use 3 citations, which shows a strong lack of proper grounding for the motivation. Most of the cited research is relatively old considering the speed at which research in this area moves (about half of the cited papers are from the early 2000s). In particular, the authors are not aware that FOAF+SSL was renamed to WebID.

In addition, two lists of requirements are used to inform the proposed approach in Sections 2.1 and 4; however, these requirements are not properly grounded, e.g. in a literature survey or in an (empirical) requirements analysis.

Significance of results:
The results are not significant in any way.
In addition, the abstract, introduction and conclusion make claims which the paper does not support.

The proposed architecture is derived from a so-called Authentication, Authorisation, Accountability (AAA) conceptual architecture from 2000. The only extensions presented by the authors are textual descriptions of how to extend the functionality of the individual components, in Section 5. The extended functionality is not described in a formal or semi-formal way, and no implementation is presented.

The recommender system evaluation uses a subset of the Hetrec 2011 data set. The implementation uses Apache Mahout. The only functionality which is implemented by the authors in order to extend Apache Mahout is a so-called semantic similarity provider, which is mentioned for the first time in the middle of section 7. No details of this semantic similarity are described, either formally or informally.

It is not clear how exactly the proposed architecture enables personalisation while protecting the privacy of the user at the same time. What is the threat model the architecture addresses? What kind of personalisation algorithms are able to work in such an architecture?

Detailed feedback:

1.) Introduction:
* Why are neither privacy nor recommendation mentioned in the introduction?
* No research problem is introduced here.
* The contributions are not enumerated.
* Why is the introduction not grounded in more related work and other relevant citations?

2.) Context
* Why is privacy mentioned for the first time on page 3?
* Where do the "different methods for users to share resources" come from? Citation?
Or provide examples, e.g. is this how Facebook operates?
Or describe at least a use case.
None of these are provided.
* In particular, the "long, pseudo-undecipherable URIs" bullet suggests a lack of understanding of the subject matter. These URIs usually require cryptographic tokens to be accessed, either as a URL parameter or in the header, so it is not correct that the URI itself protects the resource by adding a layer of obfuscation. Even if such a URI is intercepted, it cannot be used to actually access the resource without the token, e.g. in OAuth 2.
* Where does the list of challenges from 2.1 come from? There are no citations and no requirement analysis.
* About 2.1.1 weak cross-domain security. Facebook provides this. For instance, every click on a like button is using strong cryptographic tokens to authenticate that this indeed was caused by the correct user. So a different grounding of this "challenge" needs to be provided.
* The goal described in section 2.2 is not formulated in a clear and concise way. In addition, it does not use concept names which point to other parts of the paper.

3.) Background knowledge
* In section 3.3 a categorisation of user awareness of resources is presented, which has 4 quadrants: known-unknowns, known-knowns, unknown-knowns, unknown-unknowns. The authors then state that "recommender systems are conceptually fit to help users perceive resources as useful known-knowns". That makes no sense. If the item is not known then at least one of the two adjectives needs to be "unknown".
* In section 3.4, please cite a paper on your classification of recommender systems approaches.
* The paragraph with "there are different reasons for these perceptions, including ..." needs to be significantly expanded. It is totally unclear to this reviewer.
* How does business logic fit into the description of recommender systems? E.g. Amazon also tries to optimise sales.
* What is the impact of access policy restrictions on the recommender system?
This is only hinted at in this section. The explanation is much too short. Alternatively, provide a citation.
* The sentence following citation 30 seems to forget that the system facing information overload is usually the user itself. The recommender system / algorithm is not really the victim of information overload.

4.) Discussion
* Normally a discussion is presented after the contributions, towards the end of the paper.
* However, this is not a discussion but a list of requirements which supposedly are the basis for the presented architecture.
* Where do these requirements come from?
Why is there only one citation in this section?
* The authors have to derive the requirements from somewhere, and they need to describe this process. Literature survey, use case and requirements analysis, industry project, or something else.
* The discussion of multiple identities (4.1) requires more grounding. There is evidence that users actually prefer fragmented identities.
* In section 4.3 OAuth is mentioned for the only time in the paper. So the authors are actually aware of it. The proposed architecture has to be compared to OAuth. Also, the reason for stating this requirement is not clear.

5.) Proposed architecture
* It is not clear how exactly the proposed architecture enables personalisation while protecting the privacy of the user at the same time. What is the threat model the architecture addresses? What kind of personalisation algorithms are able to work in such an architecture?
* The authors refer to a "typical access control architecture". Please provide a citation or a detailed use case and requirements analysis.
* In 5.1 FOAF+SSL is referenced, which has been renamed to WebID.
* In section 5.6 an "access policy definition language" from the W3C is mentioned without calling it by its name.

6.) Experiments
* How are these experiments related to privacy?
* How are the experiments related to the rest of the paper?
* The criteria for selecting the subset of the Hetrec 2011 data set have to be listed.
* The text refers to the "evaluation needs", these have to be clearly listed.
* Which tool / approach was used to reconcile the LastFM tags?

7.) Evaluation:
* From the description in sections 6 and 7, the only extension which the authors implement on top of Apache Mahout is the semantic similarity provider. It is mentioned for the first time on page 17, where configuration C105 is described.
* What exactly does this semantic similarity do? Did you develop it by yourself?
Is it using an approach from existing research?
* How is the experiment described in this section relevant to privacy?
* Are these results statistically significant?

8.) Conclusions and future work
* The authors claim: "... this work demonstrates that it is possible to achieve a balance between privacy and information recommendation, with minimal trade-off between both." As the presented experiment does not consider privacy at all, you cannot support this claim.
Also the trade-off is not quantified.
* The conclusion mentions that an implementation of the architecture exists; however, this implementation is not described anywhere in the paper.
* The conclusion tries to explain that the contributions of the paper enable balancing privacy and personalisation. This explanation needs to be expanded, and it needs a formalisation. The paper does not currently support this claim.
* The conclusion claims that "this system has been fully tested on a closed environment web server". However, no details of this implementation are actually described in this paper.
So the authors cannot support this claim.