Access Control for Linked Building Data based on Decentralized Role Paths

Tracking #: 1635-2847

Authors: 
Jyrki Oraskari

Responsible editor: 
Guest Editors ST Built Environment 2017

Submission type: 
Ontology Description
Abstract: 
Building data is produced and managed in a decentralized and fragmented way across different network-structured organizations separately established to the design, construction, maintenance, and renovation phases of a building. Linked Building Data is an ongoing effort to produce ontologies and tools to enable decentralized publication and granular online sharing of various types of building data as Linked Data. Before these technologies can be taken in practical use, proper access control scheme becomes crucial for reasons of security, privacy, competition, and IPRs. This paper studies a way to assign access rights to resources based on the complex relations of data producers and data consumers in relevant network organizations. The relations are represented as role paths from data to a consumer. A design and implementation of the approach is presented, based on the use of WebID for authentication and role paths for access control rules. An access control ontology incorporating role paths is presented, together with a draft of a domain ontology for construction projects.
Tags: 
Reviewed

Decision/Status: 
Reject

Solicited Reviews:
Review #1
By Daniel Garijo submitted on 29/May/2017
Suggestion:
Reject
Review Comment:

This paper describes an ontology for defining security restrictions on Linked Data, as well as an algorithm for access control based on role paths. The author illustrates the approach with an example in the building specification domain.

The paper reads well and is relevant for the Semantic Web Journal. The topic is not novel, as using roles to access published resources has been addressed already in collaborative frameworks (e.g., wikis and forums); but I think the idea of being able to decentralize access control on web resources is interesting. That said, I don't think the paper is ready for publication in its current condition, especially from an ontology perspective. I list my rationale below:

- The contribution of the paper is unclear: On page 3 the author introduces the research question, which is to create a "scheme based on role paths in a decentralized setting". Then an algorithm is introduced in section 4 as well, and the evaluation is oriented towards validating the algorithm rather than the ontology. I only found the contribution (ontology + algorithm) near the end of the paper, which is very confusing.

- The paper motivated the work for the Linked Building Data domain. However, the examples look artificial. The whole validation of the model is based on a toy example. How does this adapt to reality? Is there really a need for this kind of work? Aren't there any real examples to illustrate the approach? How is the ontology relevant for the domain?

- The ontology described is not properly published: The shortened URL used in the paper resolves to a github link. In that link the following base URI is used: https://drumbeat.cs.hut.fi/owl/security.ttl, which is not accessible. The ontology is neither documented nor dereferenceable. Since it is listed as one of the contributions of the paper, I find this unacceptable. As it stands at the moment, the ontology cannot be properly used by anyone.

- The ontology presents some problems as well. The author references some previous use cases as the main driver of the ontology. Nevertheless, the requirements and methodology for ontology development are not described in the paper. Figure 2, which is not explained in the text, presents an overview of an ontology. But the github link where the ontology is published includes additional descriptions and is referred to as "Example Ontology" in the paper. I think the author might be mixing the (example) extension of the ontology for the building domain with the proposed ontology itself.
The ontology also lists some weird design decisions, such as considering architects a type of company, instead of persons. Since none of the classes and properties are properly defined or have examples, all I can do is try to interpret the figures in the paper.

- Section 5 is called evaluation, but it's a brief description of the implementation and a discussion including the differences with related work. Has the implementation been evaluated by any user in the Linked Building domain, for which it was designed? Does the ontology fulfill all the requirements for which it was designed? Is there any metric to compare the current approach with related work? None of these questions are addressed in the paper! Instead, the author just leaves a series of questions unanswered.

- Related work: there has been much related work in the area, but after reading the section it is not clear why it is not enough to address the problem at hand. Maybe moving it after the implementation section would help to compare it against the approach proposed by the author.

- How does the author propose to address the questions raised at the end of the paper?

Minor issues:

- Wouldn't having a decentralized access control create a security risk? Wouldn't this decentralized approach also include accessibility risks? (e.g., what would happen if an intermediate server is offline?)

- The producer has to know how third parties organise their data. What would happen if they change their schema?

- Figure 1 is confusing: what do the dashed arrows mean? Why are some ovals stacked? Why are "project" and "facility" boxes dashed?

- I can't understand some paragraphs. For example, Page 5 "So, the project description can be located somewhere else on the web". What does this have to do with ontology reusability?

Page 6, second paragraph: rdf:properties belongs to the reserved vocabulary of owl 2 and must not be used to identify classes? What does identify mean in this context?

- The examples should be published online too.

- Figures 4 and 5 are not explained.

Review #2
By Raúl García-Castro submitted on 29/May/2017
Suggestion:
Reject
Review Comment:

The paper presents an access control approach for Linked Data; it is composed of an ontology and an access control algorithm. The authors have applied the approach to a use case dealing with Linked Building Data.

The topic is highly relevant, but the paper should be more detailed and formal, with the writing and the presentation significantly improved. Furthermore, it is not clear to me whether the paper fits into the special issue, since the contribution is for Linked Data in general and the only specific thing is that the use case in which it has been applied deals with building data.

In page 2 the author states that "The advantages [of linked data] are that data is online and up-to-date". Following a Linked Data approach does not ensure that the data are updated; it just deals with how to represent data.

In the introduction, the paragraph about the DRUMBEAT platform should be better related with the surrounding paragraphs.

Section 2 can be improved by better structuring it. Now it jumps along different topics: identifiers, authentication, Solid project, access control paradigms, RBAC, access control ontologies, one access control approach, ontology for access control, another approach.

Apart from this, section 2 is about related research but it includes nothing specific to building data.

Besides, the comparison of the author's contribution with the state of the art is just a sentence: "Our solution differs in the way that instead of using WebID extensions, we are using RDF property paths to express the authorization roles". This must be improved.

The authors should make a proper comparison with the current approaches, going further into the literature (regarding access control, the work of Victor Rodriguez comes to my mind, e.g., http://oeg-dev.dia.fi.upm.es/licensius/static/ldr/).

Hence, the related work section should provide an overview of what is needed to implement access control, and how the different approaches cover the different aspects. Not just compare each approach according to one aspect. The section should also separate access control theory from concrete proposals.

Section 3 directly jumps to the ontology but the need for such ontology is not motivated. This should be included in the paper.

Besides, the ontology reuses existing work [10,21] which is not mentioned in the related work section, so the contribution in that sense is not clear.

Section 3 states that "In DACLBD, it is presumed that data that describes the resource coexists in the same RDF store". This is a strong precondition since it does not allow access control data to be distributed. The author should discuss this in the paper.

Section 3 just presents the raw ontology code and a figure without explaining how it has been designed or the design decisions taken in it.

It is not clear why in the ontology the author does not reuse the RDF vocabulary terms to define lists and instead he creates new vocabulary terms. This should be explained.

The author defines a property path as a composition of rdf:Properties. However, in OWL there are datatype properties and object properties, which are disjoint, so not every property path is possible. The author discusses this at the end of section 3 but the solution is just a trick; it should not be used (even if represented as a string).
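
The constraint behind this remark can be illustrated with a minimal sketch. In RDF a literal cannot be the subject of a triple, so in a sequence of properties only the final step may be a datatype property. The property names and the PROPERTY_KINDS table below are hypothetical and are not taken from the paper's ontology; the check is an illustration, not the author's method.

```python
# Hypothetical property classification; these names are illustrative only
# and do not come from the ontology under review.
PROPERTY_KINDS = {
    "ex:hasProject": "object",
    "ex:hasContractor": "object",
    "ex:hasEmployee": "object",
    "ex:hasName": "datatype",
}


def is_traversable(path):
    """Return True if every step except the last is an object property.

    A datatype property yields a literal, and a literal cannot be the
    subject of the next triple, so it may only appear as the final step.
    """
    if not path:
        return False
    for step in path[:-1]:
        if PROPERTY_KINDS.get(step) != "object":
            return False
    # The last step may be either kind, but it must be a known property.
    return path[-1] in PROPERTY_KINDS


valid = is_traversable(["ex:hasProject", "ex:hasContractor", "ex:hasEmployee"])
ends_in_literal = is_traversable(["ex:hasProject", "ex:hasName"])
broken = is_traversable(["ex:hasName", "ex:hasEmployee"])
```

Under these assumptions the first two paths are accepted and the third is rejected, because it tries to continue from a literal value.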

Page 5 introduces the term "project description", but it has not been presented before in the paper.

In the example in page 5, it is not mentioned why resource representations mix data and the associated rules; they may not be together.

The description of the rules is quite informal, given only through examples; the behaviour should be described formally, clarifying all the potential scenarios. For example, what happens if a system has conflicting rules?

The paper does not include evidence of the quality and relevance of the ontology.

In section 4, it is not clear which are the different components that participate in an interaction. Besides, it is not clear how interactions are secured.

Algorithm 1 mixes pseudo-code and natural language. It should include more pseudo code and move explanations and comments to the text.

The evaluation section (section 5) doesn't include any evaluation. It just describes how the system was implemented and compares the solution to related work. This should have been done in the related work section and this section should include some evaluation to validate the author's approach.

Other issues in the paper follow next.

The writing and the formatting of the paper must be reviewed.

In page 4 reference [21] is cited as being Di et al.'s work, but it is not the case.

Figure 3 contains mentions of "Contex", shouldn't this be "Context"?

Some listings in the paper include captions, and others don't.

URIs and software tools are not research references.

Review #3
By Simon Steyskal submitted on 15/Jun/2017
Suggestion:
Major Revision
Review Comment:

=== Summary ===
The present article introduces an RBAC model that utilizes SPARQL property paths for defining roles in terms of so-called role paths. Together with an access control ontology called Decentralized Access Control ontology for Linked Building Data (DACLBD) and an accompanying example use case scenario, a brief introduction to the underlying envisioned access control algorithm is provided. Eventually, the author concludes his article by pointing out that the proposed approach raises many questions which - unfortunately - are left unanswered.

=== Review Overview ===

(You can find the detailed review underneath)

In general, I think that the present article discusses some very interesting topics, especially when it comes to utilizing SPARQL property paths for access control.
At a closer look, however, I have doubts about the article's claimed originality as well as issues with its overall quality. The discussion/comparison wrt. RW is very superficial and misses relevant work. Additionally, there are various parts throughout the entire article that are either hard to follow as they are lacking proper explanations and/or contain typos/formatting issues that could have been easily detected had it been proofread more thoroughly.

In the following, you find detailed remarks for each section:

=== Detailed Review ===
------------------
0) Abstract
------------------
0.1) "proper access control scheme becomes crucial for reasons" -> "it is crucial to establish proper access control mechanisms for reasons"
0.2) "This paper studies a way to assign access rights to resources" -> what kind of resources are you talking about? resources as in rdfs:Resource (cf., Fig2)? or resources as defined on p.3 "resources are represented as calls to a REST API,"? clarify!

------------------
1) Introduction
------------------
1.1) "resulting in multiple BIM models" -> BIM models = Building Information Modeling models, whereas Building Information Models = BIMs
1.2) "data have become available outside a facility itself but still directly relevant to it:" -> that's weirdly phrased.. is data from "outside a facility" meant to be "external data"? rephrase
1.3) "Building data is produced by ...: owners, cities/municipalities, designers, engineers,...,etc." -> either you enumerate *all* types of parties OR you just list a few of them (3-4) and indicate that this is just an excerpt of potential types of parties. You should not do both.
1.4) "The parties are typically in their respective roles" -> which means? party == role? what's the difference? can a party have multiple roles and vice versa?
1.5) "the datasets produced by different parties are strongly interrelated." -> this implies that datasets produced by the same party aren't strongly interrelated. is that the case?
1.6) The whole paragraph starting with "Rather, the borders between organizations and disciplines, ..." -> there's a lot of ongoing research (esp. in the area of data integration) dealing with the very same issues you outline in that paragraph. You should have a look at those efforts and cite related ones here.
1.7) "DRUMBEAT platform is an implementation of Linked Building Data concept .." -> either put the reference to [45] directly after DRUMBEAT and/or add a link to the platform as a footnote
1.8) "..with a Representational state transfer (REST)[14] API.." -> s/Representational state transfer (REST)[14] API/REST API/
1.9) "..to access the objects" -> which objects? s/the objects/objects
1.10) "an architect could grant access to her model to all employees of its consultants," -> s/its/her ?
1.11) "all managers of the all subcontractors of the main contractor of a project associated with a resource." -> what? what's resource referring to? s/of the all/of all
1.12) "The research question of this paper is whether it is possible to have .." -> to which the answer is either yes or no. It's about the "How" not "If".. rephrase!
1.13) "to express the complex relations between a data producer (who has also specified the access control rule)," -> data producer isn't necessarily also the owner of the data
1.14) "and the potential subjects, or the set of allowed data consumers." -> what's the difference between subject and data consumer? isn't every data consumer also the subject?
1.15) "As a decentralized authentication mechanism we use WebID, although other options would be available as well." -> add [34] to WebID (from sec.2); what other options? cite them or remove it.

------------------
2) Related research
------------------
2.1) "a role may inherit permissions from parents in a role hierarchy" -> s/parents/parent roles/
2.2) "Since access control decisions happen between multiple entities" -> what does that mean?
2.3) "..there is an obvious need for shared vocabularies." -> why's that obvious?
2.4) "W3C ACL-RDF [2] is a vocabulary that defines the access rights" -> s/defines the/expresses
2.5) "FOAF Vocabulary Specification [4]." -> you've already introduced FOAF on the previous page
2.6) "The RDF data authenticity and trust were not covered." -> what?
2.7) s/SPQRQL/SPARQL
2.8) "Furthermore, Villata et al. [8] " -> Costabello et al.
2.9) "SPARQL ASK queries are used to test if the user can be granted rights to access" -> at that point rights are already fixed. you can check whether a user *has* appropriate rights to be *permitted* to access requested resource.
2.10) "The works concentrate on SPARQL endpoint while our proposal focuses on linked data API." -> what's a linked data API? are SPARQL endpoints and APIs comparable?
2.11) "Costabello et al.[9] extended the work to shield the access to a Linked Data Platform" -> their approach is called Shi3ld, but what does "to shield the access to a LDP" mean?
2.12) http://www.semantic-web-journal.net/content/reasoning-data-flows-and-pol...

------------------
3) The Ontology
------------------
3.1) "for co-operative parties to access object entities in them. " -> what's an object entity?
3.2) "The model users could be, e.g. the site manager of the project.." -> model users==parties?
3.3) "The presented Decentralized Access Control ontology for Linked Building Data (DACLBD)" -> at that point no ontology was presented yet
3.4) "that was presented by Di et al. in [10,21] when reasonable." -> [21] is Kirrane not Di et al.
3.5) "In DACLBD, it is presumed that data that describes the resource coexists in the same RDF store." -> .. as?
3.6) "An RDF property path is written as a list of RDF: Property objects are used to express the group of trusted people" -> what? formatting/rephrasing!
3.7) Figure2
3.7.1) some entities have namespaces, some don't -> fix
3.7.2) it's rdfS:sub(Class|Property)Of not rdf:
3.7.3) from [a]: "All things described by RDF are called resources, and are instances of the class rdfs:Resource. This is the class of everything. All other classes are subclasses of this class." -> which allows everything in Fig2 to have ACRules, etc.. intended?
3.8) "The access rights of the ontology" -> it's not the access rights OF the ontology, but the ones that are expressible with the ontology
3.9) "In Figure 3, the detailed Turtle listing" -> no need to emphasize that it's turtle
3.10) Fix formatting of Table1
3.11) Figure3
3.11.1) "@prefix : " -> don't use URL shortener; without # or / something like :Write becomes http://tiny.cc/r3oikyWrite
3.11.2) :Write/:Read/:Append are rdfs:subClassOf :Permission not rdf:type!
3.11.3) ":Append a [sic!] :Permission" -> is missing a dot
3.11.4) :first doesn't have an rdfs:range
3.11.5) "Fig. 3. The Turtle listing of the ACLLBC ontology" -> what's ACLLBC?
3.12) "This basis keeps the rule and the rule interpretation, i.e. the metadata, close" -> what does that mean? when's a rule and its interpretation considered to be "close"?
3.13) "and makes the derivation of the interpretation context simple." -> why's that?
3.14) "The resulted node paths could contain references to any ontology, but they are expected to realize the SPARQL property path pattern:... reflects an authorization of the data." -> what? what are "node paths"? what's the difference between property path/RDF property path/SPARQL pp/node path? also formatting!
3.15) "One core design principle has been the re-usability of the entities of the ontology." -> How's that design decision reflected in the ontology? what is it that fosters re-usability?
3.16) "So, the project description can be located somewhere else on the web." -> what? "located somewhere else on the web" as in different namespace? How's that related to 3.15)?
3.16) "An ontology that shows typical entity classes found in the construction-related uses cases.." -> what are entity classes? are properties like :hasProject also considered to be entity classes? s/found/used/; s/in the/in/
3.16a) Figure4
3.16a.1) missing namespaces
3.16a.2) it's rdfS:sub(Class|Property)Of not rdf:
3.16a.3) fix weird stacked arrow from :ConstructionProject to :Contractor
3.16a.4) missing rdfs:subPropertyOf relation between :hasContractor and :hasMainContractor
3.16b) Figure5
3.16b.1) "\begin{lstlisting}[frame=none]" -> fix
3.16b.2) :hasManager inherits its domain&range from :hasEmployee
3.16b.3) why not use two different namespaces, one for your DACLBD ontology and one for your example domain?
3.17) "how the ontologies can be used together in the context of the basic use case." -> what basic use case?
3.18) s/Let us expect/Let us assume/
3.19) "https : //architect.org/music_hall" -> formatting
3.20) "ds:hasProject/ds:hasContractor/ds:hasEmployee," -> that namespace was neither introduced nor is it used anywhere else in the paper; remove ,
3.21) "In this case, the contractor is Fabricator Ltd. and they have a site: https://fabricator.org." -> so..? what should that tell us? https://fabricator.org is the contractor's foaf:homepage? or that https://fabricator.org is the URI of the contractor? for the latter, "C has a site: xyz" does not imply "xyz is the URI of C"
3.22) "Thus the rule and the metadata would be written using the following triples." -> written to where?
3.23) Figurex
3.23.1) Why does every listing have a caption except that one?
3.23.2) formatting
3.23.3) vs. https://fabricator.org -> http vs https
3.24) "to correspond to OWL 2, and OWL 2 DL profiles." -> why OWL DL? elaborate!
3.25) "On the other hand, RDF:Property belongs to the reserved vocabulary of OWL 2 .." -> as opposed to? what's the "other hand"? fix formatting of "RDF:Property"
3.26) "Therefore, if needed, the RDF:property values may be converted to a String literal that contains the SPQRQL notation of the property path" -> what? what are you trying to say with this? what rdf:Property values may be converted to Strings? fix formatting of "RDF:property"; s/SPQRQL/SPARQL/
3.27) The whole paragraph starting with "What applies to the security" -> relevance? what's the main message? rephrase!
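
Remark 3.11.1 can be reproduced with a minimal sketch of how a Turtle prefixed name is expanded: the namespace IRI and the local name are simply concatenated. The expand helper is hypothetical, and the corrected namespace with a trailing '#' is an assumption about what the author intended, not something stated in the paper.

```python
def expand(prefixed_name, prefixes):
    """Expand a Turtle-style prefixed name by plain string concatenation."""
    prefix, _, local = prefixed_name.partition(":")
    return prefixes[prefix] + local


# Namespace as quoted in remark 3.11.1 (no trailing '#' or '/').
without_separator = {"": "http://tiny.cc/r3oiky"}
# Hypothetical corrected form, assuming a hash namespace is intended.
with_separator = {"": "http://tiny.cc/r3oiky#"}

bad = expand(":Write", without_separator)
good = expand(":Write", with_separator)
print(bad)
print(good)
```

The first expansion yields the fused IRI quoted in the remark; the second keeps the local name separable from the namespace.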

------------------
4) The Access Control Algorithm
------------------
4.1) "briefly.It" -> space
4.2) "It is an enabler that specifies,.." -> an enabler for what?
4.3) "In particular, it makes it possible to create an implementation to evaluate the design." -> what? which design? relevance/context?
4.3a) Algorithm1
4.3a.1) line7: what's an RDF node path?
4.3a.2) line9: triples matching to what?
4.3a.3) how do you deal with cycles?
4.4) "at the end of one of the role path" -> s/path/paths/
4.5) "she can be granted the permissions that are associated with the professional role of the access control rule." -> what professional role? what's a professional role of an acr? what's the difference between a professional role and other types of roles of acrs?
4.6) "When they send authorization requests to test, if a user is indeed in the role of the access control rule," -> what? how and when can a user "be in the role of the acr"? any role of an acr? or only the professional role of an acr? can an acr have multiple (professional) roles?
4.7) given its significance, I've the feeling that section 4 is way too short and "handwavy" -> please extend, (e.g., by elaborating on the envisioned request/response process, etc.)
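
To make remarks 4.3a.1 to 4.3a.3 concrete, the following is a minimal sketch of how a role path could be evaluated over a set of triples, with a visited set guarding against cycles. It is not the author's Algorithm 1. The ds: property names are those quoted in remark 3.20; all ex: resources, the ex:hasSubcontractor property, and the '*' suffix convention are hypothetical.

```python
from collections import defaultdict

# Hypothetical data; only the ds: property names come from the paper's example.
TRIPLES = [
    ("ex:music_hall", "ds:hasProject", "ex:project1"),
    ("ex:project1", "ds:hasContractor", "ex:fabricator"),
    ("ex:fabricator", "ds:hasEmployee", "ex:alice"),
    ("ex:fabricator", "ex:hasSubcontractor", "ex:welder"),
    ("ex:welder", "ex:hasSubcontractor", "ex:fabricator"),  # deliberate cycle
    ("ex:welder", "ds:hasEmployee", "ex:bob"),
]


def build_index(triples):
    """Index objects by (subject, predicate) for quick lookup."""
    index = defaultdict(set)
    for s, p, o in triples:
        index[(s, p)].add(o)
    return index


def step(index, nodes, prop):
    """Follow prop exactly once from every node in nodes."""
    result = set()
    for node in nodes:
        result |= index.get((node, prop), set())
    return result


def closure(index, nodes, prop):
    """Follow prop zero or more times; the visited set guards against cycles."""
    visited = set(nodes)
    frontier = set(nodes)
    while frontier:
        frontier = step(index, frontier, prop) - visited
        visited |= frontier
    return visited


def evaluate(index, start, path):
    """Evaluate a role path; a step ending in '*' means zero or more repeats."""
    nodes = {start}
    for prop in path:
        if prop.endswith("*"):
            nodes = closure(index, nodes, prop[:-1])
        else:
            nodes = step(index, nodes, prop)
    return nodes


INDEX = build_index(TRIPLES)
direct = evaluate(INDEX, "ex:music_hall",
                  ["ds:hasProject", "ds:hasContractor", "ds:hasEmployee"])
extended = evaluate(INDEX, "ex:music_hall",
                    ["ds:hasProject", "ds:hasContractor",
                     "ex:hasSubcontractor*", "ds:hasEmployee"])
```

A fixed-length sequence path always terminates on its own; cycles only become a concern once a step may repeat, which is why the sketch tracks visited nodes in the closure step alone.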

------------------
5) Evaluation
------------------
5.1) "enforcement solution is given and evaluation against related works is given." -> given x 2, rephrase!
5.2) "One of the core design principles was to add the user authentication as an added security layer to the system to keep the existing Web of Building Data" -> what system? why and how would you lose "the existing Web of Building Data" if you would've integrated user auth. differently? is your ac mechanism also part of that layer?
5.3) the two paragraphs starting with "Java API for RESTfulWeb Services (JAX-RS)" and "In the first implementation," respectively -> nice to know, but (imo) not relevant for a journal article (but for a technical report!)
5.4) "The rights to access data depend crucially on the professional role of the consumer to the producer of the data." -> what? what does that mean? what's the relation/difference between "professional roles" of consumer/producer of data/access control rule? also producing data does not imply ownership.
5.5) " Unlike related works," -> what related works? add \cite
5.6) "the presented ontology focus on" -> s/focus/focuses/
5.7) "expressing the professional role, the chain of trust a company has for employees of a contractor and contractor’s subcontractors." -> is that the definition of "professional role"? if yes, how does this align with 4.5)?
5.8) "In operation, this means truly loose coupled and distributed access control." -> because ..? why does "expressing professional roles" mean "truly loose coupled and distributed access control"?
5.9) "There were two possible solutions:" -> possible solutions for what? what's the problem?
5.10) "the data guard could directly contact all co-operating companies directly," -> what's a data guard? remove one "directly"
5.11) "or the checking work could be delegated to the contractor." -> what "checking work"? who checks what?
5.12) "If the companies communicate only with only directly known parties, only " -> 3x only
5.13) "So, delegating the check of the members of the role may be the only way." -> to do what?
5.14) "The recursive delegating the work of trust checking does not exist in the related work." -> I beg to differ.. a quick google search prompted me with e.g. [b] (cf. page 4); also phrasing.
5.15) "Contrasted to the WAC" -> what's that? "WAC" was never introduced
5.16) "A role path specifies a set of people that have a common professional role of the consumer to the producer of the data" -> what? related to 4.5) and 5.7)
5.17) "The role list is a policy language that specifies the role." -> what's a "role list"? it's definitely not a policy language.. if a role list is a list of roles and role lists specify roles, are the roles they specify the same kind of roles they are composed of?
5.18) "but it also raises questions: " -> so ... ?? where are the answers? what's the purpose of raising questions without actually discussing them?
5.18.1) "How should we handle the situation when the public key of the client certificate changes?" -> based on what's discussed in this article so far, I've absolutely no clue why/where/how that would be an issue. You can't just throw things like that in the mix without explaining/discussing them.
5.19) You might want to consider renaming this section from "Evaluation" to something along the lines of "Discussion"

------------------
6) Conclusion
------------------
6.1) "The management of the distributed security model can be studied in detail. Ways to ... can be studied" -> they sure can, but are they also worth following? Are you planning to continue as outlined?
6.2) "The presented access control framework is inherently distributed and modular. This raises many questions." -> again, if you already know that your approach raises those questions, why haven't you addressed them? are you even planning to address them in future work? is the take-home message that your proposed approach raises many unanswered questions?

------------------
A) References
------------------
A.1) Some references would be better served as footnotes, e.g. [2,15,17-19,27,34,39,40] (some of them are missing their respective URLs too)
A.2) [41] -> s/Torma/Törmä/
A.3) [37] is outdated

== References ==
[a] https://www.w3.org/TR/rdf-schema/#ch_resource
[b] Li, Ninghui, Benjamin N. Grosof, and Joan Feigenbaum. "Delegation logic: A logic-based approach to distributed authorization." ACM Transactions on Information and System Security (TISSEC) 6.1 (2003): 128-171. http://www.cs.yale.edu/homes/jf/LGF.pdf
[c] http://www.semantic-web-journal.net/content/reasoning-data-flows-and-pol...

