LBDserver - a Federated Ecosystem for Heterogeneous Linked Building Data

Tracking #: 3032-4246

Authors: 
Jeroen Werbrouck
Pieter Pauwels
Jakob Beetz
Erik Mannens

Responsible editor: 
Guest Editors SW for Industrial Engineering 2022

Submission type: 
Full Paper
Abstract: 
As the application of Linked Data technologies in the Architecture, Engineering and Construction (AEC) industry gains momentum, prospects for an interdisciplinary, Web-based BIM practice become more and more realistic. Although several modular Linked Building Data (LBD) ontologies have already been developed and interlinked to address specific topics in digital buildings, infrastructures to actually test and use them in an accessible, 'BIM-like' fashion, are scarce. In this paper we propose the LBDserver, an extendable web-framework for management of federated, heterogeneous (building) project data. In contrast with current-day centralised Common Data Environments, this decentralised solution is stakeholder-oriented, where disparate project information resides with the stakeholder, to be shared in a fine-grained way with other consortium members, and linked with other heterogeneous datasets on the Web. We combine the Solid initiative for Web decentralisation with the recent industry standard Information Container for linked Document Delivery (ICDD), proposing a 'federated CDE' infrastructure for serving heterogeneous AEC project datasets in a federated, access-controlled and scalable manner. We validate the proposed framework for completeness using an existing project in Ghent, Belgium.
Tags: 
Reviewed

Decision/Status: 
Major Revision

Solicited Reviews:
Review #1
Anonymous submitted on 31/Mar/2022
Suggestion:
Minor Revision
Review Comment:

The AEC industry is one of the most fascinating and challenging domains for semantically rich decentralized information management, due to the complexity and variety of the built environment, the fragmented nature of projects, the long lifecycle of built assets, and the complex security and privacy concerns in living environments. Given the exceptional inefficiencies in the construction industry, proper digitalized solutions have a high potential to improve productivity, quality and energy efficiency, as well as to reduce waste, ultimately decreasing the climate impact of construction.

There has been significant previous research within the Linked Building Data community on semantic interoperability, focusing mostly on Semantic Web technologies, ontology development, and reasoning applications. Despite the obviously decentralized information production and consumption by tens or hundreds of companies, changing from project to project, the research on compatible information sharing solutions has been lacking. This paper makes an important contribution by proposing a solution based on Solid Server, an advanced and practical decentralized platform. It goes to the level of detail where the difficulties and complexities of decentralized management will ultimately be revealed.

Criteria:

1. Originality: The paper presents clearly original research that combines Linked Building Data modelling technologies with the publication of decentralized building project data on Solid Server, which is a promising solution given the nature of data in the AEC industry.

2. Significance of the results: The decentralized information management solution outlined in this paper could potentially have a revolutionary impact on the AEC industry. Conversely, the complexities of the AEC domain and the experience gained from its challenges can benefit the development of decentralized data publication practices in general.

3. Quality of writing: The presentation could be simplified and better organized for readers not previously familiar with the AEC domain or Linked Building Data research.

Detailed remarks:

- The scope covered in the paper is fairly large and its content is quite complex, especially for readers not familiar with Linked Building Data beforehand, or more generally, for those without previous understanding of the AEC domain and Semantic Web technologies. The paper would benefit from pruning of unnecessary content and from a somewhat more elementary description of the core content.

- The overall presentation should be targeted more towards the general Semantic Web community, or given the title of the special issue, towards readers who additionally have only a general background in industrial engineering. The Introduction especially uses many concepts such as BIM, IFC, ifcOWL, Linked Building Data ontologies, CDE, ICDD, and partial models (including disciplines such as HVAC) without proper introduction, which can make the text challenging even for people with some previous background in AEC. The concept of "double patchwork" should be made more visible, and the problems of IFC that Linked Building Data ontologies are supposed to solve should be described in more detail. It would be better if the essential content of Fig 1 (the BIM maturity wedge) could be redrawn or summarized in other ways; there are too many unexplained acronyms and terms in the figure, and the ten-year-old figure would already benefit from some re-evaluation. On the other hand, in this forum it is not necessary to explain what the Semantic Web or ontologies are in general.

- Research objectives should be presented in a more explicit manner in 1.4 (as at the end of the Abstract). Requirement 3 in 1.4 seems to be too strongly formulated considering the topic of the paper. Requirement 5 is too weakly formulated: isn't one central purpose of Linked Data to support linking specifically at object level and not only at document level? Security should also be addressed more explicitly in requirement 2; the lack of it is one of the main obstacles to the adoption of these kinds of decentralized solutions in AEC.

- The role of ICDD in this work is difficult to understand, since the ICDD standard is not followed as specified and ICDD ontologies are not imported into or referenced by the LBDS ontology (nor in Listing 1). ICDD presents a very specific approach to exchanging snapshots of interrelated files as zip packages that can also contain cross-file linksets, where links refer to internal identifiers or structures within those files. How does the ICDD approach help to achieve the goals of LBDserver, which appear to be dynamic information management and sharing among stakeholders self-hosting the data they produce? ICDD is an intermediate and unambitious, more static and closed, and less dereferenceable/queryable version of linked data that could at best serve as a way of importing/exporting data between traditional sources and proper linked data systems. Moreover, it is unclear why LBDserver is presented as an implementation of ICDD, since it appears that only some sub-document linking patterns of ICDD are used. The role of ICDD in the description of 4.3 The Reference Registry is more confusing than helpful. I would suggest removing all or most of the discussion about ICDD and only including it as a citation. The references to DCAT2 - and naturally also LDP - in the LBDS ontology clearly make much more sense. It would also be interesting and relevant to relate the work to ISO 19650 terminology about Information Models and Information Containers.

- Fig 2. The prefixes ifc:, sosa:, omg: and geo: (at least) are not included in Listing 1. The names of ifcOWL classes are not correct: for instance, it should be ifc:IfcBeam instead of ifc:Beam (unfortunately).
- Listing 2 and 3: Again, check that the prefix declarations are included in Listing 1.

- It is recommended to draw all ontology/instance diagrams in Fig 3 - Fig 10 in a uniform graphical notation: either the notation used in Fig 5 - Fig 10 or, alternatively, the Chowlk notation.

- The concept of a partial model within AEC should be better explained, including an overview of architectural, structural and MEP models such as HVAC model, together with evolution of models in the design stage.

- Fig 6. Based on the description above, shouldn't the virtual containment relation lbd:aggregates be lbd:contains?

- Check the use of the fixed-width font. E.g., how is formatted.

- Could the contents of Listing 4 - Listing 8 be illustrated with a diagram?

- Reference registries seem to contain elements related to each other only with the owl:sameAs relation. Since different partial models contain different entities (even the wall entities are not the same in architectural and structural models, let alone any other entities), lots of local aliases for the entities of other models need to be created. How are these new local aliases connected to the other entities in the local model? I wonder if allowing the links to refer directly to external entities and supporting other link relations besides owl:sameAs would enable more semantically expressive connections between partial models.

- The three-level identifier scheme (concept - reference - identifier) is really complex. It is difficult to figure out the need for it - it may of course depend on whether identifiers are regenerated at conversion or import time. The examples in 4.3.3 are unconvincing. How do different parties come to the conclusion that some of their concepts are the same, given also the volume of concepts? What if they are not quite the same, but one is spatially or temporally included in another, overlaps with another, is a part of another, and so on?

- The example in Listing 14 seems to be syntactically incorrect, with a dangling line starting with schema:value.

- Examples of SPARQL queries (hopefully federated) or other similar ways to use the data across multiple models should be presented, as suggested at the end of Section 5. Any specific setup actions (authentication to different services, etc.) that are needed before the queries are run should also be indicated.

- It is a somewhat trivial concern, but I would find the Turtle fragments more readable if the naming conventions indicated more clearly whether a name refers to an individual or a class. In particular, I would prefer that individuals are not named "concept..." but "individual...", even though I understand that there are things called individual concepts.

- It would make sense to reference the International Data Spaces effort and especially the concept of data sovereignty.

- The associated ontologies and their documentation are fine
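
The prefix check requested above for Fig 2 and Listings 2 and 3 could be partly automated. The following is only a minimal sketch in Python, assuming that the prefix declarations and the snippet under review are available as Turtle text. The function name undeclared_prefixes and the example.org IRIs are hypothetical, and the regular expressions cover only common cases rather than the full Turtle grammar.

```python
import re

# Turtle/SPARQL prefix declarations, e.g. "@prefix owl: <...> ." or "PREFIX owl: <...>"
PREFIX_DECL = re.compile(
    r"^\s*(?:@prefix|PREFIX)\s+([A-Za-z][\w-]*):\s*<[^>]*>",
    re.MULTILINE | re.IGNORECASE,
)
# Prefixed names, e.g. "ifc:IfcBeam"; only the prefix part is captured
PREFIXED_NAME = re.compile(r"(?<![\w<:/#])([A-Za-z][\w-]*):(?=[A-Za-z_])")


def undeclared_prefixes(declarations, snippet):
    """Return the set of prefixes used in `snippet` but not declared in `declarations`."""
    declared = set(PREFIX_DECL.findall(declarations))
    text = re.sub(r"<[^>]*>", " ", snippet)          # ignore full IRIs
    text = re.sub(r'"(?:[^"\\]|\\.)*"', " ", text)   # ignore string literals
    used = set(PREFIXED_NAME.findall(text))
    return used - declared


# Hypothetical stand-in for a prefix listing that lacks the ifc: prefix
DECLARATIONS = """
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix owl: <http://www.w3.org/2002/07/owl#> .
"""

# Hypothetical snippet using the ifcOWL class name mentioned above
SNIPPET = (
    "<http://example.org/beam1> rdf:type ifc:IfcBeam ; "
    "owl:sameAs <http://example.org/b> ."
)

print(sorted(undeclared_prefixes(DECLARATIONS, SNIPPET)))  # prints ['ifc']
```

Such a check only finds missing declarations; it does not verify that class names such as ifc:IfcBeam actually exist in the referenced ontology.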

Review #2
Anonymous submitted on 24/May/2022
Suggestion:
Major Revision
Review Comment:

In this paper the authors propose a software architecture based on an implementation of the Linked Data Platform and the Solid ecosystem for data sharing. The implementation allows accessing federated Linked Building Data. The paper's outcome is a proof of concept of the architecture, built upon the Open BIM standard. The idea is to build a "federated data ecosystem" (common data environments) for building data using standards.

According to the authors the architecture has the following requirements:
1. The CDE shall be federated - a stakeholder data vault can be initiated at any server with access to the internet.
2. Authentication and authorization shall take place in a decentral manner.
3. The focus of the ecosystem shall be entirely on metadata structures - as it should not make any assumptions on the data models (e.g., ontologies) that structure project information.
4. The ecosystem shall support multiple database environments (e.g., triple stores, timeseries databases, SQL stores, key-value stores, etc.) to serve heterogeneous datasets in a manner that fits the datatype.
5. The ecosystem shall support sub-document linking and alignment of heterogeneous resources to other data on the Web.

Introduction Section
In this section the authors introduce the problem; however, this is a very long section that introduces all concepts used in the article in great detail. For instance, the authors describe in detail the AEC industry and how that industry arrives at open data to deal with building data interoperability (level 3). The section should be at the very least one page shorter. It is complete, but because it is so long it is very hard to follow.

Regarding the research objectives and requirements, I only consider research points 3 and 5. The novelty of the current article lies in the building data ecosystem; however, point 1 is a common federated data environment, and point 2 is again something common, and I do not understand why it is a requirement in this work. Point 3 is the usual data integration application within our community; however, being able to deal with large building models is a plus, especially when these models may be edited concurrently. This work puts together the technology developed over the years by the Semantic Web community in a BIM use case.

I may be wrong of course, but I think this paper needs more focus on explaining the motivations for the requirements rather than focusing on explaining the AEC industry.

Also, before the research objectives, I would like to see a research question and hypotheses. It is weird to have goals without a clear point to validate using these goals.

Related work section
Again, a similar situation occurs in this section. It is very lengthy and hard to read. The authors describe in detail the vocabularies they use, the datasets and the technologies. Figure 2 occupies almost one page and it is not an original figure (it is a referenced figure). Also, the authors add as a listing the source code for common prefixes (such as RDF and RDF Schema), in a large font. This is totally useless in our context; the prefixes could be in an appendix. Furthermore, the font size of this listing does not match the font size of the other listings in the paper (this one is too large while the others are too small for me).

Regarding the related platform implementations, most of my thoughts about the project are summarized in the sentence stating that the project SCOPE "does use micro-services powered by Linked Building Data". I think this project is a microservices architecture powered by common vocabularies and a distributed authentication mechanism. What are the differences between the current approach and "traditional means for the implementation"? A microservices architecture supports several programming languages, authentication, dealing with data inconsistencies, etc. Agreed that microservices architectures do not deal with metadata and vocabularies by default.

In summary, 11 pages out of 27 (article without references) before starting the architecture’s description is too much.

Discovery and Aggregation of Federated Construction Projects Section

In this section the authors describe how they organize the data in each node implementing the LDP specification and how they communicate these data to the other nodes in the network. This section is concise and relatively easy to follow. Regarding the contents, i.e. how to access the data from each node, I find it a bit hard to follow. I would suggest that the authors use a running example. While Fig 5 is useful and has an explanation, a related example carried through the paper would benefit the reader.
Something to improve is that from page 13 onwards, the listings have too small a font size.

Something that is hard for me to understand is that the authors mix the concepts from LDP with an actual implementation within the same section. Furthermore, looking at the actual implementation of the LDP, it reminds me of a microservices architecture with added vocabularies (LDP).

Internal Organisation of a Partial Project Section

This section presents how the data is distributed across the ecosystem. If I understood correctly, the section presents the implementation of the LDP, the ecosystem’s architecture.
The section also presents the registry component that allows accessing these data.

Regarding the registry, it is the usual component that has been developed in several other projects needing to access distributed resources. Regarding the satellites, these are nodes with data, like in a P2P configuration, however each one serves their private data. The novelty is that the authors implement that on top of the LDP specification (2015) and the Solid framework for building data.
Furthermore, the authors describe the reference registry as a set of properties that allow referencing specific distributed data. This approach, while interesting, does not fit with the overall message of the paper, which is "we are providing a distributed architecture for accessing federated data". I do not see a word about how the software accesses the data. However, this is not bad per se (it is bad because it is somehow unrelated). I think this approach to describing the data is more interesting than the software approach.

From my point of view, this article is not pointing in the right direction. Rather than focusing on the technology used, which is fairly common by now, I would focus on how the building data interacts within each organization: something guided more by the LDP specification, which is barely described in the related work, and by how that LDP implementation helps solve problems in the building community.


Proof of concept

In this section the authors describe a use case in which a data provider generates data, which are referenced by the pod associated with that dataset. These data are annotated with the corresponding vocabulary, defined in previous sections. The new data are then propagated.

From the software engineering point of view there is nothing new; this is a very simple use case. It would be useful to have a use case in which there are conflicts in the data.

Overall comments
This is a very long and hard-to-read paper. The authors mix two descriptions of this work: 1) the software architecture description and 2) how the system manages the metadata.

Regarding 1), it is hard to distinguish between the software architecture and the metadata management. I think this could be improved by giving more importance to the LDP section in the state of the art (which is only one paragraph), and by guiding the reader through how it is implemented to solve the specific problem the authors try to solve.
Right now the paper looks like a usual microservices architecture, implemented using the Linked Data Platform recommendation and the Solid ecosystem.
Regarding 2), this is a more interesting problem, about which I have not seen much written in the article. I think that focusing on the consistency of data produced across nodes and how it is managed by the platform would be a better approach.

Review #3
Anonymous submitted on 04/Jul/2022
Suggestion:
Major Revision
Review Comment:

*Overview:
The paper presents the LBDserver, an extensible web framework for managing federated, heterogeneous (building) project data, which combines access-controlled, heterogeneous project information and (open) contextual information into a federated knowledge graph. The paper builds on and extends the authors' previous papers.

*Relevance:
The problem discussed is relevant in the traditional and semantic web contexts, as well as under the SW for Industrial Engineering special issue perspective.

*Understandability and Clarity:
The sections are not structured such that they systematically contain all the necessary information. There is no flow between the sections, which makes the paper difficult to read. Some sections have so many subsections that the paper becomes hard to follow. The introduction is one of them, as it has 5 subsections that all seem unrelated to each other. The use case description comes too late, as it is only given in Section 5.

*Novelty and Soundness:
The applied methodologies are not entirely novel in the Semantic Web area but are novel applications in the Linked Building Data (LBD) domain. The techniques are theoretically sound and grounded in the well-known concepts of the SOLID platform, ontologies, and vocabularies, as well as in their application to the LBD domain.

*Related work:
To the best of my knowledge, the authors compare their proposal with the relevant related work, platforms and implementations, in particular the DRUMBEAT platform, Project SCOPE and BIM Server models. However, it would be interesting to see a comparison with previous federated architectures and SOLID applications, if there are any.

*Experimental evaluation:
The authors provide a proof-of-concept infrastructure section in which they also reference the GitHub pages that provide the resources for the readers. One of the project links [1] was not working at the time of reviewing, which makes it hard to set up the project, so the provided resources appear to be incomplete for replication of experiments. The two other links had sufficient description to enable reproducibility, so once the third link is fixed the work could be reproducible.
As for the evaluation of the platform, experiments are missing from the paper. The major issue of the paper is that, even though it was submitted as a full paper, no experiments are provided for the platform. Full papers should present original research results provided by the authors. Nevertheless, the work has good potential and can be improved with revision.

*Impact:
It seems that the proposed techniques are not restricted to the LBD domain, but could be applied to any domain by those who want to use the techniques in their own work; the scope is therefore quite broad. However, due to the lack of evaluation, the impact is not entirely clear.

*Suggested improvements:
-Adding an evaluation section. This could be provided by, for example, collecting feedback from domain experts, providing a System Usability Scale (SUS) evaluation, or comparing the proposed platform with the other platforms discussed in the Related Work from different points of view (performance, scalability, etc.). Is the platform used by any domain experts?
-The paper builds on previous papers, so it is suggested to clearly describe the contribution of this paper.
-I would suggest explaining the platform based on an example use case introduced in an early section of the paper.
-In the abstract, it is claimed: “... validate the proposed framework for completeness using an existing project in Ghent, Belgium”. However, it is not clear how the validation is performed. I suggest rewriting the abstract to include research findings based on evidence.

*Minor improvements:
BIM is not defined in the paper so include a definition.
Rewrite the following sentence: “Even in an Open BIM environment, however, IFC has a limited scope: it is a fixed-size set of classes and properties, oriented towards the description of a built asset”

[1] https://github.com/ConSolidProject/satellite-mongo