The proDataMarket Ontology: Enabling Semantic Interoperability of Real Property Data

Tracking #: 1567-2779

Shi Ling
Nikolay Nikolov
Dina Sukhobok
Tatiana Tarasova
Dumitru Roman

Responsible editor: Werner Kuhn

Submission type: Ontology Description
Real property data (often referred to as real estate, realty, or immovable property data) represent a valuable asset that has the potential to enable innovative services when integrated with related contextual data (e.g., business data). Such services can range from providing evaluation of real estate to reporting on up-to-date information about state-owned properties. Real property data integration is a difficult task, primarily due to the heterogeneity and complexity of real property data and the lack of generally agreed-upon semantic descriptions of the concepts in this domain. In this paper we introduce the proDataMarket ontology, a key enabler for the integration of real property data. The paper provides an overview of the proDataMarket ontology development process, including details on the requirements extracted from a set of relevant business cases, explanations of core concepts and relationships, and the realization of the ontology in RDFS/OWL. To date, the ontology has been used to integrate and publish more than 30 datasets as Linked Open Data, drawn from the business cases from which the requirements were extracted.


Solicited Reviews:
Review #1
Anonymous submitted on 25/Jun/2017
Review Comment:


The paper describes a suite of modular ontologies for the real estate domain. I think the paper addresses an interesting and worthwhile domain that (to my knowledge) lacks good ontologies. However, the paper doesn't convey anything that convinces me that it actually achieves the stated goal of integrating real estate information from different legal systems. Instead, a number of loosely associated business cases are presented, resulting in many piecemeal ontology fragments without clear organization, a thorough design rationale, or verification, and with little evidence that the stated competency questions can in fact be answered by the resulting ontology.
Because of the broad scope of the covered domains, the paper lacks a clear focus, a clear contribution, and convincing evidence of the quality of the designed ontology.

Going forward, I'd strongly suggest focusing on a core of property information (parcels, buildings, rights) and explaining it in detail rather than trying to explain all the aspects of the different use cases and associated ontology modules. That would make for a paper that fits the scope of SWJ well.


On page 3, the authors state "... it is possible to classify the real property rights from different legal systems, which could enable cross-border transfer of real property information." It would be helpful to explain these legal differences briefly so that the reader can understand the challenges involved in the ontology development.
I found the topic of integrating real estate data from different legal systems a compelling and well-scoped challenge (as motivated in Sections 1 and 2), but I can't find much in the remainder of the paper that actually addresses this problem. More needs to be said about how the ontology overcomes these challenges. I think focusing on this core challenge would make the paper much more compelling.

Instead, Section 3 introduces a wide variety of business cases (such as the management of large real estate portfolios, natural hazard risk assessment, management of agricultural land, or the issuing of notifications related to building permits). For each of the business cases, competency questions are provided, the majority of them being data queries. So far so good. But the competency questions are mostly about combining real estate data with other information (water data, land cover, crops, social indicators) rather than integrating real estate data from different representations. Section 3.2 (the development process) lacks the detail a reader would need to take much away from it. The diversity of business cases and the seemingly small overlap between them (or the overlap is not explained well enough) seem to have carried the authors away into developing not just an ontology of real estate rights but one of all kinds of other spatially grounded information (census data, land cover data, crops, protected sites, etc.). Developing a well-designed, thorough ontology that covers all of the business cases seems like a futile undertaking.
This is confirmed by the descriptions in Sec. 4, which suggest that the resulting ontology lacks a solid ontological grounding and the detailed explanations needed for it to be appreciated and reused by others. Design choices need to be better rationalized, and concepts, relations, and properties clearly defined (e.g., what is the difference between LandParcel, CadastralParcel, and AdministrativeUnit in Fig. 5, and how are the parcels and subplots from Fig. 10 related to them?).

Additional remarks/questions for the authors' consideration

1) The queries for the CAPAS business case seem to strongly suggest the use of GIS to integrate the data, which would be extremely challenging to integrate at an ontological level. This sounds like a whole research project in itself.
- For the SIM business case, the lack of any data, or of quality data, is mentioned as the major problem. I don't see how an ontology can address that problem.

2) Section 4, the table of classes and properties: do these numbers count only the newly added ones? This needs clarification. If not, how do modules with 0 or 1 (or even 2 or 3) classes or properties model anything at all? In general, I don't find counts of classes or properties very meaningful statistics, as they say little about the content or quality of an ontology.

3) Sec. 4.1: I wonder whether the Observations and Measurements ontology could be reused for this part. I also find the "indicator" class problematic without a clearer description of what an indicator is and is not. Is it something that can be measured? How is "floodSusceptibility" (Sec. 4.5) an indicator? I can think of "low elevation" as an indicator of flood susceptibility. Maybe "indicator" is just a poor choice of term?

4) Fig. 5 uses the dul:part relation for several different relations. How can a building be part of a parcel in the same way that a parcel is part of a large parcel? That would suggest that removing the building changes the parcel. Better terms would be "builtOn" or "locatedIn".

Review #2
By Peter van Oosterom submitted on 15/Jul/2017
Major Revision
Review Comment:

This manuscript was submitted as 'Ontology Description' and should be reviewed along the following dimensions: (1) Quality and relevance of the described ontology (convincing evidence must be provided). (2) Illustration, clarity and readability of the describing paper, which shall convey to the reader the key aspects of the described ontology.

- The word "geospatial" is a pleonasm; rather use "geographic" (or "spatial").
- p. 1, Abstract: it would be nice to highlight the benefits of the described proDataMarket ontology (e.g., describe what is possible now that was not possible before).
- p. 1: 'Governments worldwide play an important role in the real estate': one of the important roles is missing here, namely real estate taxation.
- p. 2: 'Real property data are often difficult and expensive to collect and access': also mention to maintain/update.
- p. 2: 'European Union Open Data Portal... the lack of a semantic descriptions remains a huge barrier': I would not agree, as quite a complete UML model has been made for INSPIRE.
- p. 2: 'There are a total of seven business cases': briefly name them, and explain why these seven (why not fewer, more, or others).
- p. 2: 'did not meet the requirements of the business cases': indicate which limitations were the most crucial.
- p. 3: when "real right" is explained, also mention the difference between movables and immovables.
- p. 3: 'none of them is flexible enough to cover cross-domain requirements outside of the cadastral domain': ISO LADM makes bridges to other domains, such as utilities, buildings, addresses, taxation, CAP/LPIS, etc. So which cross-domain requirements are not covered?
- p. 4, Section 3.1.1: again, the seven business cases feel very random (and it is not well motivated why these were chosen). Valuation or taxation in particular would be expected here (see Çağdaş et al. 2017)! Related domains such as spatial planning/zoning (which also create 'legal spaces') could also be expected here.
- p. 5: 'as five indicators at the building level': which five?
- p. 5: CAPAS: it is very surprising to see Spain mentioned here now (before, it was all Norway); please motivate this. Note that an integrated LADM/LPIS model has been created in Annex H of ISO 19152 (see also Inan et al. 2010).
- p. 5: why this very odd, specific mention of Sentinel-2 satellite data (there are many other types of data and/or aerial photography)?
- p. 6: it is not clear what the status of NNAS, SIM, CCR, and CST is (and why they are relevant in this context); are these operational services?
- p. 7: some sentences are repeated too often, e.g., 'promote reuse outside of the context'.
- p. 7: 'primary users': somehow I miss the true end users.
- p. 8: rules are introduced; these are also very close to constraints for valid data, so please mention this aspect.
- p. 9, Fig. 1: it is not clear what the prodm-cad concepts add to the LADM concepts.
- p. 9, Sec. 3.2.3: the whole section is at the 'meta' level, only talking about RDFS/OWL, while one would expect to see some actual OWL here...
- p. 10, Sec. 4: OK, some OWL was expected here (XML fragments), but only some diagrams are shown.
- p. 11, Sec. 4.1 seems related to ISO 19156 Observations and Measurements; please elaborate (if your own/a different solution was used).
- p. 12, Fig. 5: an open arrowhead was used for the attribute gsp:SpatialObject (while on page 9 it was stated that properties would be indicated with a solid arrowhead). Note that this is the case in several places/figures.
- p. 13, Sec. 4.3: instead of PropertyComplex, the standard LA_BAUnit could have been used here.
- p. 14, Sec. 4.4: 'indicator types' are mentioned but only illustrated with examples; one would expect this important part of the ontology to be more explicit (similar to code list values).
- p. 14, Fig. 8: just referring to gsp:SpatialObject is very weak typing; one would expect a more specific spatial type (e.g., area or polygon).
- p. 14, Sec. 4.6: again, ISO 19152 already has an integrated LADM-CAP/LPIS model (Annex H); see Inan et al. 2010.
- p. 16, Fig. 12: spatial object, again weak typing (as Sentinel data are most likely raster data).
- p. 16, Sec. 4.6.4, LiDAR: as in point cloud data produced by laser scanning? Please explain; it is quite a surprise to see it here (it was not mentioned earlier in the paper).
- p. 16/17: the types of urban infrastructure seem very limited (sewer, water), while many others exist (telecom, electricity, gas, ...). Why this limitation? Note that LADM also covers utility networks.
- p. 17, Sec. 5: please give some statistics (type of data, origin, amount of data, etc.).
- p. 17: 'not open to public due to data licensing, the endpoint is not publicly exposed': it is very disappointing that the given example is not reproducible by readers.
- p. 17/18, Table 2: where in the SPARQL query is it specified that we only want to retrieve state-owned properties?
- p. 19, Table 4: in this second SPARQL example, too, it is unclear where the real data are (endpoints).
- p. 20, Table 5: the fact that records have a National Cadastral Reference is an indication that they used to be in the cadastre (but they are perhaps now outdated, as the SoE is from the past).
- p. 20, Sec. 6: 'publish more than 30 datasets as Linked Open Data'... Hmm, the only example given in the paper was not open (cadastral data), so I'm not sure what you mean.
- p. 20, Sec. 6, 'other countries': what do you expect? LADM is an international standard and should cover most countries...

Some missing references and links (some mentioned above):

Cadastre and Land Administration Thesaurus (CaLAThe), ISO 19152-based

Halil Ibrahim Inan, Valentina Sagris, Wim Devos, Pavel Milenov, Peter van Oosterom, Jaap Zevenbergen, Data model for the collaboration between land administration systems and agricultural land parcel identification systems, In: Journal of Environmental Management, 91(12), pp. 2440-2454, 2010.

Jesper M. Paasch, Peter van Oosterom, Christiaan Lemmen, Jenny Paulsson, Further modelling of LADM's rights, restrictions and responsibilities (RRRs), In: Land Use Policy, 49, pp. 680-689, 2015.

Christiaan Lemmen, Peter van Oosterom, Rohan Bennett, The Land Administration Domain Model, In: Land Use Policy, 49, pp. 535-545, 2015.

Volkan Çağdaş, Abdullah Kara, Ümit Işıkdağ, Peter van Oosterom, Christiaan Lemmen, Erik Stubkjær, A Knowledge Organization System for the Development of an ISO 19152:2012 LADM Valuation Module, In: Proceedings of the FIG Working Week 2017 (Pekka Halme, ed.), Helsinki, pp. 19, 2017.