Linked Data Completeness: A Systematic Literature Review

Tracking #: 2478-3692

Authors: 
Subhi Issa
Onaopepo Adekunle
Fayçal Hamdi
Samira Si-said Cherfi
Michel Dumontier
Amrapali Zaveri

Responsible editor: 
Agnieszka Lawrynowicz

Submission type: 
Survey Article

Abstract: 
The quality of Linked Data is an important indicator of its fitness for use in an application. Several quality dimensions, such as accuracy, completeness, timeliness, provenance, and accessibility, have been identified and are used to assess quality. While many prior studies offer a landscape view of data quality dimensions, here we focus on a systematic literature review of approaches for assessing the completeness of Linked Data. We gather existing approaches from the literature and analyze them qualitatively and quantitatively. In particular, we unify and formalize commonly used terminology across 56 articles related to the completeness dimension of data quality and provide a comprehensive list of methodologies and metrics used to evaluate the different types of completeness. We identify seven types of completeness, including three types that were not identified in earlier surveys. We also analyze nine tools capable of assessing Linked Data completeness. The aim of this systematic literature review is to provide researchers and data curators with a comprehensive and deeper understanding of existing work on completeness and its properties, thereby encouraging further experimentation and the development of new approaches focused on completeness as a data quality dimension of Linked Data.

Decision/Status: 
Reject (Two Strikes)

Solicited Reviews:
Review #1
By Simon Razniewski submitted on 03/Jun/2020
Suggestion:
Minor Revision
Review Comment:

The following is a review of a revised version of the article. I appreciate the detailed author response and the attempts to address all of my comments. At the same time, I still notice a set of inconsistencies and other issues that, in my opinion, require more rigor, though they should be addressable with moderate effort.
Yet I also feel that the authors have not succeeded well enough in systematizing the problem of completeness assessment at finer-grained levels. Based on my earlier comment on this point (and similar remarks by Reviewer 2), two methods are now discussed in more detail for each dimension, but I am still missing a systematic analysis of the respective problems, metrics, and solutions: in particular, what the formal problems are, what classes of metrics exist for each dimension, and what the major paradigms underlying the approaches are.
Addressing the latter issue would require a major revision, or may not be realistic at all within a moderate timeframe. As the survey in its current form already provides some value with its systematization of the literature at a coarser level, I thus recommend that, before acceptance, it undergo at least a minor revision, although a major revision could make it much stronger.

========================
Issues that require attention in a minor revision:
2.r.43: The point about OWA and CWA is clearer to me now. Yet it now seems to be relevant only for a few dimensions; e.g., for schema completeness, it suffices to know a sample of relations that should be in the dataset, with no relation to OWA or CWA. Also, textual evidence [101, 102] and sample overlap [103] can be used for recall assessment without any need for structured gold-standard ground truth (a toy illustration of the sample-overlap idea is sketched after the reference list below).
3.l.33: Currency completeness is still cryptic, and the example does not help (as Einstein currently lives nowhere). The explanation in Section 3.25. also makes it look like an instance of metadata completeness.
3.l.4: This sentence skips over a very important point: the authors are not concerned with completeness w.r.t. reality, but w.r.t. the requirements of a use case (the common "fitness for use" definition of data quality). Taking this stance is sensible, but it should be spelled out explicitly.
3.l.13: Schema completeness: the definition seems incorrect to me - isn't schema completeness about the intensional, not the extensional knowledge (e.g., in description-logics lingo, the completeness of the T-Box, while the later dimensions are about completeness of the A-Box)? Yet the definition talks about relations being populated, i.e., instance data (a small formalization of this reading is sketched after this list).
3.l.50: The example is a bit exotic; more straightforward examples are the cryptic Freebase and Wikidata identifiers
- Figure 1 would be helped by labels indicating the different kinds of completeness in it
- Figure 2 is still unclear: the second box contains a search, and the third box contains another search? Was a search done on the results of the first search? Also, why does the labelling with steps only start from the fourth box?
- 7.l.46: The paragraph is labelled "overview", but what follows is not an overview; it delves directly into one specific method :(
- 12.l.35: Why are there two formulas? What is their relation/difference?
- 14.r.28: Meaning of "currency values" remains unclear
- 15.l.31: Why the negation in the definition?
- The authors promised to fix the references but apparently did not :(
- There are several venues called "CEUR-WS"
- A journal article published in issue 0, volume 0
- Broken characters ([55], [59])
- Typos (Janua)
- Incoherent level of detail: Sometimes the place of a conference, sometimes abbreviations of venues, sometimes publishers, sometimes a URL, but no apparent pattern.
-> My suggestion for readability would be to strip all conference and journal submissions down to "authors, venue, year"
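
To make the T-Box/A-Box point (3.l.13) concrete, one common intensional reading (my own sketch, not the paper's definition) would be: given a gold-standard set S* of classes and properties required by the use case, schema completeness = |{elements of S* declared in the dataset's schema}| / |S*|, independently of how many instances populate those classes and properties; a definition that counts populated relations instead mixes in A-Box (instance-level) information.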

[101] Mirza, Paramita, et al. "Enriching Knowledge Bases with Counting Quantifiers" ISWC 2018
[102] Ghosh, Shrestha, et al. "CounQER: A System for Discovering and Linking Count Information in Knowledge Bases" ESWC 2020
[103] Luggen, Michael, et al. "Non-Parametric Class Completeness Estimators for Collaborative Knowledge Graphs—The Case of Wikidata." ISWC 2019
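
Regarding the sample-overlap point (2.r.43): as a toy illustration (not the exact estimators of [103]), two independently collected samples S1 and S2 of the entities of a class, overlapping in m = |S1 ∩ S2| entities, yield the classical capture-recapture estimate N ≈ |S1| · |S2| / m of the true class size; a dataset containing K distinct entities of that class then has an estimated class completeness of K / N, with no structured gold-standard ground truth required.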

========================
What would require a major revision:

The analysis of the problem space is still fairly superficial, and presumably considerably more effort would be needed to properly systematize the abstract problems investigated for each dimension, the families of metrics, and the paradigms underlying the solutions. As it stands, the paper largely duplicates what the respective papers say, but does not scrutinize their contents.

I point to a few examples of such issues below:
13.r.22: This tells nothing more than the paper title, and gives no idea how it relates to the papers discussed before.
15.r.6: The "Problems" paragraph should state crisp problem statements, like "Problem 1. Determine the metadata quality in open government datasets", not describe observations and approaches. The same holds for the metrics paragraph: "... propose metrics based on ..." - what are the underlying similarities/differences of the metrics?
8.r.43: "Problems" paragraph, but content says "the authors were interested in applying first order logic" - this is an approach, not a problem.

There are also some unclarities regarding the dimensions, where further scrutiny would be helpful:
- I already pointed out the issue with schema completeness above
- another issue concerns the dimensions property completeness and population completeness. To what extent are these two separable? For example, formula (3) for population completeness refers to properties as well, and it indeed seems common that populations ("French cities") are defined via properties (instanceOf(City) and locatedIn(France)).
- Similarly, it is not clear whether property completeness is about having *a* value for each property for each entity, or about having *all* values (a toy formalization of the two readings is sketched below).
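
To make the last point concrete, a possible formalization in my own notation (not taken from the paper): let E be the set of entities under consideration, vals_KB(e, p) the set of values the dataset records for entity e and property p, and vals_ideal(e, p) the values that should be there. The "at least one value" reading yields a metric such as PropComp_some(p) = |{e in E : vals_KB(e, p) is non-empty}| / |E|, whereas the "all values" reading yields PropComp_all(p) = (sum over e in E of |vals_KB(e, p)|) / (sum over e in E of |vals_ideal(e, p)|), which presupposes (an estimate of) the ideal value sets. It would help if the survey made explicit which reading each discussed metric assumes, and how it differs from population completeness, which quantifies over entities rather than values.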

Review #2
Anonymous submitted on 17/Jun/2020
Suggestion:
Accept
Review Comment:

I appreciate the changes made by the authors in response to the reviews. I believe that the paper is now ready for publication and that it would serve its purpose well as a comprehensive introductory text.

Review #3
Anonymous submitted on 23/Jul/2020
Suggestion:
Reject
Review Comment:

I originally opted for a reject because I thought there were issues with the survey's execution that I consider too fundamental to be resolved in revision rounds. A reject would give the authors the opportunity to fix these fundamental issues and have a better chance with a resubmission.

The editor opted for a major revision, and I try to remain open-minded in this revision round. However, even though some of my more detailed comments were addressed, my concerns about the survey's execution still stand, and it is hard for me to proceed with revisions if the foundations are not convincing.

In my previous review, I mentioned that I am concerned about the delta with respect to the previous survey on Linked Data quality, and I think this is to a certain extent due to the scoping of the keywords. The delta to the previous survey on quality remains too narrow. I acknowledge the authors' argument that this paper aims to go into more detail with regard to completeness, but I am afraid this is not achieved with the way the survey is executed.

Which brings me back to my comment on the keyword scope. I still stand by my argument that the keywords are not properly chosen, and the answer that extending the keywords to, e.g., "knowledge graphs", "knowledge bases", or "RDF datasets" instead of only "Linked Data" is considered for future work does not help improve the current version of the work. Google introduced the term "knowledge graph" in 2012, and since then the term has become a trend. The previous paper on Linked Data quality was first submitted around 2013, which is still too early to see the term "knowledge graph" used at scale, so considering "Linked Data" as the only alternative was fine there. The current article, however, focuses on completeness and complements the previous one with contributions after 2016 that were not considered in it; by then, the term "knowledge graph" had become too prevalent to be ignored or deferred to future work, given that it has already been in use for almost a decade. If the response had shown that the extended list of keywords I mentioned does not affect the paper, I would not insist. But at the moment, the current choice of keywords leaves important papers out of the paper collection. For instance,

* Nguyen et al. A Convolutional Neural Network-based Model for Knowledge Base Completion and Its Application to Search Personalization which was published in 2019 by SWJ was also not considered for the survey because of the keywords that were selected,
* Rashid et al. Completeness and consistency analysis for evolving knowledge bases which was published in 2019 by JWS was also not considered because of the keywords choice
* Gottschalk and Demidova. HapPenIng: Happen, Predict, Infer — Event Series Completion in a Knowledge Graph accepted at ISWC2019
* Kruit et al. Extracting Novel Facts from Tables for Knowledge Graph Completion accepted at ISWC2019
* Meilicke et al. Fine-grained Evaluation of Rule- and Embedding-based Systems for Knowledge Graph Completion

These papers are just examples of papers that were not considered by the journal article, found after a quick look at the prominent journals and conferences of the domain. However, this is enough evidence that the keywords are not properly chosen; therefore, the pool of papers is not complete and thus may not be representative enough to support generalised conclusions on the state of the art regarding Linked Data/Knowledge Graph/RDF dataset completeness. Unfortunately, the authors consider this future work, whereas I see it as a limitation of the execution of the current work.

Given the aforementioned, I still stand by my original review: this work should not be published in its current state. It may sound contradictory, but this is an article on completeness that itself lacks completeness.

There is no doubt that Linked Data/Knowledge Graph/RDF dataset completeness is an important research topic but a survey needs to reflect well the state of the art and this is not the case with this article.

This manuscript was submitted as a 'Survey Article' and should be reviewed along the following dimensions; I therefore argue for each dimension separately, to be sure that the article is not rejected for the wrong reasons:
(1) Suitability as introductory text, targeted at researchers, PhD students, or practitioners, to get started on the covered topic.
I do not think that a survey article that lacks completeness with respect to the addressed topic can be considered a suitable introductory text.

(2) How comprehensive and how balanced is the presentation and coverage.
The coverage is questionable to a great extent, as it is clear that there are many relevant contributions that are not considered due to the choice of keywords. I do not think the survey is well executed if the keyword choice limits the result space by excluding papers that are without doubt relevant but do not match the keywords that were used.

(3) Readability and clarity of the presentation.
The readability and presentation of the paper are good; there is room for improvement, of course, but this is not a reason to reject the article.

(4) Importance of the covered material to the broader Semantic Web community.
The covered material is important to the broader Semantic Web community. The problem is that the selected papers do not cover the complete state of the art, which is a major issue for a survey article.