Knowledge Graph Refinement: A Survey of Approaches and Evaluation Methods

Tracking #: 1167-2379

Heiko Paulheim

Responsible editor: Philipp Cimiano

Submission type: Survey Article

In recent years, different web knowledge graphs, both free and commercial, have been created. While Google coined the term “Knowledge Graph” in 2012, there are also a few openly available knowledge graphs, with DBpedia, YAGO, and Freebase being among the most prominent ones. Those graphs are often constructed from semi-structured knowledge, such as Wikipedia, or harvested from the web with a combination of statistical and linguistic methods. The results are large-scale knowledge graphs that try to make a good trade-off between completeness and correctness. In order to further increase the utility of such knowledge graphs, various refinement methods have been proposed, which try to infer and add missing knowledge to the graph, or identify erroneous pieces of information. In this article, we provide a survey of such knowledge graph refinement approaches, with a dual look at both the methods being proposed and the evaluation methodologies used.

Solicited Reviews:
Review #1
By Natasha Noy submitted on 20/Oct/2015
Review Comment:

This manuscript was submitted as 'Survey Article' and should be reviewed along the following dimensions: (1) Suitability as introductory text, targeted at researchers, PhD students, or practitioners, to get started on the covered topic. (2) How comprehensive and how balanced is the presentation and coverage. (3) Readability and clarity of the presentation. (4) Importance of the covered material to the broader Semantic Web community.

I very much appreciate the care and depth that the author has put into the revision. My comments were addressed extremely well and I don't have any further concerns. While the definition of a Knowledge Graph is still a bit vague, I agree that a more precise definition is hard to find at this point. I think the rest of the section that explains what a Knowledge Graph is not is exactly what's needed to complete the definition. The addition of the large commercial knowledge graphs and the human-based methods for evaluation complete the picture. The paper will be a very useful survey for others to read.

Perhaps another pass by a native speaker would make the paper even better, but it is not required.

Review #2
Anonymous submitted on 04/Nov/2015
Review Comment:

The authors have answered all the questions I posed in my previous review. In particular, the paper now presents a more extended explanation of knowledge graphs and provides a high-quality survey on knowledge graph refinement.

* The authors said that three out of the four mentioned papers are now contained in the survey (change ID 18) – the ISWC 2013 pattern based approach is omitted, which is fine by me if it doesn't fit into the narrowed context – but I cannot find the EKAW 2012 paper about schema axioms learning.
* There are still broken references in the bibliography: 33, 37, 55, 59

The paper is very well written and I recommend its publication.

Review #3
By Philipp Cimiano submitted on 23/Nov/2015
Review Comment:

This paper provides a survey of approaches to knowledge graph refinement and correction. After a general introduction to the field and an overview of existing knowledge graphs, the author separately discusses approaches to the completion and correction of knowledge graphs. In doing so, the author distinguishes three dimensions: i) completion vs. correction, ii) target of refinement (or correction), and iii) methods using only the knowledge graph itself vs. methods that also rely on external data. Different evaluation regimes, together with their pros and cons, are also discussed. The review is certainly timely, well motivated and self-contained. It will make the topic accessible to people seeking guidance in getting familiar with knowledge graph refinement techniques.

As far as I can judge, all the comments from the reviews in the first round have been successfully addressed.

I have spotted a few errors that need to be corrected for the final version of the article to be published.

Section 2:

Likewise, we do not consider WordNet [66] as a knowledge graph, since it mainly concerned with common nouns and words

⇒ since „it“ is mainly concerned ???

Section 6:

a larger number of ontoloyg reasoners [19,20,61]. -> misspelling of „ontology“

Section 7, beginning

“From the survey in the last two sections, we can observe that there are quite a few works proposed for knowledge graph refinement.”

What is meant here by „quite a few“? That there are many approaches, or indeed very few? Please make this clearer, as the statement is a bit vague.

Section 7.1

“A first interesting observation is that our distinguishing into completion and error detection is a strict one.”

⇒ odd, I suggest rephrasing as follows:

„A first interesting observation is that our distinction between completion and error detection is...“

Section 7.2

„The major knowledge graph used in the evaluations is DBpedia.“

⇒ "major" is unclear/ambiguous here. I assume that it does not refer to the size of DBpedia.

So I would rephrase as:

„DBpedia is the most frequently used knowledge graph for evaluation purposes“

As discussed in section 2, knowledge graphs differ heavily in their characteristic. -> „characteristics“