Automatic Detection of Relation Assertion Errors and Induction of Relation Constraints

Tracking #: 2181-3394

Authors: 
Andre Melo
Heiko Paulheim

Responsible editor: 
Claudia d'Amato

Submission type: 
Full Paper
Abstract: 
Although the link prediction problem, where missing relation assertions are predicted, has been widely researched, error detection has not received as much attention. In this paper, we investigate the problem of error detection in relation assertions of knowledge graphs, and we propose an error detection method which relies on path and type features used by a classifier for every relation in the graph, exploiting local feature selection. Furthermore, we propose an approach for automatically correcting detected errors originating from confusions between entities. Moreover, we present an approach that translates decision trees trained for relation assertion error detection into SHACL-SPARQL relation constraints. We perform an extensive evaluation on a variety of datasets, comparing our error detection approach with state-of-the-art error detection and knowledge completion methods, backed by a manual evaluation on DBpedia and NELL. We evaluate the results of our error correction approach on DBpedia and NELL and show that the relation constraint induction approach benefits from the higher expressiveness of SHACL and can detect errors which could not be found by automatically learned OWL constraints.
Tags: 
Reviewed

Decision/Status: 
Minor Revision

Solicited Reviews:
Review #1
Anonymous submitted on 02/Jun/2019
Suggestion:
Minor Revision
Review Comment:

I thank the authors for addressing the reported issues in this new version of the paper. I think the paper is in good shape and it only requires very minor corrections. I have pointed out some of them below. I have also formulated a couple of questions.

- Have you considered an mppl that depends on the path length? For example, paths of length 2 are more expressive and at the same time more numerous than paths of length 1. Would it make sense to define a relative or percentage-based mppl instead?

- The table above Section 5.2 appears out of nowhere. For which datasets was it computed? Was its purpose only to illustrate the weaknesses of the mean rank and the mean reciprocal rank?

Typos

- Introduction : ... knowledge graph Typically, in knowledge graph ... -> ... knowledge graph. Typically, in knowledge graph ...

- RQ3: asses -> assess

- Beginning of page 13: hiTo make ... -> To make ...

- Section 5.4: ... (errors of kind 1) Table 6 ... -> (errors of kind 1). Table 6

- Page 22: ... shown in Fig. 7, the expression E could defined as ... -> ... the expression E could be defined as ...

- Page 23: ... are only expressed is more complex SHACL constraints. -> are only expressed by more complex SHACL constraints.

Review #2
Anonymous submitted on 17/Jun/2019
Suggestion:
Minor Revision
Review Comment:

The authors have addressed the questions I had and corrected the notation. There are some other comments for which it would be nice to have some explanation.

Section 5.1.

The authors discuss two types of errors that are introduced into the dataset. Is the second dataset a subset of the first dataset? If so, how does this affect the results?

Page 14:

For the second table, it is not stated which version of PaTyBRED was used, even though several versions are introduced in the first table.

For the third table, why are the results with DistMult not compared?

In Section 5.5, the evaluation was performed on a very small sample. Is it representative enough?