N-ary Relation Extraction for Simultaneous T-Box and A-Box Knowledge Base Augmentation

Tracking #: 1463-2675

Authors: 
Marco Fossati
Emilio Dorigatti
Claudio Giuliano

Responsible editor: 
Philipp Cimiano

Submission type: 
Full Paper
Abstract: 
The Web has evolved into a huge mine of knowledge carved in different forms, the predominant one still being the free-text document. This motivates the need for intelligent Web-reading agents: hypothetically, they would skim through disparate Web sources corpora and generate meaningful structured assertions to fuel knowledge bases (KBs). Ultimately, comprehensive KBs, like Wikidata and DBpedia, play a fundamental role to cope with the issue of information overload. On account of such vision, this paper depicts the Fact Extractor, a complete natural language processing (NLP) pipeline which reads an input textual corpus and produces machine-readable statements. Each statement is supplied with a confidence score and undergoes a disambiguation step via entity linking, thus allowing the assignment of KB-compliant URIs. The system implements four research contributions: it (1) executes n-ary relation extraction by applying the frame semantics linguistic theory, as opposed to binary techniques; it (2) simultaneously populates both the T-Box and the A-Box of the target KB; it (3) relies on a single NLP layer, namely part-of-speech tagging; it (4) enables a completely supervised yet reasonably priced machine learning environment through a crowdsourcing strategy. We assess our approach by setting the target KB to DBpedia and by considering a use case of 52,000 Italian Wikipedia soccer player articles. Out of those, we yield a dataset of more than 213,000 triples with an estimated 81.27% F1. We corroborate the evaluation via (i) a performance comparison with a baseline system, as well as (ii) an analysis of the T-Box and A-Box augmentation capabilities. The outcomes are incorporated into the Italian DBpedia chapter, can be queried through its SPARQL endpoint, and/or downloaded as standalone data dumps. The codebase is released as free software and is publicly available in the DBpedia association repository.
Tags: 
Reviewed

Decision/Status: 
Accept

Solicited Reviews:
Review #1
By Matthias Hartung submitted on 12/Oct/2016
Suggestion:
Accept
Review Comment:

All my comments on previous versions of the manuscript have been addressed.

Review #2
By Andrea Giovanni Nuzzolese submitted on 11/Nov/2016
Suggestion:
Accept
Review Comment:

The authors addressed all my comments carefully, hence I accept the paper in its latest form.

Reference [49] can be updated to the following record:
V. Presutti, A. G. Nuzzolese, S. Consoli, A. Gangemi, and D. Reforgiato Recupero. From hyperlinks to Semantic Web properties using Open Knowledge Extraction. Semantic Web Journal 7(4): 351-378 (2016). DOI: 10.3233/SW-160221

Review #3
By Roman Klinger submitted on 12/Dec/2016
Suggestion:
Accept
Review Comment:

The authors addressed the comments I had on this paper. Some minor aspects remain that could improve the paper, mainly by helping the reader understand the design decisions and their impact on the performance of the overall system.

However, I agree with the authors that such analyses might be beyond the scope of this paper.