N-ary Relation Extraction for Simultaneous T-Box and A-Box Knowledge Base Augmentation

Tracking #: 1380-2592

Marco Fossati
Emilio Dorigatti
Claudio Giuliano

Responsible editor: 
Philipp Cimiano

Submission type: 
Full Paper
Abstract:
The Web has evolved into a huge mine of knowledge carved in different forms, the predominant one still being the free-text document. This motivates the need for Intelligent Web-reading Agents: hypothetically, they would skim through corpora of disparate Web sources and generate meaningful structured assertions to fuel Knowledge Bases (KBs). Ultimately, comprehensive KBs, like Wikidata and DBpedia, play a fundamental role in coping with the issue of information overload. In light of this vision, this paper presents the Fact Extractor, a complete Natural Language Processing (NLP) pipeline which reads an input textual corpus and produces machine-readable statements. Each statement is supplied with a confidence score and undergoes a disambiguation step via Entity Linking, thus allowing the assignment of KB-compliant URIs. The system implements four research contributions: it (1) executes N-ary relation extraction by applying the Frame Semantics linguistic theory, as opposed to binary techniques; it (2) simultaneously populates both the T-Box and the A-Box of the target KB; it (3) relies on a single NLP layer, namely part-of-speech tagging; it (4) enables a completely supervised yet reasonably priced machine learning environment through a crowdsourcing strategy. We assess our approach by setting the target KB to DBpedia and by considering a use case of 52,000 Italian Wikipedia soccer player articles. Out of those, we yield a dataset of more than 213,000 triples with an estimated 81.27% F1. We corroborate the evaluation via (i) a performance comparison with a baseline system, as well as (ii) an analysis of the T-Box and A-Box augmentation capabilities. The outcomes are incorporated into the Italian DBpedia chapter, can be queried through its SPARQL endpoint, and/or downloaded as standalone data dumps. The codebase is released as free software and is publicly available in the DBpedia Association repository.
Minor Revision

Solicited Reviews:
Review #1
By Matthias Hartung submitted on 03/Jun/2016
Minor Revision
Review Comment:

My comments on the first version of this paper have been fully addressed (or sufficiently clarified, respectively).

Apart from some minor issues remaining in the current version (listed in the following), the paper is in principle ready to be accepted for publication:

* Section 1.1 (contributions): "grammatical analysis" should be left out, as this might also be understood to involve syntactic parsing
* p. 5: In (1c), "input labels" should be replaced by "class labels" or similar
* Section 4.2: I definitely acknowledge the effort the authors have put into clarifying their approach to LU selection. However, in its current state, the formalization seems to be flawed, as the definition of the standard deviation holds for probability distributions rather than sets. From the discussion in the text I understand that lemmas exhibiting a high stddev are desired, as this is supposed to signal greater variety in the usage of a term. Why is this a desired property here -- this sort of variety might also be indicative of a high degree of polysemy?
* several times: "on the light of" --> "in light of"
* Section 6.3: In the enumeration of the features presented to the classifiers, features from entity linking seem to be missing (cf. p. 8: "EL results are part of the FE classifier feature set").
* Section 7: The regular expressions used to normalize numerical expressions might be left out.
* The recurrent example of Germany having been defeated by Denmark in 1992 might be transformed into a numbered example on p. 2, which could then be referred to in all subsequent mentions.
* Section 8: dangling subsection 8.1
* Algorithm 1: Include definitions of S, F and L in the algorithm as well (not only in the text); C is completely undefined.
* Section 10.1, last paragraph: Leave out "On the other hand"; it's misleading.
* Figure 6: What does "LU" stand for here? Why are there no values for "Entità" and "Stato"?
* Section 10.3: How are confidence scores calculated from the rule-based baseline system?
* Section 10.4: The formula describing the constitution of the evaluation set is a bit hard to grasp.
* Section 11 as a whole is a bit lengthy and partly redundant with regard to previous sections. For instance, Table 9 might be discussed earlier in Section 6.1, the enumerations in 11.6 should be transformed into running text, subsection 11.9 should be merged into the conclusions or skipped entirely (and similarly so for 11.8).
* Section 11.5: I am a bit uncertain about what to conclude from the distribution of confidence scores. From Table 10 it seems that the classifiers generally tend to output higher confidence scores than EL (contrary to what is stated in the text). Are these confidence scores reasonably correlated with the performances reported in Table 3? If this is not your primary focus, which other purposes do you have in mind for generating confidence scores?
* Section 12.1: I am still reluctant to consider RE and OIE "principal paradigms" of information extraction. From my perspective, the difference you are alluding to concerns relation extraction procedures that are either based on some pre-defined schema or schema-agnostic.
* From my perspective, the methods summarized in 12.4.1 and 12.4.2 are only marginally related to the approach presented in this paper. I would suggest to leave out both subsections.
* Reference [26] (on p. 3) refers to Gangemi & Presutti (2010). Is that correct?
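Both this review and Review #2 question how the standard deviation of the B scores is bound to individual verb lemmas in the LU selection formalization. A minimal sketch of the per-lemma reading the reviewers appear to expect is shown below; the function name, the `(lemma, b_score)` input shape, and the example values are all hypothetical illustrations, not the paper's actual formalization.

```python
from collections import defaultdict
from statistics import pstdev

def per_lemma_stdev(scored_tokens):
    """Group token-level B scores by verb lemma and return one
    standard deviation per lemma (rather than a single corpus-wide
    value over all verb tokens).

    scored_tokens: iterable of (lemma, b_score) pairs, where each
    pair is one verb token occurrence with its salience score.
    """
    groups = defaultdict(list)
    for lemma, b in scored_tokens:
        groups[lemma].append(b)
    # Population stddev within each lemma's score multiset.
    return {lemma: pstdev(scores) for lemma, scores in groups.items()}

# Illustrative (made-up) scores: "esordire" varies across usages,
# "giocare" does not, so only the former shows a non-zero spread.
tokens = [
    ("esordire", 0.9), ("esordire", 0.2), ("esordire", 0.7),
    ("giocare", 0.5), ("giocare", 0.5),
]
spread = per_lemma_stdev(tokens)
```

Under this reading, a high per-lemma spread would signal varied usage of that specific lemma, which is the property the reviewers ask the authors to either justify or reformulate.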

Review #2
By Andrea Giovanni Nuzzolese submitted on 04/Jul/2016
Minor Revision
Review Comment:

The authors have done a great job by significantly improving the paper, and addressing my remarks from the previous iteration very well.
However, a few minor comments need clarification in order to make the paper acceptable for publication.

* in the introduction the authors should refer to [1] when introducing Intelligent Web-reading Agents;
* Figure 2 shows a diagram with a box named "Numerical Expression Normalization" (i.e., the box with ID 2(d)). However, this box is not described at all in the corresponding text immediately following the figure (i.e., point 2 of Section 3). This step is described some pages ahead in Section 7. I would suggest to add a brief description about Numerical Expression Normalization in Section 3 and to refer the reader to Section 7 for more details;
* in the formalisation about the Lexical Unit Selection provided at the end of Section 4.2 it seems that the scores B are not bound to specific verb lemmas (as it emerges, instead, from the text), but are rather computed over the whole set of tokens t representing verb lexicalisations. This would mean that only one final B score is computed for the whole corpus instead of as many B scores as there are verb lemmas. Is this correct?
If it is not, the formalisation should be reworked in order to clarify the association of tokens t with specific verb lemmas;
* the two criteria listed in Section 5 do not completely justify the 5 LUs selected for the use case. In fact, only 1 (i.e., esordire) out of 5 LUs is picked from the worst ranked LUs. Why only one? Additionally, by looking at [2] there are other possible LUs that fit the criteria. For example, "debuttare" is (i) among the worst ranked ones and (ii) within the use case domain. Nevertheless, "debuttare" was not selected for the use case. Why? This part needs to be strengthened.
* Section 6.1 has been renamed to "Sentence Selection". However, the authors often use the word seed (as in the first version of the paper) without explaining what a seed is.

[1] Oren Etzioni, Michele Banko, and Michael J. Cafarella. Machine reading. In AAAI, pages 1517–1519, 2006.
[2] https://github.com/dbpedia/fact-extractor/blob/master/resources/stdevs-b...

Review #3
By Roman Klinger submitted on 26/Jul/2016
Minor Revision
Review Comment:

The authors addressed all my concerns from the review for the first version of the paper by adding additional paragraphs with the requested information or by removing irrelevant information to make the paper more concise.

I observe that the other reviewers added relevant aspects which I had not noticed in the previous version. I would like to re-emphasize the relevance of an evaluation of the correctness of the predictions and the explanation of how the crowdsourcing approach can be seen as a scalable, adaptable approach (reviewers 2 and 3 in the original review).

Altogether, this paper is (as the previous version) nicely written and easy to follow. For my general comments (which have been addressed by the authors by and large) I refer to the previous review. My main criticism has been that this is a 'system paper' which describes one concrete implementation without investigating a specific hypothesis. Therefore, some design decisions needed to be explained in more detail. However, I propose that the explanations of design decisions follow a more objective criterion. One approach would be to do that in an ablation-study-like manner: what is the performance if one of the components is replaced by a naïve approach or omitted? (However, I admit that, depending on the results, this might not be the best approach.)

Along these lines: I would like to better understand the research question the authors answer in this paper.

However, I still believe that this is not a major change in this research but more a question of highlighting relevant aspects of this work more appropriately.

* Please review the capitalization of words. I do not think that Natural Language Processing and Entity Linking (for instance) are proper nouns.
* I propose to add the access date to URLs. Some of the URLs are actually cited like scientific articles. I am not sure about the journal style, but maybe these could go to the references (e.g. footnotes 1-6).
* Wikipedia links should point to specific versions.