Distributional methods for extracting common sense knowledge by ranking triples according to prototypicality

Tracking #: 1713-2925

Authors: 
Soufian Jebbara
Valerio Basile
Elena Cabrio
Philipp Cimiano

Responsible editor: 
Guest Editors ML4KBG 2016

Submission type: 
Full Paper
Abstract: 
In this paper we are concerned with developing information extraction models that support the extraction of common sense knowledge from unstructured datasets. Our motivation is to extract manipulation-relevant knowledge that can support robots’ action planning. We frame the task as a relation extraction task and, as a proof of concept, validate our method on the task of extracting two types of relations: locative and instrumental relations. The locative relation relates objects to the prototypical places where the given object is found or stored. The second, instrumental, relation relates objects to their prototypical purpose of use. While we extract these relations from text, our goal is not to extract specific mentions, but rather, given an object as input, to extract a list of locations and uses ranked by ‘prototypicality’. We use distributional methods in embedding space, relying on the well-known skip-gram model to embed words into a low-dimensional distributional space, and using cosine similarity to rank the various candidates. In addition to using embeddings computed using the skip-gram model, we also present experiments that rely on the so-called NASARI vectors, which rely on disambiguated concepts to compute embeddings and are thus semantically aware. While this distributional approach has been published before, we extend our framework with additional methods relying on neural networks that learn a score to judge whether a given candidate pair actually expresses a desired relation. The network thus learns a scoring function using a supervised approach. While we use a ranking-based evaluation, the supervised model is trained using a binary classification task. The resulting score from the neural network and the cosine similarity in the case of the distributional approach are both used to compute a ranking. We compare the different approaches and parameterizations thereof on the task of extracting the above-mentioned relations. We show that the distributional similarity approach performs very well on the task. The best performing parameterization achieves an NDCG of 0.913, a Precision@1 of 0.400 and a Precision@3 of 0.423. The performance of the supervised learning approach, in spite of having been trained on positive and negative examples of the relation in question, is not as good as expected, achieving an NDCG of 0.908, a Precision@1 of 0.454 and a Precision@3 of 0.387.
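To make the ranking step described in the abstract concrete, here is a minimal sketch (not the authors' implementation) of ranking candidate locations for an object by cosine similarity in an embedding space. The vocabulary and the `embeddings` dictionary are hypothetical stand-ins for vectors that a skip-gram model (or NASARI) would produce.

```python
# Hedged sketch: rank candidate locations for an object by cosine similarity
# between word embeddings. Vectors below are toy placeholders, not real
# skip-gram output.
import numpy as np

embeddings = {                      # hypothetical pre-trained vectors
    "spoon":   np.array([0.8, 0.1, 0.3]),
    "kitchen": np.array([0.7, 0.2, 0.4]),
    "garage":  np.array([0.1, 0.9, 0.2]),
    "drawer":  np.array([0.6, 0.3, 0.5]),
}

def cosine(u, v):
    """Cosine similarity between two vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def rank_candidates(obj, candidates):
    """Rank candidate fillers for `obj` by cosine similarity,
    used here as a proxy for prototypicality."""
    scored = [(c, cosine(embeddings[obj], embeddings[c])) for c in candidates]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

print(rank_candidates("spoon", ["kitchen", "garage", "drawer"]))
```

The supervised variant described in the abstract would replace `cosine` with a learned scoring function over the concatenated pair of embeddings; the ranking step itself stays the same.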
Full PDF Version: 
Tags: 
Reviewed

Decision/Status: 
Minor Revision

Solicited Reviews:
Review #1
By Dagmar Gromann submitted on 03/Oct/2017
Suggestion:
Minor Revision
Review Comment:

Thank you for your responses and comments and the interesting revised version of the paper. I appreciate the difficulty of including all previous approaches, since this topic is broad and has been tackled from many different perspectives.

Some of my reservations persist:
- while I appreciate the fact that you answer my questions in the comments, those details (number of participants in the crowdsourcing, judgements, etc.) are of interest not only to me and should clearly be provided in the paper as well.
- changing the title alone does not resolve the claims in the paper that you extract from unstructured data, which clearly you do not (this also applies to the argumentation about "explicit mentions in text", which is quite unclear in the introduction) - this is misleading and needs to be changed in a final version
- in spite of the claim in the paper, the methods seem not to be fully generalizable, since they rely on location/use annotations in available knowledge bases; this reservation is partially attributable to the fact that only two types of relations were considered and thus judgements about generalizability are difficult

Title:
While the change in title now reflects the content better, it does not, however, acknowledge the third approach (the main contribution of this publication compared to previous publications), since it talks only about the two distributional models.

References:
I think something went wrong with the encoding of reference [34]
=> "journal = ACM Transactions on Speech and Language Processing, volume = 8, number = 3, pages = 4-6, year = 2011"

Minor comments in order of appearance starting with abstract:
"mentions" sounds a little strange => occurrences?
"In addition to using embeddings computed using the skip-gram model," => "In addition,"
"While we use a ranking-based evaluation, the supervised model is trained using a binary classification task." => "while" is used for contrasting and there is no contrast in this sentence
"The answers... involves" => involve (should this not be require?)
"does not perform as good" => "well"
"prototypicallity" => "prototypicality"
"on the one hand based on a crowdsourcing..." => this linker serves no function in this paragraph
"while there has been a lot of work" => "while" is used for contrasting and there is no contrast in this sentence
"Such techniques are related to techniques"
"prototypical triples are assigned a higher score than aprototypical triples" => this is not a binary classification but rather a graded function, isn't it?
"In the previous section, we motivated the use" => the "previous section" is not previous but the supersection (3) of this present subsection (3.1)
"a worth of" => a wealth of
"of our approaches to other kind of relations" => kinds
"have shown that both an approach" => ?
"anonymour" => anonymous

Review #2
By Ziqi Zhang submitted on 16/Oct/2017
Suggestion:
Accept
Review Comment:

The authors have addressed many issues raised and their explanations for the remaining problems are also reasonable. I think the paper is in an acceptable state, though the authors should do a final proofread to correct some typos.

One main issue that was not addressed is the genericity of the proposed method, i.e., whether it can be applied to relations other than the two evaluated in this work. Although it is acceptable to leave it at the current state due to the short time frame given for the revision, I think the authors should at least discuss this in future work: be specific and give examples of other relations that could benefit from this work, as I don't think it is clear to every reader what these relations could be.

Review #3
By Jedrzej Potoniec submitted on 17/Oct/2017
Suggestion:
Accept
Review Comment:

Below is a short summary along the main reviewing dimensions; detailed remarks follow.
* originality: The paper is an extension of a conference paper from EKAW 2016, but there is enough new contribution for a journal paper.
* significance of the results: The results are interesting and offer an advance in the area of relation extraction.
* quality of writing: The manuscript is well written.

Overall, I am very happy to see most of my remarks addressed. The reviewed paper reads very well and I think it is ready to be published. I have only a few minor comments that should be addressed in the camera-ready version, but I do not think another round of reviews would be necessary:
* Abstract and introduction speak about "binary classification", but Section 3.3 about regression.
* I am a bit surprised that the passage about using Dropout disappeared from the text. I guess it is an omission and it should be restored.
* The queries for Section 4.1 are presented only in the response letter, but I think they should be presented also in the paper.