Interactive Learning: an Approach for Building DL Ontologies from Natural Language and Reasoning

Tracking #: 745-1955

Authors: 
Ryan Ribeiro de Azevedo

Responsible editor: 
Guest Editors EKAW 2014 Schlobach Janowicz

Submission type: 
Conference Style
Abstract: 
In this paper, we present an approach based on Reasoning and Natural Language Processing for Ontology Learning, specifically over Description Logic (DL) knowledge bases constituted by a TBox with ALC expressivity, from interactions with users in controlled natural language text. The viability of our approach is demonstrated through the generation of descriptions of complex axioms from concepts defined by users. We evaluated our approach in an experiment with entry interactions enriched with hierarchy axioms, disjunction, conjunction, negation, as well as existential and universal quantification to impose restriction of properties. The obtained results prove that our model is an effective solution for reasoning, knowledge representation and automatic construction of expressive Ontologies. Thereby, it assists professionals involved in obtaining, constructing and modeling domain knowledge.
Tags: 
Reviewed

Decision/Status: 
[EKAW] reject

Solicited Reviews:
Review #1
Anonymous submitted on 02/Sep/2014
Suggestion:
[EKAW] reject
Review Comment:

Overall evaluation
Select your choice from the options below and write its number below.

-1 weak reject

Reviewer's confidence
Select your choice from the options below and write its number below.

3 (medium)

Interest to the Knowledge Engineering and Knowledge Management Community
Select your choice from the options below and write its number below.

4 good

Novelty
Select your choice from the options below and write its number below.

3 fair

Technical quality
Select your choice from the options below and write its number below.

3 fair

Evaluation
Select your choice from the options below and write its number below.

2 poor

Clarity and presentation
Select your choice from the options below and write its number below.

2 poor

Review
Please provide your textual review here.

This paper provides an approach to Ontology Learning from natural text. Leveraging both syntactic and semantic techniques from NLP, the authors show a system that is able to automatically construct and validate ontologies by either translating stand-alone sentences or engaging in a conversation of sorts with the user. The latter process iteratively builds the ontology and detects inconsistencies as they appear.

While the topic and approach are interesting, and the authors seem to have obtained interesting results, the paper seems rushed.

The running example greatly helps in explaining the various components; however, I would like to read more about the actual process and reasoning behind the various translation steps. For example, how were the patterns in Tables 1 and 2 constructed, or what were they derived from, and how does the detection of subsumption relations take place? I would advise removing or condensing some of the examples in Section 3.3, and extending the approach and evaluation a bit.

The evaluation is a bit questionable, as the authors do not explain how they determined whether the translator processed a sentence correctly. Was a set of 'correct axioms' determined beforehand, or was the result of the translation evaluated afterwards?
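
To make the first option concrete, here is a minimal sketch of scoring translator output against a set of 'correct axioms' fixed beforehand. The axioms, the (sub, super) pair encoding and the function name `score` are all hypothetical illustrations, not taken from the paper.

```python
# Hypothetical gold-standard evaluation: an axiom is a (sub, super) pair
# standing for "sub is subsumed by super". All data below is made up.
def score(produced, gold):
    """Return (precision, recall) of produced axioms against a gold set."""
    produced, gold = set(produced), set(gold)
    correct = produced & gold
    precision = len(correct) / len(produced) if produced else 0.0
    recall = len(correct) / len(gold) if gold else 0.0
    return precision, recall

gold = {("car", "motor_vehicle"), ("motor_vehicle", "vehicle"), ("cow", "herbivore")}
produced = {("car", "motor_vehicle"), ("motor_vehicle", "vehicle"), ("herbivore", "cow")}

precision, recall = score(produced, gold)
print(precision, recall)
```

With a gold set of this kind, a reversed axiom such as ("herbivore", "cow") counts as an error rather than being judged after the fact.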

Remarks:
- Section 2.2 contains formatting instructions that probably shouldn't be there
- Table 2: Verbo -> Verb
- Figures 3 and 6 through 9 are missing, while 8 is referred to in the text
- 3.2, Processed Sentence 4: "marine_mammal ⊑ mam[m]al" is listed twice (one time with one m)
- The paper contains various awkwardly constructed (long) sentences

Review #2
Anonymous submitted on 03/Sep/2014
Suggestion:
[EKAW] reject
Review Comment:

Overall evaluation
Select your choice from the options below and write its number below.

== -2 reject

Reviewer's confidence
Select your choice from the options below and write its number below.

== 3 (medium)

Interest to the Knowledge Engineering and Knowledge Management Community
Select your choice from the options below and write its number below.

== 4

Novelty
Select your choice from the options below and write its number below.

== 3 fair

Technical quality
Select your choice from the options below and write its number below.

== 2 poor

Evaluation
Select your choice from the options below and write its number below.

== 2 poor

Clarity and presentation
Select your choice from the options below and write its number below.

== 2 poor

Review
Please provide your textual review here.
This paper presents an approach that can automatically create DL axioms from natural sentences, based on a parse tree provided by the Stanford parser. It then engages in a dialogue with a user to augment the knowledge base and fix potential inconsistencies.

There are a couple of things that make it hard for me to accept the paper in its current form.

First of all, the authors argue that this system is necessary since it improves on existing ontology learning systems (that produce only lightweight ontologies), or does not require the user to learn a controlled language (which is the case for ACE). As for the first argument, the paper does not convincingly show that the system *does* produce rich axioms for arbitrary sentences. The examples are taken (hand picked?) from Wikipedia or entered by a user. They are all very regular and come across as 'controlled'. Also the glossary for "juvenile" on my Wikipedia is "A juvenile is an individual organism that has not yet reached its adult form, sexual maturity or size. "... which is a considerably more complex sentence than the one used by the authors. As for the second argument, it would behoove the authors to show in a study that their system indeed improves over controlled natural language input using e.g. ACE, according to actual users.

Secondly, the technical description of the system is not precise enough for the reader to understand what it does. The interrogation/dialogue in the results section is apparently generated through an implementation that follows the pseudocode in Algorithm 1. This algorithm is not explained anywhere. Since this incremental checking and adding of axioms is one of the advertised strong points of the system, I expected more information here.

Third, as also mentioned under the first point, the evaluation of the system's performance seems to be a bit tautological. If the sentences were picked to be representable in ALC, there was some selection mechanism in place. Also, I do not follow the statistical relevance argument. It is unclear why these hypotheses were chosen, and it is unclear where the 67% come from.

Finally, the language is sometimes hard to follow. There are some cases of Portuguese fragments ('e' instead of 'and'). And there is a half-paragraph apparently copied from an instructions to authors document. Figures are hard to read, or not readable at all.

I think that the system, if described properly, could be very interesting, but the paper does not allow me to properly assess its performance compared to other systems.

Review #3
Anonymous submitted on 04/Sep/2014
Suggestion:
[EKAW] conference only accept
Review Comment:

Overall evaluation
Select your choice from the options below and write its number below.

0

Reviewer's confidence
Select your choice from the options below and write its number below.

4

Interest to the Knowledge Engineering and Knowledge Management Community
Select your choice from the options below and write its number below.

3

Novelty
Select your choice from the options below and write its number below.

3

Technical quality
Select your choice from the options below and write its number below.

2

Evaluation
Select your choice from the options below and write its number below.

3

Clarity and presentation
Select your choice from the options below and write its number below.

3

Review
This paper deals with the problem of ontology engineering from text from an original perspective: how can we build a rich, formal and validated ontology at very low cost for the domain expert or the ontologist? The authors claim two original contributions: first, that they extract rich and complex axioms from natural language sentences, such as formal definitions of concepts using inclusion, union and negation operators; and second, the use of reasoning (mainly subsumption) to infer new axioms from the extracted axioms.
The input of the ontology extraction process are natural language sentences given by domain experts. The result is an ontology in OWL language that contains a rich set of axioms, object and data properties either directly extracted or inferred. Another important hypothesis behind this work is that this extraction process could be included in a system that would interact with a domain expert. The system would extract, formalize and validate knowledge from human input during a dialogue with the user.

The extraction process is called a translation process by the authors, insofar as they try to transform all the linguistic information in the input sentences into concepts, hierarchical relations and relations. The process is carried out in 4 tasks: (1) syntactic parsing (using the Stanford parser) to prepare the extraction, (2) semantic parsing (term and relation extraction), (3) OWL axiom representation and (4) reasoning to get new axioms. The whole process is described step by step in the paper, and a significant part of the paper is dedicated to illustrating how the system would interact with the user and inform them of possible inconsistencies in the knowledge that they gave.

The whole process relies on several strong hypotheses that are left almost unstated by the authors:
- the input sentences must be quite simple and have the form of definitions.
- all the information available in these sentences is relevant and of ontological nature. For instance, if the input text were a novel, the system would identify sentiments and feelings and have much more difficulty formalizing them. It would also extract a lot of useless information.
-> so one of the changes to be brought to the paper is to explain better the nature of the input sentences, the form that they should have to provide processable data.
- nouns and noun phrases are more likely to be concept labels whereas verbs are likely to become relation labels. Here again, this hypothesis leads to quite good results if sentences are definitions, but it fails on many other types of sentences.
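
The last hypothesis can be sketched in a few lines. This is only an illustration under stated assumptions: the Penn Treebank tags are written by hand rather than produced by the Stanford parser, both example sentences are made up, and `candidates` is a hypothetical helper, not the authors' implementation.

```python
# Toy version of the "nouns become concepts, verbs become relations" heuristic.
# Input is a list of (word, Penn Treebank tag) pairs, written by hand here.
def candidates(tagged):
    concepts = [word for word, tag in tagged if tag.startswith("NN")]
    relations = [word for word, tag in tagged if tag.startswith("VB")]
    return concepts, relations

# A definition-like sentence: the heuristic gives sensible candidates.
definition = [("A", "DT"), ("cow", "NN"), ("eats", "VBZ"), ("grass", "NN")]

# A narrative sentence: the same heuristic extracts non-ontological material.
narrative = [("She", "PRP"), ("felt", "VBD"), ("a", "DT"),
             ("deep", "JJ"), ("sadness", "NN")]

print(candidates(definition))
print(candidates(narrative))
```

On the definition the candidates are two concepts and one relation, whereas on the narrative sentence the heuristic proposes "sadness" as a concept and "felt" as a relation, which is the kind of useless extraction mentioned above.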

The proposed approach is quite interesting in many aspects, and this is why it would be nice to accept the paper after it has been significantly improved.
Another section that deserves significant changes is the presentation of the patterns, which is quite approximate and lacks precision.

To sum up, the paper raises interesting issues, but the process that you have implemented could be presented in a more convincing way, and some options should be more precisely justified. As it is, the process sounds naive because it assumes that input sentences are very simple. If the domain experts are asked to produce such basic sentences, then the process is relevant, but you should explain this better. Also, some references (like the work done in the NeOn project) are missing on other approaches that start from definitions and simple sentences to extract ontological knowledge.

change suggestions and detailed comments :

Introduction
One of the subfields of Ontology : strange formulation, ontology engineering is not a subfield of Ontology

Section 1
page 2
The lack of combining the knowledge of specialists of an arbitrary domain ... :
I do not agree with this statement. Several papers in the literature report collaborative work involving domain experts and ontologists, plus NLP from text; cf. the papers about bio-ontologies etc. in the Applied Ontology journal. Moreover, even if these methods and tools don't always meet their claims, NeOn and Collaborative Protégé are supposed to support the collaborative design of ontologies, involving experts and the reuse of existing ontologies. See papers by N. Fridman-Noy and T. Tudorache.

tools used in Ontology Learning are only capable of creating informal or unexpressive ontologies.
This statement is a little too strong. Maybe you should focus on the kind of knowledge that you extract and that makes the ontology more expressive.

utilization -> use

end of section 1 : your approach could be compared with the PhD thesis of Elena Montiel-Ponsoda, who proposed a very similar approach in the frame of the NeOn toolkit and methodology.

Section 2
Before you give details about the architecture and its components, it would be nice to dedicate a section to input sentences provided by experts: how do you get them? how "natural" are they? do they come from existing documents or are the experts asked questions or are they just asked to express their knowledge the way they want? It is important to explain the kind of input that can be processed by the translation chain in order to evaluate its generality or to identify how it can be reused in another domain.

section 2.1
about the example sentence:
This sentence is a definition. Do you ask the expert to provide definitions as input? which guidelines is he given? for instance, how does he know that he gave all the required definitions?

their lexical categories -> its syntactic category (each word is supposed to have one, and NN, NP etc. are not lexical ones.)

instead of using RED for the syntactic categories, please use bold or italic as visual contrast in case the paper is printed in black and white.

vehcle -> vehicle

the result of the syntactic analysis of the sentence contains many open parentheses without closing ones.

Section 2.2 page 4
It is surprising to read vehicle/NN 3 times. Maybe you should talk about term occurrences ... The actual terms are not single words but the phrases "motor vehicle", "road vehicle" and "self-propelled vehicle" -> you may just add () to group motor and vehicle, etc. If single words are considered here, then you should explain it -> say 'single-word terms' or single words that could become parts of a more complex term.
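
A minimal sketch of the grouping suggested here, assuming hand-written Penn Treebank tags and a made-up input sentence. `group_terms` is a hypothetical helper that joins adjacent adjectives and nouns into one candidate term; it is not the paper's method.

```python
# Group maximal runs of adjectives/nouns into multi-word terms, so that
# "motor vehicle" yields one term instead of two separate NN tokens.
def group_terms(tagged):
    terms, run = [], []
    for word, tag in tagged + [("", "END")]:  # sentinel flushes the last run
        if tag.startswith("NN") or tag == "JJ":
            run.append((word, tag))
            continue
        # A term must end in a noun, so drop any trailing adjectives.
        while run and not run[-1][1].startswith("NN"):
            run.pop()
        if run:
            terms.append("_".join(w for w, _ in run))
        run = []
    return terms

# Made-up sentence: "A car is a motor vehicle and a self-propelled vehicle"
tagged = [("A", "DT"), ("car", "NN"), ("is", "VBZ"), ("a", "DT"),
          ("motor", "NN"), ("vehicle", "NN"), ("and", "CC"), ("a", "DT"),
          ("self-propelled", "JJ"), ("vehicle", "NN")]

print(group_terms(tagged))
```

With this grouping, "vehicle" no longer appears as an isolated token; each occurrence is part of the multi-word term it belongs to.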

The page must be ... : what is this? A remnant of the guidelines for authors?

2 e 3 -> 2 and 3 ??

motor_vehicle e road_vehicle -> motor_vehicle and road_vehicle

Relations are not directly represented in OWL. Then which format do you use to store these relations? do you tag the document?

Table 1
Giving the list of patterns is interesting, but you should explain why this list is the right one, and compare it with existing patterns in the literature, for instance in various papers of the book Probing Semantic Relations - Exploration and identification in specialized texts, Alain Auger, Caroline Barrière (Eds.), John Benjamins Publishing Company,
in Elena Montiel's work, in Aussenac-Gilles and Jacques' paper, etc.

are a| is a| ... : why do you need to list all the possible exact forms whereas you use a parser that lemmatizes the verbs and nouns ?
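
To illustrate the point, here is a sketch of a single lemma-level pattern covering both surface forms. The lemma table is a toy stand-in for a real lemmatizer, and `is_a_pairs` is a hypothetical name, not something from the paper.

```python
# One pattern over lemmas ("X be [a|an] Y") instead of listing "is a", "are a", ...
# LEMMAS is a toy table; a real system would take lemmas from its parser.
LEMMAS = {"is": "be", "are": "be", "was": "be", "were": "be",
          "whales": "whale", "mammals": "mammal"}

def is_a_pairs(tokens):
    lemmas = [LEMMAS.get(t.lower(), t.lower()) for t in tokens]
    pairs = []
    for i, lemma in enumerate(lemmas):
        if lemma != "be" or i == 0:
            continue
        j = i + 1
        if j < len(lemmas) and lemmas[j] in ("a", "an"):
            j += 1  # skip an optional determiner
        if j < len(lemmas):
            pairs.append((lemmas[i - 1], lemmas[j]))
    return pairs

print(is_a_pairs("A whale is a mammal".split()))
print(is_a_pairs("Whales are mammals".split()))
```

Both sentences yield the same (whale, mammal) pair from one pattern, which is why enumerating the exact forms looks redundant once lemmas are available.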

Table 2 : is Verbo different from Verb?

page 6
composes -> compose

step 1 : It is not clear that you obtain these inclusions by subsumption from the other inclusions. From motor-vehicle IS-A vehicle and self-propelled-V IS-A motor-V, you can deduce that self-propelled-V IS-A vehicle, but what you wrote seems unreachable. If it is not an error, please explain how you deduce this relation from the previous ones.
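
The deduction meant here is plain transitivity of IS-A. A minimal sketch, assuming inclusions are encoded as (sub, super) pairs; the function name is hypothetical and a DL reasoner would of course do much more than this.

```python
# Transitive closure of told IS-A inclusions, each given as a (sub, super) pair.
def entailed_subsumptions(told):
    closure = set(told)
    changed = True
    while changed:
        changed = False
        for a, b in list(closure):
            for c, d in list(closure):
                if b == c and (a, d) not in closure:
                    closure.add((a, d))  # a IS-A b and b IS-A d give a IS-A d
                    changed = True
    return closure

told = {("motor_vehicle", "vehicle"),
        ("self_propelled_vehicle", "motor_vehicle")}

inferred = entailed_subsumptions(told) - told
print(inferred)
```

Only self_propelled_vehicle IS-A vehicle is added; no inclusion in the reverse direction (for example vehicle IS-A motor_vehicle) is reachable from the told inclusions.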

page 7, final result
it is not clear whether the final result is (only) made of complex axioms, or if the more primitive formulas resulting from steps 2 and 3 are also included. It could be relevant to include them too. Explain more precisely.

The intersection between the 2 subclasses is odd. It seems to me that it has no meaning: both classes are vehicles but their intersection may be empty. The important information is that both are vehicles.

Section 2.4
giver -> given
aquaic -> aquatic

subsunction -> subsumption

page 10
how can the system guess this deduction and NOT the reverse one (herbivore ISA cow) ??? because of the grass ISA PLANT relation?

Section 3.4
... in a way : The end of the sentence is missing

page 12
shows -> show

section 4
you could also mention Elena Montiel's work
see http://oa.upm.es/5192/1/AELFEProceedingsMontiel.pdf
or
D2. 5.1: A library of ontology design patterns: reusable solutions for collaborative design of networked ontologies
V Presutti, A Gangemi, S David, GA de Cea, M Suárez-Figueroa, ...
NeOn Deliverable 2