Review Comment:
Overall evaluation
Select your choice from the options below and write its number below.
0
Reviewer's confidence
4
Interest to the Knowledge Engineering and Knowledge Management Community
3
Novelty
3
Technical quality
2
Evaluation
3
Clarity and presentation
3
Review
This paper addresses ontology engineering from text from an original perspective: how can we build rich, formal and validated ontologies at very low cost for the domain expert or the ontologist? The authors claim two original contributions: first, they extract rich and complex axioms from natural language sentences, such as formal definitions of concepts using inclusion, union and negation operators; second, they use reasoning (mainly subsumption) to infer new axioms from the extracted ones.
The input of the ontology extraction process is a set of natural language sentences given by domain experts. The result is an ontology in OWL that contains a rich set of axioms, object properties and data properties, either directly extracted or inferred. Another important hypothesis behind this work is that this extraction process could be included in a system that would interact with a domain expert. The system would extract, formalize and validate knowledge from human input during a dialogue with the user.
The authors call the extraction process a translation process because they try to transform all the linguistic information in the input sentences into concepts, hierarchical relations and other relations. The process is carried out in 4 tasks: (1) syntactic parsing (using the Stanford parser) to prepare the extraction, (2) semantic parsing (term and relation extraction), (3) OWL axiom representation and (4) reasoning to get new axioms. The whole process is described step by step in the paper, and a significant part of the paper is dedicated to illustrating how the system would interact with the user and inform them of possible inconsistencies in the knowledge that they gave.
The whole process relies on several strong hypotheses that the authors leave almost unstated:
- the input sentences must be quite simple and have the form of definitions.
- all the information available in these sentences is relevant and of an ontological nature. For instance, if the input text were a novel, the system would identify sentiments and feelings and have much more difficulty in formalizing them. It would also extract a lot of useless information.
-> so one of the changes to be brought to the paper is to explain better the nature of the input sentences, the form that they should have to provide processable data.
- nouns and noun phrases are more likely to be concept labels whereas verbs are likely to become relation labels. Here again, this hypothesis leads to quite good results if sentences are definitions, but it fails on many other types of sentences.
The proposed approach is quite interesting in many aspects, and this is why it would be nice to accept the paper after it has been significantly improved.
Another part that deserves significant changes is the presentation of the patterns, which is quite approximate and lacks precision.
To sum up, the paper raises interesting issues, but the process that you have implemented could be presented in a more convincing way, and some options should be more precisely justified. As it is, the process sounds naive because it assumes that input sentences are very simple. If the domain experts are asked to produce such basic sentences, then the process is relevant, but you should explain this better. Finally, some references (such as the work done in the NeOn project) on other approaches that start from definitions and simple sentences to extract ontological knowledge are missing.
Change suggestions and detailed comments:
Introduction
"One of the subfields of Ontology": strange formulation; ontology engineering is not a subfield of Ontology.
Section 1
page 2
The lack of combining the knowledge of specialists of an arbitrary domain ... :
I do not agree with this statement. Several papers in the literature report collaborative work involving domain experts and ontologists, together with NLP from text; cf. the papers about bio-ontologies in the Applied Ontology journal, etc. Moreover, even if these methods and tools don't always meet their claims, NeOn and collaborative Protégé are supposed to support the collaborative design of ontologies, involving experts and the reuse of existing ontologies. See the papers by N. Fridman-Noy and T. Tudorache.
tools used in Ontology Learning are only capable of creating informal or unexpressive ontologies.
This statement is a little too strong. Maybe you should focus on the kind of knowledge that you extract and that makes the ontology more expressive.
utilization -> use
end of section 1 : your approach could be compared with the PhD thesis of Elena Montiel-Ponsoda, who proposed a very similar approach within the framework of the NeOn toolkit and methodology.
Section 2
Before you give details about the architecture and its components, it would be nice to dedicate a section to input sentences provided by experts: how do you get them? how "natural" are they? do they come from existing documents or are the experts asked questions or are they just asked to express their knowledge the way they want? It is important to explain the kind of input that can be processed by the translation chain in order to evaluate its generality or to identify how it can be reused in another domain.
section 2.1
about the example sentence:
This sentence is a definition. Do you ask the expert to provide definitions as input? which guidelines is he given? for instance, how does he know that he gave all the required definitions?
their lexical categories -> their syntactic categories (each word is supposed to have one, and NN, NP, etc. are not lexical ones).
instead of using RED for the syntactic categories, please use bold or italic as visual contrast in case the paper is printed in black and white.
vehcle -> vehicle
the result of the syntactic analysis of the sentence contains many open parentheses without closing ones.
Section 2.2 page 4
It is surprising to read vehicle/NN 3 times. Maybe you should talk about term occurrences ... The actual terms are not single words but the phrases "motor vehicle", "road vehicle" and "self-propelled vehicle" -> you may just add () to group motor and vehicle, etc. If single words are considered here, then you should explain it -> say 'single-word terms' or single words that could become parts of a more complex term.
The page must be ... : what is this? A remnant of the guidelines to the authors?
2 e 3 -> 2 and 3 ??
motor_vehicle e road_vehicle -> motor_vehicle and road_vehicle
Relations are not directly represented in OWL. Then which format do you use to store these relations? do you tag the document?
Table 1
Giving the list of patterns is interesting, but you should explain why this list is the right one and compare it with existing patterns in the literature, for instance in various papers of the book Probing Semantic Relations - Exploration and identification in specialized texts, Alain Auger, Caroline Barrière (Eds.), John Benjamins Publishing Company, in Elena Montiel's work, in the paper by Aussenac-Gilles and Jacques, etc.
are a| is a| ... : why do you need to list all the possible exact forms whereas you use a parser that lemmatizes the verbs and nouns ?
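To illustrate this remark, here is a minimal sketch of what matching on lemmas rather than on surface forms could look like. It is my own illustration, not the authors' implementation: the `LEMMAS` table is a hypothetical stand-in for the lemmatizer output of the parser, and the names `lemmatize`, `IS_A` and `match_is_a` are assumptions introduced only for this example.

```python
import re

# Hypothetical lemma table standing in for the lemmatizer output of a parser.
LEMMAS = {"is": "be", "are": "be", "was": "be", "were": "be"}


def lemmatize(tokens):
    """Lowercase each token and map it to its lemma when the table knows it."""
    return [LEMMAS.get(t.lower(), t.lower()) for t in tokens]


# A single lemma-level pattern replaces the list "are a | is a | ...".
IS_A = re.compile(r"^(?P<sub>.+?) be (?:a|an) (?P<sup>.+)$")


def match_is_a(sentence):
    """Return (subclass phrase, superclass phrase) or None if no match."""
    lemmas = " ".join(lemmatize(sentence.split()))
    m = IS_A.match(lemmas)
    return (m.group("sub"), m.group("sup")) if m else None


print(match_is_a("A car is a vehicle"))
```

With such a design, each pattern would be written once over lemmas, and the inflected variants would not need to be enumerated in Table 1.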
Table 2 : is Verbo different from Verb?
page 6
composes -> compose
step 1 : It is not clear that you obtain these inclusions by subsumption from the other inclusions. From motor-vehicle ISA vehicle and self-propelled-V ISA motor-V, you can deduce that self-propelled-V ISA vehicle, but what you wrote seems unreachable. If it is not an error, please explain how you deduce this relation from the previous ones.
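To make the expected inference concrete, here is a minimal sketch of the transitive closure of ISA over the two inclusions mentioned in the comment above. It is my own illustration under simplifying assumptions (plain pairs of class names, no OWL reasoner), and the function name `subsumption_closure` is hypothetical.

```python
def subsumption_closure(direct):
    """Return all (sub, super) pairs implied by the transitivity of ISA."""
    closure = set(direct)
    changed = True
    while changed:
        changed = False
        for a, b in list(closure):
            for c, d in list(closure):
                # a ISA b and b ISA d together imply a ISA d.
                if b == c and (a, d) not in closure:
                    closure.add((a, d))
                    changed = True
    return closure


# The two inclusions assumed to come out of the extraction step.
direct = {
    ("motor_vehicle", "vehicle"),
    ("self_propelled_vehicle", "motor_vehicle"),
}

closure = subsumption_closure(direct)
inferred = closure - direct
print(inferred)
```

Only self_propelled_vehicle ISA vehicle is derivable from these two inclusions; any other inclusion reported at this step would need an additional premise, which should then be stated in the paper.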
page 7, final result
it is not clear whether the final result is (only) made of complex axioms, or if the more primitive formulas resulting from steps 2 and 3 are also included. It could be relevant to include them too. Explain more precisely.
The intersection between the 2 subclasses is odd. It seems to me that it has no meaning: both classes are vehicles but their intersection may be empty. The important information is that both are vehicles.
Section 2.4
giver -> given
aquaic -> aquatic
subsunction -> subsumption
page 10
how can the system guess this deduction and NOT the reverse one (herbivore ISA cow) ??? because of the grass ISA PLANT relation?
Section 3.4
... in a way : The end of the sentence is missing
page 12
shows -> show
section 4
you could also mention Elena Montiel's work
see http://oa.upm.es/5192/1/AELFEProceedingsMontiel.pdf
or
D2.5.1: A library of ontology design patterns: reusable solutions for collaborative design of networked ontologies
V Presutti, A Gangemi, S David, GA de Cea, M Suárez-Figueroa, ...
NeOn Deliverable 2