# Ripple Down Rules for Question Answering

### Tracking #: 1112-2324

Authors:
Dat Quoc Nguyen
Dai Quoc Nguyen
Son Bao Pham

Responsible editor:

Submission type:
Full Paper
Abstract:
Recent years have witnessed a new trend of building ontology-based question answering systems. These systems use semantic web information to produce more precise answers to users' queries. However, these systems are mostly designed for English. In this paper, we introduce an ontology-based question answering system named KbQAS which, to the best of our knowledge, is the first one made for Vietnamese. KbQAS employs our question analysis approach that systematically constructs a knowledge base of grammar rules to convert each input question into an intermediate representation element. KbQAS then takes the intermediate representation element with respect to a target ontology and applies concept-matching techniques to return an answer. On a wide range of Vietnamese questions, experimental results show that the performance of KbQAS is promising with accuracies of 84.1% and 82.4% for analyzing input questions and retrieving output answers, respectively. Furthermore, our question analysis approach can easily be applied to new domains and new languages, thus saving time and human effort.
Revised Version:
Tags:
Reviewed

Decision/Status:
Minor Revision

Solicited Reviews:
Review #1
By Konrad Höffner submitted on 07/Jul/2015
Suggestion: Minor Revision

Review Comment:

The resubmission of "Ripple Down Rules for Question Answering" successfully addresses both of my major points of criticism:

1. The contribution of this work has been clarified and differentiated against previous work.
2. An analysis of question answering over DBpedia has been added to the evaluation, which proves the suitability of the approach on large scale knowledge bases.

Minor issues still remain, which leads me to the rating of "minor revision":

3. In chapter 4.3.3, the rules are called the "knowledge base". The correct meaning of that term is a set of facts, however, such as DBpedia.
4. The time to create the rules is stated in the text as 3 hours but in the table as 75 + 13 hours.
5. There are still errors in grammar, semantic and typography, such as:
   - "Web.Subsequently" (add space)
   - "The user is forced" (better: "The user has to")
   - "The question is fired at node" (?)
Review #2
By Shizhu He submitted on 09/Jul/2015
Suggestion: Accept

Review Comment:

This manuscript was submitted as 'full paper' and should be reviewed along the usual dimensions for research contributions which include (1) originality, (2) significance of the results, and (3) quality of writing.

(1) Originality: The originality of this work lies in the knowledge acquisition approach for systematically acquiring rules that convert a natural language question into an intermediate representation, rather than in being a QA system for Vietnamese. Considering that rules are very important in building a practical QA system, the idea of this paper is novel and interesting.

(2) Significance of the results, and (3) quality of writing: Even though the knowledge base is small for a practical QA system, the authors give elaborate descriptions of the rule language, the rule management system, and their use. The revised paper is also clearly written.

Other issues: The ontology mapping step is very important for ontology-based QA. In this work, an off-the-peg relation similarity service (used in the AquaLog system) has been adopted; is it valid for Vietnamese? It would be better if the authors explained this more carefully.
Review #3
By Christina Unger submitted on 19/Aug/2015
Suggestion: Minor Revision

Review Comment:

**Consistent capitalization**

* "Semantic Web" or "semantic web" (but not both, and also not "Semantic web")
* "section 2" --> "Section 2" (this occurs with all section numbers throughout the whole paper; the general rule is: "In this section", but "In Section 2")
* Natural language question analysis --> natural language question analysis
* Question analysis --> question analysis
* Answer retrieval --> answer retrieval
* Ontology mapping --> ontology mapping
* Word segmentation --> word segmentation
* Part-of-speech tagging --> part-of-speech tagging
* Relation similarity service --> relation similarity service

For all these above cases: Either you capitalize all words, saying "Question Analysis" etc., or you don't capitalize it at all. Since you already use non-capitalized versions in parts of the text, I would stick to those.

**Use of hyphens**

* You use a lot of hyphens, e.g. in words like "question-category", "question-phrases", and "query-tuple". This is fine (although not necessary, as "question category" would be equally good), but please make sure to be consistent, e.g. you write "question structure patterns" on page 16.

**Formatting of code and natural language examples**

* "Term_1,Relation,Term_2,Term_3" as well as the English subscripts on Vietnamese words seem to be encoded as math environments, e.g. `$Term_1$`. Since this leads to bad formatting, it would be better to always encode it as text, e.g. `\emph{Term}$_1$` or `$\text{\em Term}_1$`.
* With question structures and categories and the like, sometimes you put them in quotes and sometimes you don't. Please do this uniformly throughout the whole paper: always or never put them in quotes. (I, personally, think that italicizing them is enough and that additionally putting them in quotes just clutters the text.)
* The question mark in query tuples is also formatted differently -- sometimes in bold, sometimes italicized, sometimes neither. Please stick to one version.
* Sometimes, when giving the English translation of a Vietnamese example question, you forgot to put it in quotes (e.g. on page 4 and page 15).
* Also, sometimes you italicize English questions and sometimes you don't -- please decide on one version and use it everywhere.
* Sometimes your English questions begin with a lower-case letter, where they should begin with an upper-case one.
* You end "List all ..." questions with a question mark, which is not correct. Rather use a period or no punctuation mark.
* When providing URLs, do you use `\url{...}` of the hyperref package?
* All rules (R1, R3, ...) and the pattern in 4.2 on page 9 would look nicer if formatted in a monospace font, e.g. using `\texttt{}` or a verbatim environment.

**Others**

* You use the example question "List all students studying in K50 computer science course, who have hometown in Hanoi", which appears throughout the paper. However, better English would be: "List all students enrolled in the K50 computer science course, whose hometown is Hanoi."
* When giving an example question together with its analysis, you always say "With the question", "With another question" and the like, just followed by the question and the intermediate representation. This is not a complete sentence in any sense. Please write something like "For the question ... the following intermediate representation is constructed: ..." or "Here is an example of a question and its corresponding intermediate representation: ..."
* hard-wire --> hard-wired (everywhere: "hard-wired manner", "hard-wired approach", etc.)
* Footnotes 9 and 10: Please use the more permanent URL http://www.sc.cit-ec.uni-bielefeld.de/qald/. Although it's not possible to point to the specific documents this way, the URLs you give will not be stable.

**Typos and grammar errors**

Page 1

* from the advanced information retrieval technologies --> from advanced information retrieval technologies
* Then user --> Then the user
* Unlike the search engines, the QA systems --> Unlike search engines, QA systems
* In addition, the QA systems --> In addition, QA systems
* rather than in the keyword-based mechanism --> rather than as keywords
* Specifically, the traditional restricted-domain QA systems make use of the relational databases --> Specifically, traditional restricted-domain QA systems make use of relational databases
* advantages of semantic web --> advances of the semantic web
* making the senses of the input questions --> interpret the input questions
* which is to capture the semantic structure --> which captures the semantic structure

Page 2

* The key innovation of KbQAS proposes a --> The key innovation of KbQAS is that it proposes a
* in the QA systems --> in a QA system
* approaches to the best of our knowledge --> approaches, to the best of our knowledge
* in terms of time, effort, and error-prone --> in terms of time and effort, and it is error-prone
* where the consistency between rules is maintained and the unintented interaction --> where consistency between rules is maintained and an unintented interaction
* as follows: we provide the related work --> as follows. We provide related work
* Web.Subsequently --> Web. Subsequently
* focused on proposing a method using the WordNet --> focused on a method using WordNet
* the traditional restricted-domain QA systems --> traditional restricted-domain QA systems
* use the syntactic/semantic interpretation rules --> use syntactic/semantic interpretation rules
* high level world concepts --> high-level world concepts
* process the input question as in [51,33,55,18,15,25,6] --> process the input question, e.g. [51,33,55,18,15,25,6]
* to handle the ambiguities --> to handle ambiguities

Page 3

* to interpret and answer the user questions --> to interpret and answer user questions
* The discussion on --> A discussion on
* in the use of processing resources, including --> using resources including
* sentence segment --> sentence segmentation
* directs to open-domains by combining --> is an open-domain system, combining
* Onto-triple --> onto-triple
* based on the similarity --> based on their similarity
* successor to QuestIO --> successor of QuestIO
* handle the ambiguities --> handle ambiguities
* the Pythia system [53] relies on the ontology-based grammars generated from the Lexicalized Tree Adjoining Grammar tree to process complex questions --> the Pythia system [53] relies on ontology-based grammars to process complex questions
* Turning to the Vietnamese question answering --> Turning to Vietnamese question answering
* two main modules of the --> two main modules: the
* query while --> query, while
* uses the limited context-free grammars --> uses limited context-free grammars
* via CYK algorithm --> by means of the CYK algorithm
* in database --> in the database
* hard-wire approach --> hard-wired approach
* into triple-like formats of (Subject,Verb,Object) --> into triple-like formats (Subject,Verb,Object)
* This section is to describe the overview --> This section gives an overview
* two components of the --> two components: the

Page 4

* and the Answer retrieval. --> and the answer retrieval component.
* for later process of answer retrieval --> for later answer retrieval (or: for the later process of answer retrieval)
* two modules of Ontology mapping --> two modules: ontology mapping
* and an Ontolgy --> and an ontology
* The concepts "truong_school" etc. are not italicized, although all other examples of this sort are.
* Answer retrieval component , the --> answer retrieval component, the

Page 5

* Figure 8 shows the answer --> Figure 2 shows the answer
* Natural language question analysis component is --> The natural language question analysis component is
* KbQAS makes the use of --> KbQAS makes use of
* trained for question domain --> trained on the question domain
* phrases, and label --> phrases and label
* semantic category like --> semantic category, like
* special-words --> special words
* special-domain --> special domain

Page 6

* gives the information --> gives information
* We use four grammar patterns to determine relation phrases as following: --> We use the following four grammar patterns to determine relation phrases:
* Figure 3: In the caption you don't italicize "TokenVN", although you do italicize it everywhere else in the text.

Page 7

* of two example questions --> of the two example questions
* as following: --> as follows:
* encounters itself common difficulties --> encounters common difficulties
* for semantic analysis of input questions --> for the semantic analysis of input questions
* in the section 4 --> in Section 4
* two modules of Ontology mapping --> two modules: ontology mapping
* the string distance algorithm --> a string distance algorithm
* In case of the ambiguity is still present --> In case the ambiguity is still present (or: In case of ambiguity)
* component produce --> components produces
* corresponding with --> corresponding to
* "lop_course" and --> "lop_course", and

Page 8

* using the similar manner of --> using a manner similar to
* Single Classification Ripple Down Rules for Question Analysis --> Single Classification Ripple Down Rules for question analysis
* This section is to introduce --> This section introduces
* the consistency is maintained --> consistency is maintained

Page 9

* Given the question case --> Given the question
* involved in ATK project --> involved in the ATK project
* in which projects is enrico motta working on --> which projects is enrico motta working on
* with except edge --> as an except edge
* with false edge --> as a false edge
* a conclusion is always given --> a conclusion is always reached
* at layer-4 --> at layer 4
* at the layer-5 --> at the layer 5
* using JAPE grammar --> using JAPE grammars

Page 10

* in exception structure --> in an exception structure
* Knowledge Acquisition Process --> Knowledge acquisition process
* It is because the main focus of our approach is on the process of creating the rule-base system, so it is language independent. --> Our approach is language-independent, because the main focus is on the process of creating the rule-based system.
* identify the noun phrases --> identify noun phrases
* as outputs of --> as output of

Page 11

* These questions are specified to the Knowledge Media Institute --> These questions concern the Knowledge Media Institute
* who are the researchers in semantic web research area? --> Who are researchers in the semantic web research area?
* Figure 8 is not well-placed, as it occurs directly between a question and its representation. Please place it at the top or bottom of the column.
* of empty intermediate representation --> of an empty intermediate representation
* in the same span to the --> in the same span as the
* Assumed that --> Assume that
* which universities are Knowledge Media Institute collaborating with? --> Which universities is the Knowledge Media Institute collaborating with?
* Regarding to the input question --> Regarding the input question
* which universities are Knowledge Media Institute collaborating with? --> Which universities is the Knowledge Media Institute collaborating with?

Page 12

* the node (4) --> node (4)
* we have a correct conclusion --> we get a correct conclusion
* to solve the question-structure ambiguities --> to solve question-structure ambiguities
* who are the partners involved in AKT project? --> Who are the partners involved in the AKT project?
* which projects sponsored by eprsc are related to semantic web? --> Which projects sponsored by EPRSC are related to the semantic web?

Page 13

* This question is fired --> This question fires
* as following: --> as follows: (Both occur twice on this page.)

Page 14

* It is because the question analysis component employs our knowledge acquisition approach which is language independent, while the answer retrieval component produces answers from a domain-specific Vietnamese ontology. --> This is not a complete English sentence. Please reformulate!
* This section is to indicate --> This section indicates
* Question Analysis for Vietnamese --> Question analysis for Vietnamese
* various-structure questions generated by four volunteer students to build a Vietnamese knowledge base for question analysis. --> questions of various structures generated by four volunteer students.
* Our first approach took --> With our first approach it took
* out second approach took --> with our second approach it took
* spent for looking at question --> spent looking at questions

Page 15

* Regarding to a question-structure based evaluation --> Regarding a question-structure-based evaluation
* Question Analysis for English --> Question analysis for English
* research area on Semantic web --> research area on the semantic web
* Table 6: Testing results --> Test results
* results on analyzing --> results of analyzing

Page 16

* is presented Table 5 --> is presented in Table 5
* different to AquaLog --> different from AquaLog
* adapt to a new domain and a new language of our knowledge acquisition approach for question analysis. --> adapt to a new domain and a new language.
* inside noun phrases --> in noun phrases
* somehow can help --> can help
* reduce the ambiguities --> reduce ambiguities
* theirs synonyms --> their synonyms
* To evaluate KbQAS by specifying in the Answer retrieval component --> To evaluate KbQAS' answer retrieval component
* in the section 3.2 --> in Section 3.2
* correct answers to 61 questions over 74 questions --> correct answers for 61 out of 74 questions
* corresponding with --> corresponding to
* are because the target ontology construction lacked --> are due to the target ontology construction lacking
* cannot be mapped or incorrectly mapped --> cannot be mapped or are incorrectly mapped
* to 7 questions --> for 7 questions
* in faculty of --> in the faculty of
* that KbQAS failed to return answers --> for which KbQAS failed to return answers
* difficultly --> difficult

Page 17

* Vietnamese namely --> Vietnamese, namely
* two components of --> two components:
* to produce answer --> to produce an answer
* allows systematic control --> allows for systematic control
* annotating corpus --> annotating corpora
* open domain --> open-domain
* to turn an input question to an explicit representation --> to transform an input question into an explicit representation
* on the Linked Open Data --> on Linked Open Data
* List all the publications in knowledge media institute --> List all publications in the Knowledge Media Institute
* is the question which belongs --> is a question which belongs
* Phd --> PhD
* to the "ThreeTerm" --> to "ThreeTerm"
* 45 is the number of students studying in K50 computer science course, is not it? --> 45 is the number of students enrolled in the K50 computer science course, is it not?

Page 18

* is the question which belongs --> is a question which belongs
* which student has the highest grade point average in faculty of Information Technology? --> Which student has the highest grade point average in the faculty of Information Technology?
* answers for the sub-questions --> answers of the sub-questions
* which publications are in knowledge media institute related to compendium? --> This sounds wrong, but I'm not sure what compendium is, so how the question should be formulated correctly. (Maybe: which publications in the knowledge media institute are related to a compendium?)
* some questions such as [...] contains --> some questions, such as [...], contains
* study in --> study at
* it will has --> it has (or: it will have)
* who study in faculty of Information Technology? --> who studies at the faculty of Information Technology?
* studying in K50 --> enrolled in K50
* is not it? --> is it not?
* question is classified into one of the following classes of --> a question is classified as one of the following classes:
* refers a cause --> refers to a cause
* such string as --> such strings as (This occurs in every bullet point from here on.)
* similar to Why-question or How is/are question --> similar to Why-questions or How is/are questions
* question type in English --> questions in English (This also occurs in every bullet point from here on.)