Natural Language-based User Interface for Knowledge Management System. A Computational Linguistics Approach

Tracking #: 483-1679

Authors: 
Mario Monteleone
Maria Pia di Buono
Federica Marano

Responsible editor: 
Guest editors Semantic Web Interfaces

Submission type: 
Full Paper
Abstract: 
This research wants to show how it is possible to convert natural language (NL) queries into formal semantic ones, by means of a procedure which allows to semi-automatically map natural language to formal language. More specifically, focusing on voice and/or keyboard-based natural language user interfaces, this research wants to explain how to simplify and improve human-computer natural language interaction and communication. Also, in a more wide perspective, it wants to individuate a method for the creation of Natural Language Processing (NLP) applications finalized to the achievement of Question-Answering (QA). The NLP activities sketched in this research fall inside Lexicon-Grammar (LG) theoretical and practical framework, which is one of the most consistent methods for natural language formalization, automatic textual analysis and parsing. This framework is independent from those factors that are crucial within other approaches, as those concerning the interaction type (voice or keyboard-based), the length of sentences and propositions, the type of vocabulary used, and the restrictions due to users’ idiolects. Another feature is the possibility to process unstructured, semi-structured or structured information retrievable from either knowledge management system (KMS) or on-line repository, also considering that all other approaches mainly use interfaces which dialogue with structured data. This approach allows to overcome users' limits about domain ontology knowledge, and to define relationships between search terms to be considered. Keywords: Natural Language Interface, Knowledge Management System, Lexicon-Grammar, SeRQL, Cultural Heritage
Tags: 
Reviewed

Decision/Status: 
Reject

Solicited Reviews:
Review #1
By Aba-Sah Dadzie submitted on 18/Jun/2013
Suggestion:
Major Revision
Review Comment:

The paper presents an approach, based on theory in Computational Linguistics, for semi-automatic conversion of natural language queries to formalised (semantic) queries. The aim is to improve interaction between humans and computers in question answering, especially for non-SW experts.

The paper aims to address one of the topics in the special issue - NLI, with a focus on the challenges non-SW experts face in querying different types of data. I am not convinced, however, that this is done. Quite detailed theory is presented, at a fairly high level, on CL methods for parsing natural language, but the authors never really explain what it is they do, or if the approach described is implemented or evaluated. It is therefore impossible to tell if they are actually able to support their target users. Further, considering the aims of the special issue - to support wider adoption of SW technology, especially by non-experts, I would expect some interaction with these target end users, from eliciting and validating requirements, to evaluation. Ideally, this should be done with HCI or UI experts, or at least, some evidence should be provided that indicates the application of user-centred design. However, beyond stating the well-known challenges non-experts face in using SW technology, no attempt appears to have been made to determine if the approach does result in the generation of UIs that support the target user. In fact, the paper only deals with the (underlying) technology for parsing input, NOT with the UI at all. The title and intro are therefore misleading and the paper is probably targeted at an unsuitable venue.
This is not to say that the approach may not be valid, but that the authors have not proved so. The paper is mostly a (fairly dense) discussion of CL approaches to parsing natural language. It only obliquely addresses requirements of the non-SW or non-CL domain expert end user. If the aim is to focus only on the CL contribution, then I would expect the paper to report novel methods in CL to tackle this challenge. However, this is not the case - the key contribution of the paper is therefore difficult to identify.

S4 contains no discussion, and the conclusion is simply a brief rehash of the authors' initial aims, with no real evidence to support what is said. Finally, for this kind of journal paper, at least some sort of evaluation should have been carried out - this would have prompted the discussion expected to conclude the paper and validated any conclusions drawn.

This paper is very difficult to follow, for three main reasons:
- the presentation of related work and of the theory and methodology followed is at a fairly high level and, for what appears to be a system/applications paper, lacks a clear mapping to actual implementation or design
- there is no clear, consistent "story" that helps the reader make connections between the different sections. Also, a number of unrelated examples are used across the paper and even within single sections, or, in the latter case, only part of an example is given, and these are presented inconsistently (e.g., Table 1 & text in S3.2.1).
- fairly complex language is used where simple words or terms would suffice, making it difficult to understand what is being said. Because of this, there are also sections that simply do not make sense.

S2 & 3 are confounded by the use of multiple, unrelated examples. S2 is particularly difficult to follow, especially with the inconsistent and mostly unrelated examples used to illustrate the combinations. Also, whether this is a proposal or a description of actual implementation in some form that allows for testing and evaluation with real users is not clear.
Why is NooJ the choice of tool for implementation? What does it provide over other similar systems? This is especially important, as the authors describe it as a "complex NLP environment".

Further, the examples in S3.2 are in Italian - considering the target audience is only required to understand English, these need translation, especially since the examples in the table only partially overlap with those in the text. I would suggest the authors start with one clear set of examples, and use these throughout to illustrate the different aspects of parsing and analysis they carry out. This would make it easier to understand the authors' approach and assess what it provides over other existing work.

Section 3 "Experiment and Results" opens ... "Starting from this NLP theoretical and practical framework, in this project we propose to build an User Interface for KMS ..." - this is contradictory (see also above) - if a proposal, then it cannot be reported as results. Further, no actual information is provided about the UI.

************** OTHER POINTS

The authors claim to support keyboard-based or voice input - however, how either affects the formalisation of queries, or how whatever differences exist are supported, is never actually addressed.

"Nowadays, humans usually make efforts in “translating” that query into proper keywords, or even into non-acceptable1 sequences of nouns and/or adjective which they never would use in ordinary communication. ..." - examples that illustrate what is meant by "proper keywords" and other ways in which end users formulate queries would be useful. Mapping these to the examples for automatic parsing later in the paper would help the reader to assess what the added value is of this approach.

What is the API used to provide the "ideal solution" to the FST/FSA approach?

"Anyway, our approach is founded on a not statistically-based linguistic formalization which ensures a low degree of ambiguity, a low loss of meaning and an accurate matching between linguistics structures, domain concepts and programming language." - this says what it is NOT - however it is not clear what the authors actually DO.

The distinction between "computerized" and "electronic" dictionaries should be made when they are first mentioned. The content in footnote 9 probably also belongs there.
Further, saying that "All electronic dictionaries built according to LG descriptive method form the DELA System, ..." is debatable.

S3.1 is, simply, difficult to understand. Wrt the domain modeling - saying the "[OO] semantic model and its terminology are compatible with ... RDF" is redundant - that is the whole point of using ontologies. Further, what is being said in the following sentence: "Actually, this ontology was already available and is constantly developed." - is this work contributing to CIDOC? The discussion about the different levels of ontologies is contradictory, and the conclusion that follows is not obviously related to the rest of the section.

S3.4 - "deletion and reduction, which are present in sentence pairs/triples as: ..." - no examples of triples are given.

********

CITATIONS & REFERENCES

in intro - "The NLP activities sketched in this research fall inside Lexicon-Grammar (LG) theoretical and practical framework, ..." - needs a citation

FIGURES

Fig 2 is a simple linear diagram - it does not need to be represented in what at first glance appears to be a complex flow.

Fig 6 is split across two pages - the two parts should appear on a single page; as is, the reader must do extra work to relate them. Further, what ontology is being referred to in the E29 "Design or Procedure" class? And, in S3.4, for "Production" - which appears to be related - it is not obvious how the examples in Figs 6 & 7 map to those terms (classes).

LANGUAGE & PRESENTATION

Too much information is placed in footnotes - they should be used only to provide additional information, or, for example, to provide links to more info.

"This aim seems easy to obtain, but the first trial, not yet surmounted, is to digit a query ..." - digit here is incorrect - is this meant to be digitise - even that, while not incorrect, is unusual. I suspect what is being said is the conversion to electronic form?

"1.1. Background
For several years, we will see that similar projects ..." - future tense is incorrect here - the section is presenting existing work.

ALL acronyms and abbreviations should be expanded at first use, in the main text - this is done only in some instances.

A large number of grammatical errors need correcting. A proof-read may help to improve readability.

Review #2
By Jacco van Ossenbruggen submitted on 28/Aug/2013
Suggestion:
Reject
Review Comment:

This paper describes an NLP framework and discusses some NLP problems to illustrate the workings of the framework.

While the paper contains an "Experiment and Results" section, I am not sure what the experiment is that has been conducted, what the results are and how they have been evaluated. It remains unclear what the research question is that the authors seek to answer.
It remains unclear what corpora have been used or how this could be replicated.

Additionally, the paper is mainly about the backend of the system, the NLP engine "under the hood". What the interface looks like, which tasks it is suited for, or how real users react to it when using the system remains unclear.

To properly evaluate their system at the UI level, the authors need to define a task, ask a number of users to perform that task on their system and on some state-of-the-art baseline, compare the results (time to complete, error rate, SUS, etc.), and report on the outcome of such an experiment.
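For reference, a minimal sketch of how the SUS part of such a comparison could be scored and aggregated; the responses below are hypothetical, not data from the paper, and standard SUS scoring is assumed:

# Standard SUS scoring: 10 items on a 1-5 Likert scale per participant.
def sus_score(responses):
    assert len(responses) == 10
    total = 0
    for i, answer in enumerate(responses, start=1):
        # Odd-numbered items are positively worded, even-numbered negatively.
        total += (answer - 1) if i % 2 == 1 else (5 - answer)
    return total * 2.5  # scale to 0-100

# Hypothetical answers from two participants using the system under test.
participants = [
    [4, 2, 5, 1, 4, 2, 4, 2, 5, 1],
    [3, 3, 4, 2, 3, 3, 4, 2, 4, 2],
]
scores = [sus_score(r) for r in participants]
print(sum(scores) / len(scores))  # mean SUS, to be compared against the baseline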

Alternatively, the authors can choose to evaluate the backend system, assume some user model, evaluate its performance on some established IR, IE or QA benchmark, and report on the outcome of such an evaluation.
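For example, a precision/recall/F1 computation against a benchmark gold standard reduces to simple set comparisons; the answer sets in this sketch are invented for illustration:

def precision_recall_f1(retrieved, relevant):
    # retrieved: answers returned by the system; relevant: benchmark gold standard
    retrieved, relevant = set(retrieved), set(relevant)
    tp = len(retrieved & relevant)
    precision = tp / len(retrieved) if retrieved else 0.0
    recall = tp / len(relevant) if relevant else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

system_answers = {"doc3", "doc7", "doc9"}          # hypothetical system output
gold_answers = {"doc3", "doc9", "doc12", "doc15"}  # hypothetical gold standard
print(precision_recall_f1(system_answers, gold_answers))  # approx. (0.67, 0.50, 0.57)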

Minor:
Your paper is full of bold claims that are often both unnecessary and not backed up by citations. For example:

Very first sentence: "Building natural language interfaces (NLIs) is not only answering questions on the basis of a given database or knowledge base, but also accessing structured data in the form of ontologies and unstructured data." This is quite a statement and it needs to be backed up by citations.

Idem for: "In order to achieve effective IR and IE results, any KM system, whether closed or open (i.e. the World Wide Web), could avoid most of the noise if it worked with ontologies developed taking into account syntactic, lexical and semantic rules (under W3C criteria); or also, if it could be linked to data and document repositories of to extract proper and updated information (IST, or Information Storage Techniques), therefore making the Web more semantic." - you need to prove this by appropriate literature references. Most people in IR know that it is extremely difficult to increase precision and recall by using ontologies in most domains...

Also: avoid rhetoric in sentences like "Of course, the first problem in the Multiword Units (MWU) treatment is the identification of strings of words properly representing strings of “words related to each other”."

The paper needs checking by a native English speaker. For example many articles (a, the) are missing (even in the title ...)

Review #3
By Prateek Jain submitted on 29/Aug/2013
Suggestion:
Reject
Review Comment:

The work 'Natural Language-based User Interface for Knowledge Management System. A Computational Linguistics Approach' discusses a methodology to convert natural language-based queries into formal queries. The work aims to identify mappings of query terms to terms in a knowledge base and to construct SeRQL queries over them.

The idea of converting Natural Language (NL) queries to formal queries in SQL, SPARQL or SeRQL is not new, and many researchers in the past have contributed to this field - see, for example, citations (1,3,8) from the paper.

The approach utilized by the authors lacks novelty, as it involves converting sentences to triples, identifying entities from the knowledge base, and converting them to a graph structure for the query language. To be precise, the steps going from the identified knowledge-base entities to actual queries are not clear from the paper, and I am curious to know how queries beyond SELECT * will be constructed by the approach.
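To make the question concrete, the kind of mapping I would expect the paper to spell out is something like the sketch below, which turns one parsed triple into a SeRQL query over CIDOC CRM-style classes; the concept table, the property names and the example question are my own illustration, not taken from the paper:

# Hypothetical mapping from parsed NL terms to ontology classes/properties.
CONCEPT_MAP = {
    "amphora": ("Artifact", "cidoc:E22_Man-Made_Object"),
    "produced by": "cidoc:P108i_was_produced_by",
}

def triple_to_serql(subject_term, predicate_term, object_label):
    var, cls = CONCEPT_MAP[subject_term]
    prop = CONCEPT_MAP[predicate_term]
    return f"""
SELECT {var}, Label
FROM {{{var}}} rdf:type {{{cls}}};
     {prop} {{Actor}},
     {{Actor}} rdfs:label {{Label}}
WHERE Label LIKE "*{object_label}*" IGNORE CASE
USING NAMESPACE
  cidoc = <http://www.cidoc-crm.org/cidoc-crm/>,
  rdfs  = <http://www.w3.org/2000/01/rdf-schema#>
"""

# e.g. for the NL question "Which amphorae were produced by Exekias?"
print(triple_to_serql("amphora", "produced by", "Exekias"))

A query of this shape already goes beyond SELECT *, but how the join structure and the WHERE constraints would be derived from the parsed sentence is exactly what the paper leaves unexplained.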

There is no evaluation of the work. I would imagine a work of this nature should provide, if not all, at least one of the following: (1) at the least, some kind of precision/recall measure using some of the benchmarking datasets; (2) a comparative evaluation of different systems, if possible using precision and recall as metrics; (3) if (2) is not possible at all, some kind of qualitative evaluation. It is difficult to judge the novelty or benefit of the approach in the absence of these results.

It is really difficult to read and understand the paper. The paper is very poorly written, with grammatical errors and typos such as:

1. users'
2. more wide perspective
3. For several years, we will see that similar projects ....
4. Our approach is founded on a not statistically...
5. an User Interface

Some statements are hard to understand, for example the last line of Para 1 of the Abstract - "For the creation of NLP applications finalized to the achievement of QA."

"This aim seems easy to obtain, but the first trial, not yet surmounted, is to digit a query.."