Natural Language Generation in the context of the Semantic Web

Nadjet Bouayad-Agha
Gerard Casamayor
Leo Wanner

Philipp Cimiano

Survey Article
Natural Language Generation (NLG) is concerned with transforming some formal content input into a natural language output, given some communicative goal. Although this input has taken many forms and representations over the years, it is the semantic/conceptual representations that have always been considered as the “natural” starting ground for NLG. Therefore, it is natural that the semantic web, with its machine-processable representation of information with explicitly defined semantics, has attracted the interest of NLG practitioners from early on. We attempt to provide an overview of the main paradigms of NLG from SW data, emphasizing how the Semantic Web provides opportunities for the NLG community to improve their state-of-the-art approaches whilst bringing about challenges that need to be addressed before we can speak of a real symbiosis between NLG and the Semantic Web.
By Ion Androutsopoulos submitted on 10/May/2013
The article has been improved significantly since the original submission. It is now shorter and, more importantly, it provides a higher-level and more coherent view of relevant key ideas, reporting fewer details about particular systems. I felt that the authors have done good work to address the points I raised. There are still a few points where I believe clarifications and some more examples are needed (see below), but these can probably be viewed as minor revisions. I recommend accepting the article, provided that the following points are addressed.

More detailed points:

The abstract uses the acronym “SW”, without defining it. It also uses both “semantic web” and “Semantic Web”. Please check the consistency of the abstract.

Section 2.1, paragraph 1, sentence “The context can be… wearable device).”: This sentence is too long and difficult to follow. Please replace it by simpler sentences. Also, “or allow” should probably be “or it may allow”; “vs” should probably be “vs.”; and I could not understand at all “for an advice or narration generation wearable device”.

Footnote 1: Why is a linguistically independent conceptual structure necessarily domain- and task-independent? Why cannot a linguistically motivated conceptual representation be domain-dependent? Also, please clarify that the “task” is NLG; this becomes clearer in Fig. 1, but it would be better to clarify earlier.

Section 2.1, paragraph 3: “Dialogue” is not an “information processing application”. Maybe use “dialogue system” instead. Also, capitalizing “Dialogue”, “Summarization” etc. looks unnecessary to me.

Figure 1, output, modality: “Textual only (written or spoken)” should probably be replaced by “Language only (written or spoken)”.

Section 2.1, page 3, second paragraph below Fig. 1: Is GRE part of (1), (2), or (3)?

Footnote 4: I would recommend including this example in the main text.

Section 2.1, last paragraph, page 4: “McKeown’s” should be “McKeown”.

Section 2.2, first paragraph: Is GRE also a “semantically oriented NLG task” or not, and why?

Section 2.2.1, first paragraph: “level. ].” should be “level.”

Section 2.2.1 and following sections: Please explain why the terms “closed planning” and “open planning” are used. What is “closed” in “closed planning”? What is “open” in “open planning”? Also, paragraphs 3 and 4 of this section provide more information on “open planning”, but no additional information is provided on “closed planning”, which leaves the reader wondering exactly what “closed planning” is.

Footnote 7: “section 2.1” should probably be “Section 2.1”.

Section 2.2.2, first paragraph: “120])” should be “120]”. Also, a comma seems to be missing immediately after “[70, 92, 104]”.

Section 2.2.3, second paragraph, page 5: Please explain briefly what a “discrimination network” is. Also, how do we reach the conclusion that the concept of the example should be realized as ‘drink’? Should “lexical unit Ingest” be “lexical unit drink”?

Section 2.2.3, third paragraph, page 5: This paragraph is unclear. Also, why are the graph-rewriting and structure mapping approaches “especially relevant”? An example would help.

Section 2.2.3, fourth and fifth paragraphs: These paragraphs are also unclear. Please explain more clearly, ideally also providing examples.

Section 2.2.3, last paragraph, page 5: Please provide some concrete examples demonstrating the use of the Upper Model. Also, the right justification of this paragraph needs fixing.

Section 3, paragraph 1: “Sevcenko’s” should be “Sevcenko” or “Sevcenki” (reference [123] says “Sevcenki”).

Footnote 9, “for presenting … into queries [17]”: I could not parse this sentence. Also, “contents” in the following sentences should probably be “content”.

Section 3.1, first paragraph: space missing in “say(almost)”.

Section 3.1, second paragraph: What does “pondering the traversed nodes” mean? This paragraph is particularly unclear. An example might help.

Section 3.1.1, last paragraph, page 8: Please explain what “all the queries of a term” means. Also, please explain more clearly the approach of Ang et al.

Section 3.1.2, last paragraph, last sentence: I was unable to understand this example. Why would the feedback text about the animal be presented in the first place, if the user specified that Mary owns a pet? Also, should the right bracket after “pet” be moved to the end of the previous sentence?

Section 3.1.3, first paragraph: “Bouttaz et al.’s” should be “Bouttaz et al.”.

Figure 2: Is this illustration copied from another article? Any copyright issues? Also, the illustration appears blurred.

Section 3.1.4, line 1: The use of “i.e.,” here does not seem appropriate.

Section 3.1.4, paragraph 1: This paragraph is particularly unclear. For example, what is “heuristic-based navigation”? What are the “false implicatures” mentioned? What does “selects axioms if their selection can be inferred from axioms already selected” mean? Please add examples.

Section 3.2: The use of “Consensus Model” seems misleading to me. It implies that some consensus approach to sentence planning in NLG for SW has emerged, which I believe is incorrect.

Section 3.2.1, first paragraph: Remove indentation before “can be verbalized”. Also, please explain what “patterns” means here.

Section 3.2.1, second paragraph: “support user’s creation” does not look fluent to me.

Figure 3: I doubt that a reader not familiar with NaturalOWL’s linguistic resources will be able to understand this figure. Please explain, for example, what “owner”, “filler”, “retype”, “re_auto” mean.

Section 3.2.2, last paragraph: What are the “theoretical concerns”? This sentence and the previous discussion seem to suggest (I believe wrongly) that annotating the ontology with linguistic resources is theoretically less principled, compared to the “Consensus Model” (which blurs the distinction between knowledge representation and linguistic resources, e.g., by using class and property identifiers as lexicon entries and sentence templates).

Section 3.2.3, first paragraph: Again, please provide (here or elsewhere) some concrete examples showing the benefits of using an Upper Model.

Section 4, paragraph 2: Please clarify what “NLG-related (i.e., communication) knowledge and domain communication knowledge” means.

Section 4, page 12, second bullet: Please clarify what “apply domain-specific rules… knowledge” means.

Section 4.5, last paragraph, “We are convinced… context”: The previous discussion does not seem to provide any particularly strong support for this statement.

Anonymous submitted on 04/Jun/2013
The paper now does an excellent job of reviewing the field of NLG with
particular reference to the representations and needs of the Semantic
Web. The re-organisation has been very successful I think in bringing
about a solid overview of methods, approaches and issues. I am happy
to recommend its publication in this form apart from some very small
typos/corrections to be made to the text as listed below.

* reference to footnotes should be checked for consistency; at the ends of
sentences all possible varieties of "text space footnote .", "text
. space footnote", "text . footnote", "text footnote ." occur: these
should be corrected; my preference would be "text . footnote"
throughout, no spaces.

* p3, 2.2.1, there is a spurious closing square bracket at the end
of the first sentence

* p3, n6. "we did not think it deserved to be discussed in this
section" is much too evaluative; replace with "we do not discuss it

* p4, last paragraph of 2.2.1: the grammar is wrong here, looks like
a missing verb (perhaps an "are" before "used" was meant?)

* p4, 2.2.2 first para:

"discourse structure of the content" --> "discourse structure for
the content" [because content does not have discourse structure

"to the already selected nodes" --> "to those nodes already

* p6, first para:

"predates" --> "predate"
"Serenko's [123]" --> "Serenko [123]"

* p6, second para: "attuned" --> "tuned"

* p6, n9: "users preferences" --> "users' preferences"

* p6, 3.1, first sentence: "say(almost)" --> "say (almost)"

* p6, 3.1, second para: the problem remarked by one of the reviewers
in the first review was probably due to the choice of the word
rather than anything else -- "pondering" just does not belong here
(suggesting a prolonged, lingering, doubtful and uncertain
consideration of issues and options... something that I am not yet
willing to attribute to any of our 'intelligent' systems!), replace
with, e.g., "considering"

* p6, 3.1, last sentence: 'outstanding' means 'excellent' which is
probably not what you mean; perhaps 'significant', 'representative',
'particularly prominent'? pick one.

* p8, l.5: "if at all there" --> "if there at all"

* p8, 3.1.2, first sentence: "a term coin" --> "a term coined"

* p9, middle left column: put space between O'Donnell and reference

* p9, right column, top: "some patterns which can be ... each." -->
"some patterns, each of which can be linguistically realized as a

* p10, 3.2.1 after the OWL restriction example: \noindent missing in
following textblock

* p10, middle righthand column, replace the "<->" by a proper double
headed arrow

* p11, last sentence 3.2.2: "limit" --> "limits"

* p11, righthand column, bottom: wrong hyphenation in "Geoname", need

* p12, second para: "vast amounts of ... documents" --> "vast numbers
of documents"

* p14, top(-ish) of lefthand column: remove spurious comma after
"combined with"

* p14, bottom righthand column: SW and NLG "community" -->