From hyperlinks to Semantic Web properties using Open Knowledge Extraction

Tracking #: 908-2119

Authors: 
Valentina Presutti
Andrea Giovanni Nuzzolese
Sergio Consoli
Diego Reforgiato Recupero
Aldo Gangemi

Responsible editor: 
Guest Editors EKAW 2014 Schlobach Janowicz

Submission type: 
Full Paper
Abstract: 
Open information extraction approaches are useful but insufficient alone for populating the Web with machine readable information as their results are not directly linkable to, and immediately reusable from, other Linked Data sources. This work proposes a novel Open Knowledge Extraction approach that performs unsupervised, open domain, and abstractive knowledge extraction from text for producing directly usable machine readable information. The method is based on the hypothesis that hyperlinks (either created by humans or knowledge extraction tools) provide a pragmatic trace of semantic relations between two entities, and that such semantic relations, their subjects and objects, can be revealed by processing their linguistic traces (i.e. the sentences that embed the hyperlinks) and formalised as Semantic Web triples and ontology axioms. Experimental evaluations conducted with the help of crowdsourcing confirm this hypothesis, showing very high performance. A demo of Open Knowledge Extraction is available at http://wit.istc.cnr.it/stlab-tools/legalo.
Full PDF Version: 
Tags: 
Reviewed

Decision/Status: 
Major Revision

Solicited Reviews:
Review #1
By Yingjie Hu submitted on 20/Jan/2015
Suggestion:
Minor Revision
Review Comment:

This paper presents an approach for open knowledge extraction, in which semantic relations have been constructed from hyperlinks. This approach uses a frame-based formal representation of unstructured texts, and has two major functions: evaluating the existence of relations as well as generating labels for the relations. An online service, Legalo, has been implemented, and the authors evaluate this approach based on crowdsourcing.

Merits of the paper:
1. Assigning proper labels to predicates is a difficult task. Even for specialized ontology engineers, finding a proper label takes time and effort. This work proposes to combine extractive and abstractive approaches to automatically generate the labels based on a graph representation of the sentences. Although there are still limitations, I consider this approach an important step for advancing automatic triple extraction from natural language texts.
2. The method descriptions, from frame-based representation to relation assessment to label generation, are presented in a detailed manner. The definitions, axioms, and rules are clear, which helps to enhance the reproducibility of this work.
3. The authors present several useful online demos, which help the readers to intuitively explore and examine the result of this research.

My major concerns of this paper are in the evaluation section and its presentation:

1. How many different human participants have been employed for the evaluation experiment? While the authors have mentioned in several places the minimum number of workers per task, such as "Each question was performed by at least three workers", we still don't know how many different participants took part in the entire experiment. Without this, it is difficult to assess the significance of the evaluation result.

2. For the evaluation of hypothesis 1, the authors state that the recall is always 1 because "Legalo always provides either a true or false value and raters can only answer “yes” or “no”. In other words, for this task there can not be neither true negatives or false negatives." It is difficult for me to understand why this happens. To me, a recall of 1.0 indicates that whenever the result from a human participant is "yes", Legalo always returns that a relation exists. Why can the case not happen where a human participant answers "yes" but Legalo returns no relation (in such a case, the recall would no longer be 1)? The authors may need to provide more explanation on this.
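To make this point concrete, the following illustrative sketch (not the authors' code, and with invented toy judgments) shows how precision and recall behave when a "system says no, rater says yes" pair is counted as a false negative:

```python
# Illustrative sketch: precision/recall over (system, rater) boolean pairs
# for a relation-existence task. The pairs below are hypothetical examples.
pairs = [
    (True, True),    # system: relation exists, rater agrees     -> TP
    (True, False),   # system says relation, rater says no       -> FP
    (False, True),   # system says none, rater says yes          -> FN (lowers recall)
    (False, False),  # both say no relation                      -> TN
]

tp = sum(1 for system, rater in pairs if system and rater)
fp = sum(1 for system, rater in pairs if system and not rater)
fn = sum(1 for system, rater in pairs if not system and rater)

precision = tp / (tp + fp)
recall = tp / (tp + fn)

# Recall equals 1 only if no false-negative pairs occur at all.
print(precision, recall)  # -> 0.5 0.5
```

Under this standard counting, recall is fixed at 1 only if the third kind of pair can never arise, which is exactly what the reviewer asks the authors to justify.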

3. For the evaluation of tasks 3 and 4, it is unclear how the authors calculate values for precision and recall. If I understand correctly, in both tasks Legalo generates a label which is then evaluated by human participants who choose among "agree", "partly agree", and "disagree". How, then, are precision and recall calculated in this case? The authors may need to provide more explanation.

For the presentation of this paper, there are several issues as well:

1. This paper has too many sections, and some sections could be merged. For example, section 3 can be merged with section 4 and 5, with the semantic sources used for implementation (like WiBi and Watson) merged to section 4, and the evaluation dataset merged with section 5. Such an organization can also help readers track the content, and potentially reduce the length of the paper. For example, when beginning section 5, one may find it difficult to remember the two evaluation datasets discussed in section 3. As a result, the authors have to repeat some description of the dataset in section 5 again. By merging the sections, the authors only need to explain the datasets once.

2. Too many footnotes have been used in this paper (74 in total). While using some footnotes can help explain the content, too many footnotes can confuse the reader. I noticed that some footnotes are redundant: for example, footnotes 14, 32, and 44 all point to the source of the experimental dataset, while footnotes 13 and 72 both point to the online demo. There are also some explanation footnotes which can be merged into the text, such as 18, 21, and 22.

3. The paper also contains a number of repeated sentences. For example, when talking about "Legalo", the authors tend to explain repeatedly that it is "the current implementation of OKE". It is not necessary to explain this multiple times.

4. In the Legalo prototype, there is a lengthy description of FRED. While it is helpful for the readers to grasp some idea of how FRED works, too many details are unnecessary since FRED is not the major contribution of this work.

There are also some typos in the paper:

1. page 16: "In addition, this components implements two more modules: the “Property matcher” and the “Formaliser”." should be "In addition, this component implements..."

2. Also on page 16: "It depends on Legalo has core component and specialise it with two additional features..." should be "It depends on Legalo as core component and specialises it ..."

3. Hypothesis 2 in section 5.1: "Legalo is able to generate a usable predicate λ for a relevant relation φs between to entities, ..." should be "... between two entities..." and "λ" should be "λ'".

4. page 22: "while NELL properties result from and artificial concatenation of categories learnt automatically." should be "while NELL properties result from an artificial concatenation of..."

Other small issues:
1. the link (footnote 60) http://wit.istc.cnr.it/stlab-tools/legalo-wikipedia/ is not working.

Review #2
By Marieke van Erp submitted on 31/May/2015
Suggestion:
Major Revision
Review Comment:

The paper presents a method to extract information from web pages that takes the context around hyperlinks into account. The approach identifies sentences in which a possibly relevant relationship between two entities is expressed, then NLP techniques are utilised to determine the predicate that expresses this relationship, which is then mapped to an OWL object property to link it to the LOD cloud. The paper presents ample examples of the approach and details an implementation called Legalo. My main concern with this paper is the evaluation and, following from it, the generalisability of the approach.

The datasets that were used in the evaluation are both made up of Wikipedia text. The advantage of using Wikipedia text is that it is generally cleaner than open domain text and many (if not most) of the entities mentioned on Wikipedia are famous enough to have a resource associated with them on DBpedia. It would be great to see some experiments on, for example, newswire or blog text to assess how the approach works on real open domain text. I played around with the demo a bit and it seems that the system works better on Wikipedia text than on Wikinews text, but this is only anecdotal. A system that could tackle open domain text would be super useful to the community as well as for many use cases from companies or institutions that want to improve access to their data.

In the evaluation setup, only the end product of the Legalo system is evaluated using crowdsourcing. As annotating gold standard datasets is quite expensive, this is a nice solution. The different steps that the system performs were analysed through different tasks that were given to annotators. The Discussion section of the paper focuses on which tasks perform better than others and the inter-annotator agreements; besides the generalisability of the predicates, it does not really discuss frequently observed error types or striking examples of cases that Legalo currently cannot handle. If the entity pair selection generates an error in the span, for example it only selects a part of an entity, say "Obama", and then the system wrongly links it to "Barack Obama" where it should have been "Michelle Obama", can the source of the error be traced through this evaluation setup?

Also, why are the results in Table 6 obtained through a different evaluation setup than the other results? These results were already presented in the EKAW paper that this paper is an extension of, but the results are not linked to the results of the new version of the system. Would crowdsourcing this evaluation yield the same results? This would be an interesting way to check if the general population evaluates the results of the system in a similar manner.

As for the generalisability, the paper is an extension of "Uncovering the semantics of Wikipedia pagelinks", presented at EKAW 2014. In that paper, the Legalo system is tuned towards Wikipedia, whereas this paper is presented as an open domain IE system. However, there are no experiments that support this claim. From what I could gather from the demo, the distinction between the Legalo Wikipedia system and the generic Legalo system is that the generic system analyses non-Wikipedia texts. But perhaps this is a misinterpretation. It would help if the authors devoted a paragraph detailing the differences between these two systems and what part FRED plays in this. As FRED in itself is already a pretty impressive tool, what exactly does Legalo add?

It seems that the paper would also benefit from a revision in the setup. I think section 3 (Data Sources) would fit better before Section 2 (Method): to understand the method, it helps if the reader knows about the data sources that are mentioned. I also think it might be nicer if the Discussion of the results were integrated more with the results section; currently the tables are presented in the results section without much explanation, and only afterwards discussed in Section 6. As there were 5 different evaluation tasks, it would be more coherent to discuss the results inline. I also think the "Automatic summarisation" subsection in "Related Work" could be skipped. Any IE task can serve as input for higher level tasks such as summarisation. Summarisation could be mentioned as an interesting use case but it doesn't really need the space it gets here.

Furthermore, the paper does not present many background motivations for particular choices; many things are presented 'as is' (e.g., the choice of NELL, WiBi, VerbNet). Were any alternatives considered? Or are there none? What are the limitations of these resources? A description of the motivations would help other researchers in assessing whether these resources might be useful to them for similar problems.

In summary:
It would be good if the authors could be more specific about the contribution of Legalo vs. FRED and the Legalo Wikipedia system and if the evaluation could be expanded to non-Wikipedia text. Or the claims in the abstract and introduction should be made more specific to mention that it is only tested on Wikipedia.

Minor remarks:
The first part of the paper separates OKE and Legalo as the method and the implementation, but most of the paper further talks about Legalo. It seems this distinction is a bit artificial and confusing. Perhaps the authors can just call it the generic Legalo implementation?
Section 1:
"key bootstrap" -> not sure what is meant here
"However, most of the Web content consists of natural language text" -> it would be good if this claim is supported by a reference, there is a lot of video content uploaded each minute
"This work aims at solving" -> "This work aims to solve"
"(KE) systems address very well the task" -> "(KE) systems address the task of linking pieces of text to Semantic web entities very well"
Section 2
"On the other hand, while it is correct to state that Chouinard Art Institute .... to be considered as relevant" -> It is unclear to me what is extracted exactly by OKE here, and what would be desired. Perhaps a tabular representation of what is currently extracted and what would be desirable might clarify this paragraph somewhat.
Figure 4 is a bit too small to read
What exactly is meant by the checksum in "this is implemented in Legalo by including the checksum of s"
"to existing Linked Data. OKE method" --> to existing Linked Data. The OKE method"
Section 3
"OKE created a RDF property synthesising the link's semantics" -> "OKE creates an RDF property synthesising the semantics of the link"
Section 4
Where does "fred:Australia" come from in the last line of page 15?
Section 5
"The available datasets in this corpus are five" -> "There are five datasets available in this corpus"
"For each snippet Legalo gave always an output" -> "For each snippet, Legalo gave an output"
"Legalo produced 867 results" -> it's a bit confusing what is meant by "results" here, as results usually indicate some end product
The paragraph explaining the crowdflower setup could be shortened (righthand side of page 18)
Section 6
"Surprisingly, the average confidence value on this taks was not that low (0.59)" -> it is unclear to me why this is surprising. And 0.59 is not that high.
"not possible to compute a standard recall" -> "not possible to compute standard recall"
Section 7
"from Legalo to NELL ontology" -> "from Legalo to the NELL ontology"

Formatting:
Some of the footnotes are repeated several times, e.g. the link to the Legalo demo and the link to the relation extraction corpus. In LaTeX you can use footnote labels to refer back to a previously introduced footnote. This would also save space.
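For instance, a footnote number can be reused with \label and \footnotemark (the label name here is illustrative):

```latex
% First occurrence: define the footnote and remember its number
... the Legalo demo\footnote{\url{http://wit.istc.cnr.it/stlab-tools/legalo}\label{fn:legalo-demo}} ...

% Later occurrences: reuse the same footnote number instead of repeating the text
... as shown in the demo\footnotemark[\ref{fn:legalo-demo}] ...
```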
The text is running into the margin in several places.
The example "In February 2009 Evile began the pre-production for their second album with Russ Russell" is mentioned twice, but the text mentions that the DBpedia entity dbpedia:Evil is identified here. Is this an error in the text or an error in the module?
Check the whitespace around footnote markers (sometimes there is a space between the preceding word and the footnote)
After the example at the end of the lefthand side of page 22 the textsize is smaller.
References 40 and 23: it's only necessary to cite 23 separately if you refer to the entire proceedings, in 40 it suffices to say "In Proceedings of"


Comments

It is appropriate to refer to an additional relevant related work that defines the term "open knowledge extraction" in the context of AI, i.e. http://www.cs.jhu.edu/~vandurme/papers/VanDurmeSchubertSTEP08.pdf (Benjamin Van Durme and Lenhart Schubert, 2008). At the time of writing this paper we did not refer to that work; it will be properly acknowledged and surveyed in a possible publication version of the article. It defines "open knowledge extraction" as "conversion of arbitrary input sentences into general world knowledge represented in a logical form possibly usable for inference", hence perfectly compatible with the definition given in this paper. The cited work does not focus on Semantic Web technologies and languages, but it provides further support for our claims and definitions.