Uncovering the semantics of Wikipedia wikilinks

Tracking #: 718-1928

Authors: 
Valentina Presutti
Sergio Consoli
Andrea Giovanni Nuzzolese
Diego Reforgiato Recupero
Aldo Gangemi
Ines Bannour
Haïfa Zargayouna

Responsible editor: 
Guest Editors EKAW 2014 Schlobach Janowicz

Submission type: 
Conference Style
Abstract: 
Wikipedia pagelinks, i.e. links between Wikipages, carry an intended semantics: they indicate the existence of a factual relation between the DBpedia entity referenced by the source Wikipage and the DBpedia entity referenced by the target Wikipage of the link. These relations are represented in DBpedia as triple occurrences of a generic "wikiPageWikilinks" property. We designed and implemented a novel method for uncovering the intended semantics of pagelinks and representing them as semantic relations. In this paper, we experiment with our method on a subset of Wikipedia, showing its potential impact on DBpedia enrichment.
Full PDF Version: 
Tags: 
Reviewed

Decision/Status: 
[EKAW] combined track accept

Solicited Reviews:
Review #1
Anonymous submitted on 10/Aug/2014
Suggestion:
[EKAW] conference only accept
Review Comment:

Overall evaluation
Select your choice from the options below and write its number below.

== 0 borderline paper

Reviewer's confidence
Select your choice from the options below and write its number below.

== 4 (high)

Interest to the Knowledge Engineering and Knowledge Management Community
Select your choice from the options below and write its number below.

== 4 good

Novelty
Select your choice from the options below and write its number below.

== 3 fair

Technical quality
Select your choice from the options below and write its number below.

== 3 fair

Evaluation
Select your choice from the options below and write its number below.

== 2 poor

Clarity and presentation
Select your choice from the options below and write its number below.

== 2 poor

Review
Please provide your textual review here.

The paper describes a system for assigning types to Wikipedia pagelinks.
The proposed system has multiple stages and is similar to the system proposed in the authors' earlier work on the "automatic typing of DBpedia entities". The approach first extracts subject/object relations from Wikipedia and then constructs their FRED graphs. These graphs are then leveraged to obtain binary semantic relations among entities. Two evaluations are performed to assess the quality of the assigned pagelink types and the proposed alignments to existing Semantic Web properties.

[General comments]

The main concern is about the evaluation. Firstly, the proposed approach has not been compared to any other approach from relation extraction or related areas.
Additionally, it is not clear to me how the human judgments on the Likert scale in Table 6 were converted into Precision/Recall.
One more point concerns the coverage of the approach: in Section 5 it is mentioned that, of the 1192 sentences, FRED graphs could be generated for only 629. Does that mean that your link-typing approach has a recall of about 0.5, since it depends on the FRED graph-generation stage?
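The coverage arithmetic behind this question can be made explicit. The counts come from the figures quoted above; reading the fraction as a recall upper bound is the interpretation being asked about, not a result from the paper:

```python
# If FRED produces graphs for only a subset of sentences, the end-to-end
# recall of any downstream link typing is bounded by that fraction,
# regardless of how accurate the typing stage itself is.
total_sentences = 1192   # sentences sampled (figure quoted from Section 5)
graphs_generated = 629   # sentences for which FRED produced a graph

coverage = graphs_generated / total_sentences
print(f"upper bound on end-to-end recall: {coverage:.3f}")  # about 0.528
```

This is why the reviewer suggests the typing results be read together with the graph-generation coverage rather than in isolation.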

[Other comments]

- How many relations types are there in total?
- FRED requires more explanation on how the graph is constructed.
- Too many unnecessary newlines throughout the paper. For instance, in Section 4 under Fig. 2, there is no need for dividing the text into those small paragraphs.
- Related work: “[...], while Legalo focuses on unrevealing the semantics [...]”, revealing?

Review #2
Anonymous submitted on 22/Aug/2014
Suggestion:
[EKAW] conference only accept
Review Comment:

Overall evaluation
Select your choice from the options below and write its number below.
1
== 3 strong accept
== 2 accept
== 1 weak accept
== 0 borderline paper
== -1 weak reject
== -2 reject
== -3 strong reject

Reviewer's confidence
Select your choice from the options below and write its number below.
3
== 5 (expert)
== 4 (high)
== 3 (medium)
== 2 (low)
== 1 (none)

Interest to the Knowledge Engineering and Knowledge Management Community
Select your choice from the options below and write its number below.
4
== 5 excellent
== 4 good
== 3 fair
== 2 poor
== 1 very poor

Novelty
Select your choice from the options below and write its number below.
3
== 5 excellent
== 4 good
== 3 fair
== 2 poor
== 1 very poor

Technical quality
Select your choice from the options below and write its number below.
4
== 5 excellent
== 4 good
== 3 fair
== 2 poor
== 1 very poor

Evaluation
Select your choice from the options below and write its number below.
3
== 5 excellent
== 4 good
== 3 fair
== 2 poor
== 1 not present

Clarity and presentation
Select your choice from the options below and write its number below.
4
== 5 excellent
== 4 good
== 3 fair
== 2 poor
== 1 very poor

Review
This paper presents a workflow for adding semantics to Wikipedia links. Instead of using a general "wikiPageWikilinks", the proposed workflow converts the pagelinks between Wikipedia entities into semantic triples. This work has the potential to further enrich the structured content of DBpedia, and is presented in a logical manner. However, some revisions are necessary to improve the work.

1. An important issue lies in the example given by the authors in the introduction section. The authors state that the link between "Paris" and "French Open" falls under a general category of "dbpo:wikiPageWikiLink". However, this is not the case on the current DBpedia Paris page (see http://dbpedia.org/page/Paris). On this page, the entity "Paris" is linked to "French Open" through the relation "is dbpedia-owl:location of" (in fact, there are entities "dbpedia:1979_French_Open", "dbpedia:1981_French_Open", "dbpedia:1993_French_Open", and so forth). I could not even find a property called "wikiPageWikiLink", only "wikiPageExternalLink". This could be due to recent updates of DBpedia, and I believe it should not be used as a reason to deny the authors' good work, since this workflow can also be applied to other general pagelinks. However, I think it is very important to change the introduction text accordingly; otherwise, a reader may be confused about the motivation of the paper. A question such a reader may directly ask is: why are you still doing this work, since the DBpedia links have already been assigned semantics? Thus, it would be helpful if the paper addressed this issue and emphasized the general applicability of this research in the introduction.

2. The presented work relies heavily on the authors' previous work FRED. However, the link to FRED cannot be accessed. I tried to visit the site on two separate days, without luck on either. It would be good if the authors could ensure a running instance of FRED, so that interested readers can check it.

3. While the links to Watson and LOV are accessible, the link to NELL also cannot be visited. I tried twice on two separate days, as I did for the FRED link.

4. The URL to the experiment results, http://isotta.cs.unibo.it:9191/sparql, is not accessible either, which makes it difficult to inspect the experimental results.

5. In the evaluation section, the authors use precision, recall, F-measure, and Kendall's W to evaluate the accuracy of the generated relations. Likert scales were used to ask three users to evaluate the quality of the generated relations. However, it is not clear how these indicators (e.g., precision or recall) are calculated. Are the authors calculating precision as the number of "strongly agree" ratings divided by the total number of relations? How about recall, and what is the set of all relevant relations that serves as the denominator for that calculation? Without a clear description of the calculation process, it is difficult to understand the experimental results.
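One plausible calculation, of the kind this question asks the authors to spell out, can be sketched as follows. The ratings and threshold below are invented for illustration; nothing here reflects the paper's actual procedure:

```python
# Hypothetical mapping from Likert judgments to precision: treat every
# rating at or above a threshold as a "correct" relation. Recall would
# additionally require a gold set of all relevant relations, which is
# exactly the missing piece the review points out.
ratings = [5, 4, 2, 5, 3, 4, 1, 5]  # invented 1-5 Likert scores
threshold = 4                       # invented cut-off for "correct"

correct = sum(1 for r in ratings if r >= threshold)
precision = correct / len(ratings)
print(f"precision under threshold {threshold}: {precision:.2f}")
```

Whether the authors use this thresholding, an average-rating cut-off, or something else entirely is what the review asks them to state explicitly.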

To sum up, while this paper has several non-working links and some other issues, there is good value in the presented workflow, which deserves publication.

Review #3
Anonymous submitted on 24/Aug/2014
Suggestion:
[EKAW] conference only accept
Review Comment:

Overall evaluation
Select your choice from the options below and write its number below.

== 3 strong accept
== 2 accept
== 1 weak accept
== 0 borderline paper
== -1 weak reject
== -2 reject
== -3 strong reject
2

Reviewer's confidence
Select your choice from the options below and write its number below.

== 5 (expert)
== 4 (high)
== 3 (medium)
== 2 (low)
== 1 (none)
3

Interest to the Knowledge Engineering and Knowledge Management Community
Select your choice from the options below and write its number below.

== 5 excellent
== 4 good
== 3 fair
== 2 poor
== 1 very poor
5

Novelty
Select your choice from the options below and write its number below.

== 5 excellent
== 4 good
== 3 fair
== 2 poor
== 1 very poor
2

Technical quality
Select your choice from the options below and write its number below.
== 5 excellent
== 4 good
== 3 fair
== 2 poor
== 1 very poor
4

Evaluation
Select your choice from the options below and write its number below.
== 5 excellent
== 4 good
== 3 fair
== 2 poor
== 1 not present
2

Clarity and presentation
Select your choice from the options below and write its number below.
== 5 excellent
== 4 good
== 3 fair
== 2 poor
== 1 very poor
4

Review
Please provide your textual review here.
The paper deals with transforming graphs that represent the semantic content of event-describing Wikipedia sentences into binary relations, and with their automatic labeling. The semantic representation is produced by an existing system. The implemented Legalo system particularly focuses on links between the entity described by a Wikipedia page and other entities that are linked from that page. Results are evaluated in a user-based study.

The reported work presents an incremental step in the development of the system -- a significant portion of the paper links together pre-existing results, while the newly added property extractor and property matcher employ rather straightforward strategies (such as the edit distance).
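The kind of edit-distance matching referred to here can be sketched as follows. This is a generic Levenshtein matcher with a hypothetical target vocabulary, not the paper's actual property matcher:

```python
# Minimal sketch: align an extracted relation label with the closest
# label in a target vocabulary by Levenshtein (edit) distance.
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance between two strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,        # deletion
                            curr[j - 1] + 1,    # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]

def best_match(label: str, vocabulary: list[str]) -> str:
    """Return the vocabulary label with the smallest edit distance."""
    return min(vocabulary, key=lambda v: levenshtein(label.lower(), v.lower()))

# Hypothetical target properties; real matchers would draw on a
# vocabulary such as DBpedia ontology properties.
vocab = ["locatedIn", "birthPlace", "employer"]
print(best_match("locatedAt", vocab))  # -> locatedIn
```

The review's point stands either way: surface-level string matching like this is simple to implement, which is why it is characterized as a straightforward strategy rather than a major contribution.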

The evaluation focuses strictly on the cases that could be analysed by the employed system. The discussion does not pay sufficient attention to this fact, so the results need to be interpreted with this limitation in mind.

The paper would benefit from more (and better) examples; e.g., it is not fully clear which links are classified as non-sense and which as loose relations. It would also be useful to connect the way in which a sentence is structured with the relevance of the extracted relations. For example, it is questionable how useful it is to link "Cobb" (rather than the act of filming Conan the Barbarian) and "Spain" with the relation "locatedIn" based on the sentence:
"While Cobb was in Spain working on Conan the Barbarian, Spielberg supervised the rewrite into the more personal E.T. and ended up directing it himself."
The judgment criteria should also be supplemented with examples; otherwise, it is not clear how to interpret the results.