Relationship-based Top-K Concept Retrieval for Ontology Search

Tracking #: 752-1962

Authors: 
Anila Sahar Butt
Armin Haller
Lexing Xie

Responsible editor: 
Guest Editors EKAW 2014 Schlobach Janowicz

Submission type: 
Conference Style
Abstract: 
With the recent growth of Linked Data on the Web, there is an increased need for knowledge engineers to find ontologies to describe their data. Only limited work exists that addresses the problem of searching and ranking ontologies based on a given query term. In this paper we introduce DWRank, a two-stage bi-directional graph-walk ranking algorithm for concepts in ontologies. We apply this algorithm to the task of searching and ranking concepts in ontologies and compare it with state-of-the-art ontology ranking models and traditional information retrieval algorithms such as PageRank and tf-idf. Our evaluation shows that DWRank significantly outperforms the best ranking models on a benchmark ontology collection for the majority of the sample queries.
Tags: 
Reviewed

Decision/Status: 
[EKAW] combined track accept

Solicited Reviews:
Review #1
Anonymous submitted on 27/Aug/2014
Suggestion:
[EKAW] combined track accept
Review Comment:

Overall evaluation: 2 (accept)
Reviewer's confidence: 4 (high)
Interest to the Knowledge Engineering and Knowledge Management Community: 4 (good)
Novelty: 4 (good)
Technical quality: 4 (good)
Evaluation: 3 (fair)
Clarity and presentation: 4 (good)

Review

The authors developed a method for searching and ranking ontologies for a given textual query. An (offline) index is built using hub scores of concepts within ontologies and authority scores of the ontologies themselves.
Queries are answered using text similarity calculations, ranking based on the offline indexes, and two strategies for filtering.
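To make this two-phase design concrete, a minimal sketch of such a pipeline could look as follows. This is an illustration only, not the authors' implementation; the ontology objects, hub_score, authority_score, and the difflib-based text matching are hypothetical placeholders.

from difflib import SequenceMatcher

def build_index(ontologies, hub_score, authority_score):
    # Offline phase: precompute a score for every (ontology, concept) pair.
    index = {}
    for onto in ontologies:
        auth = authority_score(onto)        # importance of the ontology
        for concept in onto.concepts:
            hub = hub_score(onto, concept)  # centrality of the concept
            index[(onto.uri, concept.label)] = hub * auth
    return index

def search(query, index, k=10, threshold=0.6):
    # Online phase: text matching, ranking by the offline scores, top-k cut.
    def text_sim(label):
        return SequenceMatcher(None, query.lower(), label.lower()).ratio()
    matches = [(key, score) for key, score in index.items()
               if text_sim(key[1]) >= threshold]
    matches.sort(key=lambda item: item[1], reverse=True)
    return matches[:k]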

This is an interesting and well-written paper. Below are my main comments:
- Why was a PageRank-like algorithm used for the hub score calculation? When measuring the centrality of a node in a network, other network analysis algorithms, such as betweenness centrality, may work even better (see the sketch after these comments).
The explanation of the results is a bit lacking:
- I understand the argumentation behind using artificial ontology concepts for data-type relations in the hub score calculations. However, I would be interested in a deeper analysis of this. How does performance change with and without this particular trick, and for which types of ontologies is it more suitable?
- Other than Tables 1 and 2, I would like more statistics on the ontology benchmark. How connected is the ontology network? What is the average/median degree? Depending on such figures, we might better understand the influence of the ontologies' authority calculation on the ranking.
- Why is a detailed description of the precision/recall of the filter step out of the scope of this paper? I would have particularly liked to see this part.
- I like the graphs and tables on increased performance. However, I would have easily traded the space of one of these graphs and tables for more insight. E.g., which concepts are incorrectly ranked (based on your evaluation outcome), and why? Is it the filtering step or the offline step? Is there a particular property of the concept or ontology that causes this?
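As a concrete illustration of the centrality comparison suggested above, here is a toy concept graph using networkx; the graph and its edges are made up purely for demonstration.

import networkx as nx

# Toy concept graph; edges could stand for subclass or property relations.
g = nx.DiGraph()
g.add_edges_from([
    ("Person", "Agent"),
    ("Organization", "Agent"),
    ("Person", "Document"),
    ("Document", "Agent"),
])

pagerank = nx.pagerank(g, alpha=0.85)       # alpha is the damping factor
betweenness = nx.betweenness_centrality(g)  # shortest-path based centrality

for node in g.nodes:
    print(node, round(pagerank[node], 3), round(betweenness[node], 3))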

Minor comments:
- 'dumping factor' (page 4) => 'damping factor'
- For reproducibility, a link to the source code is missing.

In summary, this is a good paper. There is still room for improvement, particularly in explaining the results (possibly in a journal version).

Review #2
Anonymous submitted on 28/Aug/2014
Suggestion:
[EKAW] conference only accept
Review Comment:

Overall evaluation: 0 (borderline paper)
Reviewer's confidence: 4 (high)
Interest to the Knowledge Engineering and Knowledge Management Community: 4 (good)
Novelty: 4 (good)
Technical quality: 3 (fair)
Evaluation: 3 (fair)
Clarity and presentation: 2 (poor)

Review

The approach presented in the paper is interesting. However, the following comments should be taken into account.

- Introduction: The text focuses too much on a summary of the presented approach. The authors should say more about the gap and the needs that are not yet covered in the field.

- Section 2:
The meaning of 'ontology corpus' is not clear; does it refer to the candidate ontologies?
It would be very useful to give a name/title to each phase, both in the text and in the figure; as it stands, it is quite confusing.
The meanings of hub score and authority score should be introduced the first time the terms appear.
Figure 1 should be explained: the meaning of circles, rectangles, numbers, and so on.
The authors should explain the meaning of 'user query' more clearly; as it stands, it is not clear enough. Examples could help.

- Section 3:
It is not clear how many central concepts an ontology can have. Is it possible that all of its concepts are central?
Figure 2 should be explained the first time it is mentioned.
It is not clear why this part has been divided into model and execution. This part should be rewritten, possibly including a schema that graphically shows the main idea.
The authors should provide more details about the benchmark ontology collection; the reference alone is not enough, and details about the collection should be included in the paper.

- Section 4:
The authors should clarify whether this phase must include the filtering tasks; as it stands, this is not clear.

- Section 5:
Examples of the queries should be included.

In general, examples should accompany the explanation of the approach to aid reading and understanding. In addition, the explanation should be improved (this could imply a new structure for the paper).

Review #3
Anonymous submitted on 03/Sep/2014
Suggestion:
[EKAW] combined track accept
Review Comment:

Overall evaluation: 2 (accept)
Reviewer's confidence: 4 (high)
Interest to the Knowledge Engineering and Knowledge Management Community: 5 (excellent)
Novelty: 4 (good)
Technical quality: 4 (good)
Evaluation: 5 (excellent)
Clarity and presentation: 4 (good)

Review

The paper proposes a novel way to address an important problem of knowledge engineering: the search for concepts to reuse and/or link to in existing ontologies. The proposed approach stems from good intuitions: the more central a concept is in an ontology, the more important it is; the more an ontology uses (and is reused by) other ontologies, the more important it is; and filters can be applied to the top-k matches to improve the relevance of the retrieved concepts for the knowledge engineer. A minimal formalisation of those intuitions allows the authors to present, in a terse way, a sound method that is demonstrated to address the problem better than state-of-the-art solutions. The quality of the evaluation is among the strongest points of the paper.
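To illustrate the second intuition, the authority of an ontology could, for example, be computed with a PageRank-style walk over the inter-ontology reuse graph. This is a hypothetical sketch, not the authors' exact formulation; the ontology names and edges below are made up.

import networkx as nx

# Edge A -> B means ontology A reuses (e.g. imports) terms from ontology B.
reuse = nx.DiGraph()
reuse.add_edges_from([
    ("myonto", "foaf"),
    ("myonto", "dcterms"),
    ("schemaorg", "foaf"),
    ("foaf", "dcterms"),
])

# Ontologies reused by many (important) ontologies receive high authority.
authority = nx.pagerank(reuse, alpha=0.85)
print(sorted(authority.items(), key=lambda kv: -kv[1]))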

The only minor negative comment concerns the presentation of the experimental evaluation. The authors did not make the section self-contained: the names of the methods illustrated in Figure 4 and the meaning of the names of the sample queries illustrated in Figure 5 are explained only in their ISWC 2014 paper (which I found online at [1]). I recommend that the authors address this important issue by extending this section.

[1] http://www.armin-haller.com/publications/cbrbench-iswc2014.pdf