An ontology-based approach for semantic ranking of the web search engines results

Abdelkrim Bouramoul, Mohamed-Khireddine Kholladi, and Bich-Liên Doan
This work falls in the areas of information retrieval and the semantic web, and aims to improve the evaluation of web search tools. The huge amount of information on the web, together with the growing number of inexperienced users, creates new challenges for information retrieval. Current search engines (such as Google, Bing and Yahoo) certainly offer an efficient way to browse web content; however, these tools do not take into account the semantics carried by query terms and document words. This paper proposes a new semantics-based approach for the evaluation of information retrieval systems; the goal is to increase the selectivity of search tools and to improve how these tools are evaluated. Testing the proposed approach on search engines has demonstrated its applicability to real search tools. The results showed that semantic evaluation is a promising way to improve the performance and behavior of search engines as well as the relevance of the results they return.
Submission type: Full Paper

Solicited review by Laura Hollink:

This paper describes a system that uses WordNet to re-rank search engine results. While the paper is well written and entirely in the scope of the journal, I regret to say that I cannot recommend publication.

The system takes a user query and expands it to related words: both synonyms and hyponyms in WordNet. A standard Information Retrieval approach (the vector space model) is used to rerank the top 20 results of 3 search engines using this expanded query. A small study suggests that the reranked results are better than the original ranking of the search engine results.
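The expansion step summarized above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the small `THESAURUS` dictionary is a hypothetical stand-in for WordNet (a real system would look terms up in WordNet itself), and the lower weight given to hyponyms (`hyponym_weight`) is an assumption made here for illustration only.

```python
from typing import Dict, List

# Hypothetical stand-in for WordNet: each term maps to its synonyms and hyponyms.
THESAURUS: Dict[str, Dict[str, List[str]]] = {
    "car": {"synonyms": ["automobile", "auto"], "hyponyms": ["sedan", "convertible"]},
    "cheap": {"synonyms": ["inexpensive"], "hyponyms": []},
}


def expand_query(terms: List[str], synonym_weight: float = 1.0,
                 hyponym_weight: float = 0.5) -> Dict[str, float]:
    """Return a weighted bag of terms: original terms, synonyms, and hyponyms."""
    expanded: Dict[str, float] = {}
    for term in terms:
        term = term.lower()
        # Original query terms always keep full weight.
        expanded[term] = max(expanded.get(term, 0.0), 1.0)
        entry = THESAURUS.get(term, {"synonyms": [], "hyponyms": []})
        for syn in entry["synonyms"]:
            expanded[syn] = max(expanded.get(syn, 0.0), synonym_weight)
        for hypo in entry["hyponyms"]:
            # Hyponyms are narrower terms, so they get a reduced (assumed) weight.
            expanded[hypo] = max(expanded.get(hypo, 0.0), hyponym_weight)
    return expanded


print(expand_query(["cheap", "car"]))
```

Taking the maximum weight means a word reached by several paths (for example, a synonym that is also an original query term) is counted once, at its strongest weight.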

The study raises a number of questions, both about how it is set up and about the interpretation of the results:

-Who did the relevance assessment of the documents for the 25 queries? What was the inter-rater agreement?
-How were the relevance assessments done? E.g. on a binary or a graded scale? How did you aggregate the relevance scores of the ranked list of documents?
-What is the difference between what you call effectiveness (Table 2) and relevance (Table 3)?
-Are the differences that you present statistically significant? You mention that the differences between Google and Yahoo are not significant. Did you test whether the differences between the classical ranking and your proposed ranking are significant?
-Please provide the reader with more information about the data that you gathered. Since there are only 25 queries, it is feasible to present a table with the score for each query, instead of just an average over all queries. Also, it would be good to elaborate on a few example cases (query + documents) where your system ranks documents differently than existing search engines.
-You show that your approach outperforms Google and Yahoo on "Rate of the not dead links" and "Rate of the not parasites pages". However, there is no description in the paper of how your system handles dead links and parasite pages. Also, while you present these results as "advance of the semantic ranking compared one to classical", I don't see how semantics have anything to do with this.
-Why are the results for Bing not shown?
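Two of the checks requested above, inter-rater agreement and a significance test on per-query differences, take only a few lines to compute. The sketch below assumes binary relevance labels from two assessors and one score per query for each ranking; the choice of Cohen's kappa and a Monte Carlo paired sign-flip permutation test is an illustration, not something the paper specifies.

```python
import random
from typing import Sequence


def cohens_kappa(a: Sequence[int], b: Sequence[int]) -> float:
    """Cohen's kappa for two raters assigning binary labels (0/1) to the same items."""
    n = len(a)
    observed = sum(x == y for x, y in zip(a, b)) / n
    pa1 = sum(a) / n
    pb1 = sum(b) / n
    # Chance agreement if the two raters labelled independently.
    expected = pa1 * pb1 + (1 - pa1) * (1 - pb1)
    if expected == 1.0:
        return 1.0  # both raters used a single label everywhere
    return (observed - expected) / (1 - expected)


def paired_permutation_test(x: Sequence[float], y: Sequence[float],
                            n_samples: int = 10000, seed: int = 0) -> float:
    """Two-sided Monte Carlo sign-flip test on the mean per-query difference."""
    diffs = [xi - yi for xi, yi in zip(x, y)]
    observed = abs(sum(diffs) / len(diffs))
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_samples):
        # Under the null hypothesis, each per-query difference is equally
        # likely to have either sign.
        flipped = sum(d if rng.random() < 0.5 else -d for d in diffs) / len(diffs)
        if abs(flipped) >= observed:
            hits += 1
    # Add-one correction keeps the estimated p-value strictly positive.
    return (hits + 1) / (n_samples + 1)
```

With 25 queries, per-query scores for both rankings are all that such a test needs.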

The related work section is very brief and a lot of relevant work is missing. E.g.
-Query expansion using lexical-semantic relations, by E. M. Voorhees
-MultimediaN E-Culture demonstrator (2006), by Guus Schreiber et al.
-Experiments on using semantic distances between words in image caption retrieval, by A. F. Smeaton et al.
The issue of word-sense disambiguation, which has a big impact on the success of query expansion, is not mentioned at all.

The authors motivate their choice to use WordNet, but the rationale is not entirely correct. They say that domain ontologies in the medical or geographical fields are not available on the internet. Please have a look at over 50 medical and geographical datasets available as linked open data on

The presentation of the paper is good. I do think, however, that a large part of section 3 can be left out or replaced with a reference to the literature or a textbook on Information Retrieval: in my opinion this holds for the sections on IR models (Boolean, vector space and probabilistic) and on tf.idf. Otherwise, I have just a few small comments about the text:
-The word 'evaluation' is somewhat ambiguous here. In this paper it is used in the context of a search engine evaluating relevance of a document to a query, rather than an end-user study performed to evaluate the search engine.
- p2. "the experimental our approach" please rephrase
- p7. "classsical rinsing" should be classical ranking

Solicited review by Eetu Mäkelä:

The paper seems to present primarily an approach for re-ranking search engine results based on a vector space similarity measure that expands the query using WordNet synonyms and hypernyms and counts occurrences of the expanded query words in the returned pages.
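The re-ranking step as reconstructed above (a vector space comparison between the expanded query and each returned page) can be sketched as below. This is a hedged reconstruction, not the paper's code: the regular-expression tokenizer, raw term counts as page weights, and cosine similarity are assumptions, and `rerank` is a hypothetical helper name.

```python
import math
import re
from collections import Counter
from typing import Dict, List, Mapping, Tuple


def tokenize(text: str) -> List[str]:
    """Lowercase alphabetic tokens; a deliberately naive tokenizer."""
    return re.findall(r"[a-z]+", text.lower())


def cosine(q: Mapping[str, float], d: Mapping[str, float]) -> float:
    """Cosine similarity between two sparse term-weight vectors."""
    dot = sum(w * d.get(t, 0.0) for t, w in q.items())
    nq = math.sqrt(sum(w * w for w in q.values()))
    nd = math.sqrt(sum(w * w for w in d.values()))
    return dot / (nq * nd) if nq and nd else 0.0


def rerank(expanded_query: Dict[str, float], pages: List[str]) -> List[Tuple[int, float]]:
    """Return (original_rank, score) pairs sorted by descending similarity."""
    scored = []
    for rank, text in enumerate(pages):
        # Page vector: raw counts of every word, so occurrences of the
        # expanded query words raise the score and unrelated words dilute it.
        counts = Counter(tokenize(text))
        scored.append((rank, cosine(expanded_query, counts)))
    # sorted() is stable, so ties keep the search engine's original order.
    return sorted(scored, key=lambda pair: -pair[1])
```

For example, with the query vector `{"car": 1.0, "automobile": 1.0}`, a page consisting only of the word "car" ranks above a longer page that mentions both words among unrelated ones, and a page with neither word drops to the bottom.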

However, this is just conjecture, as it was quite hard to make out what precisely is happening from the narrative. The English is passable, but the narrative flow is not focused enough on the core subject being presented. There are important omissions, haphazard argumentations as well as puzzling digressions throughout.

Starting with the related work section, I find a lot of relevant material missing, in particular other works using WordNet for query expansion. Also, the presentation does not do an adequate job of situating the proposed method in relation to others; there should be a tighter comparison of similarities and differences. On the other hand, if such a comparison were done with the proper related work, the novelty of this paper would probably start to seem quite thin.

The actual method description is hard to follow. Some formulas are given, but important details on how the algorithm actually uses them are left out. On the other hand, most of the modules described in detail in section 4 are of negligible importance to the core algorithm and could be vastly condensed.

All of this could be corrected with a major revision, but the most problematic part is the evaluation. The results are intriguing, seeming to point to an advantage of the proposed method with regard to current search engines. However, the results are impossible to judge properly, because the evaluation protocol isn't properly specified.

Extrapolating from the hints in the text, I'm assuming that there was some sort of human evaluation of the returned results in each case, and that the results were rated on a scale of, possibly, 1 to 10.

But how? Earlier (if I'm reading it correctly) the text states that the top 20 hits returned by each search engine for the query are re-ranked. Now, how many of these are then evaluated by a human? Is there an emphasis based on the order or just the quality of the link? The paper does not say. Yet either some additional cutoffs or order weighting must be going on, as otherwise the top 20 links should be the same (without regard to order), and thus their average evaluation scores also the same.
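The point about order weighting can be made concrete. In the sketch below, the graded judgments on a 0 to 10 scale, the cutoff `k`, and the linear-gain nDCG with a log2 discount are all assumptions; the paper is not known to use any of them. A plain average over the same judged links is identical for every ordering, whereas nDCG@k rewards putting the better links first.

```python
import math
from typing import Sequence


def mean_grade(grades: Sequence[float]) -> float:
    """Unweighted mean grade; it does not depend on the order of the list."""
    return sum(grades) / len(grades)


def dcg_at_k(grades: Sequence[float], k: int) -> float:
    """Discounted cumulative gain: grade at 1-based rank i divided by log2(i + 1)."""
    return sum(g / math.log2(i + 2) for i, g in enumerate(grades[:k]))


def ndcg_at_k(grades: Sequence[float], k: int) -> float:
    """DCG@k normalized by the DCG@k of the ideal (descending) ordering."""
    ideal = dcg_at_k(sorted(grades, reverse=True), k)
    return dcg_at_k(grades, k) / ideal if ideal > 0 else 0.0


# The same four judged links in two different orders (hypothetical grades).
engine_order = [2, 9, 4, 7]
reranked = [9, 7, 4, 2]
print(mean_grade(engine_order), mean_grade(reranked))  # 5.5 5.5
print(ndcg_at_k(engine_order, 4), ndcg_at_k(reranked, 4))
```

Reporting which of these (or which cutoff) was used would resolve the ambiguity raised above.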

Solicited review by Eero Hyvönen:

The paper aims at offering better rankings for search results, based on the concepts found in search engine results and the underlying ontologies. The method is tested on the Google and Yahoo! search engines, using WordNet as the underlying ontology.

The paper starts with a short introduction. Research questions are presented but on a very general level. More precise formulation of the IR research problem is needed here.

Next, related work is discussed. Again, the presentation is given at a very general level (presenting e.g. IR as a field, TF-IDF, etc.) without deeper analysis and sufficient references to the literature. There is a lot of previous work related to ontology-based search and query/document expansion that is not covered there.

The proposed approach in section 4 is presented intuitively as a flow chart (Fig 2): the idea is to represent the query and the search results from a search engine as vectors (using WordNet) and then re-rank the results better than e.g. Google, Yahoo!, and Bing do. If better ranking is possible in this way, this would be useful, but convincing evidence and evaluation are needed. The method used is based on a simple vector similarity comparison; there is nothing particularly novel there. A detailed description of the method would be needed in the paper, though. From a methodological viewpoint, the setting is basically similar to traditional ontology-based search on text documents; here, however, the method is applied to the search results, not the original document set.

Evaluation of the approach using a test implementation is presented in section 5. A set of 25 queries (simple and complex) was run against Google and Yahoo!, using 500 results. The evaluation setting is not described in enough detail. For example: What are simple and complex queries? What was the gold standard used and how was it created? What are the "points" in the tables anyway? How reliable are the results in the general case?

As for the results, it should be clarified how good the results actually are in the first place and why: I think that the numbers in table 3 do not convince the reader that the method proposed really makes a difference.

The paper is readable and well-structured, and the topic is suitable for the journal. However, in my mind, the paper cannot be accepted for publication due to the issues discussed above.



This paper was submitted as part of the 'Big Data: Theory and Practice' call.