Assessing Deep Learning for Query Expansion in Domain-Specific Arabic Information Retrieval

Tracking #: 2168-3381

Wiem Lahbib
Ibrahim Bounhas
Yahya Slimani

Responsible editor: 
Guest Editors Semantic Deep Learning 2018

Submission type: 
Full Paper
Information Retrieval (IR) systems are limited by the term-mismatch problem. User queries are generally imprecise and incomplete, so important terms may be missing from the query. Classic models based on exact matching between documents and queries cannot resolve this problem. In this article, we propose to integrate domain terminologies into the Query Expansion (QE) process in order to enhance Arabic IR results. We investigate several experimental parameters, such as corpus size, query length, expansion method, and word-representation model. In the first category of models, we use deep learning-based representations (i.e., word2vec and GloVe). In the second, we build a co-occurrence-based probabilistic graph and compute similarities with the BM25 ranking function; we then compare both against Latent Semantic Analysis (LSA). To evaluate our approaches, we conduct multiple experimental scenarios. All experiments are performed on a test collection called Kunuz, which provides domain-specific documents and thus allows us to assess the impact of domain knowledge on QE. According to multiple state-of-the-art evaluation metrics, the results show that incorporating domain terminologies into the QE process outperforms the same process without terminologies. Results also show that deep learning-based QE enhances recall.
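The embedding-based expansion step the abstract describes can be sketched as follows. This is a minimal illustration, not the authors' implementation: the word vectors below are toy values standing in for trained word2vec/GloVe embeddings, and the vocabulary is English for readability rather than Arabic. Each query term is expanded with its k nearest vocabulary terms by cosine similarity.

```python
import math

# Toy word vectors standing in for trained word2vec/GloVe embeddings.
# All values are illustrative; a real system would learn them from the corpus.
vectors = {
    "bank":    [0.9, 0.1, 0.0],
    "finance": [0.8, 0.2, 0.1],
    "money":   [0.7, 0.3, 0.0],
    "river":   [0.1, 0.9, 0.2],
}

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def expand_query(query_terms, k=2):
    """Append the k most similar vocabulary terms to each query term."""
    expanded = list(query_terms)
    for q in query_terms:
        if q not in vectors:
            continue  # out-of-vocabulary terms are left unexpanded
        neighbours = sorted(
            (t for t in vectors if t not in expanded),
            key=lambda t: cosine(vectors[q], vectors[t]),
            reverse=True,
        )
        expanded.extend(neighbours[:k])
    return expanded

print(expand_query(["bank"], k=2))  # -> ['bank', 'finance', 'money']
```

In a terminology-aware variant, as the paper proposes, the candidate neighbours would be restricted to (or re-ranked by) terms appearing in the domain terminologies before being added to the query.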


Solicited Reviews:
Review #1
By Dagmar Gromann, submitted on 31/Mar/2019
Review Comment:

We appreciate the effort of the authors to address reviewers' comments. However, after three requests across different rounds of review to improve the quality of writing and comply with the style guide, the quality of writing is still very poor, and the way figures, tables, and equations are pressed next to the text produces a horrible layout. These two points, as explicitly stated before, are enough to not recommend this paper for publication. However, there are also some further points listed below.

Here are just some examples (quoted verbatim from the paper):
- word embeddings models
- The Deep Learning helps...
- considered insufficient as for indexing or for matching
- Expansion term extraction
- 10.761 words => read as 10 comma 761 words in English
- consider this fund as a source of knowledge extraction.
- revealed by the Table 3
- was not chosen arbitrary
- consist on simple nominal entities
- and many more

There is also a range of open/now aggravated problems:
- if embeddings cannot really be considered deep neural networks because they simply are not deep, renaming them to "deep learning" does not improve anything -> calling the section "deep learning embeddings" just makes things a lot worse
- the terminology extraction and creation process is still unclear: Section 4.1 states that data from an already provided dataset described in Section 5.2 are utilized, but then in the rest of the paper Section 4.1 is referenced as the terminology creation method; in fact, the text jumps from "terminologies must exist", to "we use the existing dataset", to "As a result, we obtain 97 terminologies", and in the end there is some obscure discussion about reference terminologies
- the whole argumentation about the good quality of the terminologies seems to rest on a quality check by one single expert; having one person check 97 terminologies is not valid quality assurance
- embeddings do not generate similarity values "thanks to deep learning vectors"
- use of gender is not consistent - the very first sentence refers to "his" when meaning "user"