CAMS-KG: a Classical Arabic Morpho-Semantic Knowledge Graph

Tracking #: 2065-3278

This paper is currently under review
Ibrahim Bounhas
Nadia Soudani
Yahya Slimani

Responsible editor: 
Guest Editors Knowledge Graphs 2018

Submission type: 
Full Paper
In this paper, we propose to build a morpho-semantic knowledge graph from vocalized Arabic corpora. Our work focuses on classical Arabic, which has not been deeply investigated in related work. We use a tool suite that analyzes and disambiguates Arabic texts, taking short diacritics into account to reduce ambiguity. At the morphological level, we combine the Ghwanmeh stemmer and MADAMIRA, both adapted to extract a multi-level lexicon from vocalized Arabic corpora. At the semantic level, we infer semantic dependencies between tokens by exploiting contextual knowledge extracted by a concordancer. Both morphological and semantic links are represented as compressed graphs, which are accessed through lazy methods. These graphs are mined using the BM25 measure to compute one-to-many similarity. We evaluate CAMS-KG in the context of Arabic Information Retrieval (IR), assessing several scenarios of document indexing and query expansion. Specifically, we vary the indexing units for Arabic IR according to different levels of morphological knowledge, a challenging issue that is not yet resolved in related work. We also experiment with several combinations of morpho-semantic query expansion. This allows us to validate our resource and to study its impact on IR using state-of-the-art evaluation metrics.

Keywords: Morpho-semantic knowledge extraction, Classical Arabic text mining, Arabic information retrieval, graph-based knowledge representation.
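As a rough illustration of the one-to-many similarity mentioned in the abstract, the sketch below scores one query against many token lists with the textbook Okapi BM25 formula. It is only a sketch under stated assumptions: the function name `bm25_scores`, the parameter defaults `k1=1.2` and `b=0.75`, and the transliterated sample tokens are hypothetical and do not come from the paper, which may adapt BM25 differently when mining its graphs.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Score one tokenized query against many tokenized documents (Okapi BM25).

    `query` is a list of tokens; `docs` is a non-empty list of token lists.
    k1 and b are common default values, not values taken from the paper.
    """
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs

    # Document frequency: number of documents containing each term.
    df = Counter()
    for d in docs:
        df.update(set(d))

    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in set(query):
            if term not in tf:
                continue
            # Smoothed IDF variant that is always non-negative.
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Term-frequency saturation with document-length normalization.
            norm = tf[term] * (k1 + 1) / (
                tf[term] + k1 * (1 - b + b * len(d) / avg_len)
            )
            score += idf * norm
        scores.append(score)
    return scores


# Hypothetical transliterated tokens, for illustration only.
sample_docs = [["kitab", "qalam"], ["kitab", "kitab", "bayt"], ["bayt", "bab"]]
sample_scores = bm25_scores(["kitab"], sample_docs)
```

The same scoring applies whether the tokens are surface words, stems, or roots, which is the kind of choice of indexing unit the abstract says is varied in the experiments.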