Multi-label categorizations and question answering for imbalanced raw web-based data

Tracking #: 2025-3238

This paper is currently under review
Julien Lacombe
Rémy Chaput
Julien Perier-Camby
Feras Al Kassar
Marc Bertin
Frédéric Armetta

Responsible editor: Guest Editors Semantic Deep Learning 2018

Submission type: Full Paper
Abstract:
The sheer scale of the Internet calls for extraction and classification tools, which are essential to the design and use of tomorrow's applications. One challenging task is to predict labels for large volumes of raw web-based data. Another is the general question answering problem, that is, how to query a system freely. Machine learning techniques and neural networks can address these problems individually; we introduce an extension of memory networks to tackle them together. We show that our proposal is a pragmatic way to enlarge the set of questions the network can answer with multi-label predictions, and that it has numerous applications. The proposed approach appears competitive with the top ten classifiers dedicated to multi-label categorization. Experimental results show that the efficiency of predictions remains constant when multi-question tasks are applied to the same network. Web datasets are usually imbalanced and costly to acquire, which can degrade the quality of predictions. We discuss results and parameters in relation to the rarity of data, and how the internal representation can be used to improve the efficiency of multi-label categorization.