Abstract:
Social networks have become information dissemination channels where announcements are posted frequently; they also serve as forums for debate in various areas (e.g., scientific, political, and social). In the health area in particular, social networks are a channel for communicating and disseminating the success of novel treatments, and they also allow ordinary people to express their concerns about a disease or disorder. In response, the Artificial Intelligence (AI) community has developed analytical methods to uncover and predict patterns in posts that help explain news about a particular topic, e.g., mental disorders such as eating disorders or depression. Although posts can richly express an idea or concern, they are short texts, which prevents AI models from accurately encoding their contextual knowledge. We propose a hybrid approach that combines knowledge encoded in community-maintained knowledge graphs (e.g., Wikidata) with deep learning to categorise social media posts using existing classification models. The proposed approach relies on state-of-the-art named entity recognisers and linkers (e.g., FALCON 2.0 and the EntityLinker in the spaCy Python library) to extract entities from short posts and link them to concepts in knowledge graphs (e.g., Wikidata). Knowledge graph embeddings (e.g., RDF2Vec) are then used to compute latent representations of the extracted entities, yielding vector representations of the posts that encode the contextual knowledge of these entities as captured in the knowledge graph. These knowledge graph embeddings are combined with contextualised word embeddings (e.g., BERT) to generate a context-based representation of the posts that empowers prediction models.
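A minimal sketch of how a post's two representations might be combined, assuming the entity linker has already produced knowledge-graph identifiers for the post, RDF2Vec vectors for those identifiers are available in a dictionary, and a BERT vector for the post text has already been computed. Averaging the entity vectors and concatenating the result with the text vector are illustrative assumptions; the abstract does not state which combination operator is used. The function names and the toy identifiers below are hypothetical.

```python
from typing import Dict, List, Sequence


def mean_vector(vectors: Sequence[Sequence[float]], dim: int) -> List[float]:
    """Element-wise mean of equal-length vectors; a zero vector if none are given."""
    if not vectors:
        return [0.0] * dim
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def post_representation(
    linked_entities: List[str],
    kg_embeddings: Dict[str, List[float]],
    text_embedding: List[float],
    kg_dim: int,
) -> List[float]:
    """Hypothetical helper: concatenate a contextual text vector (e.g. from BERT)
    with the mean knowledge-graph vector (e.g. from RDF2Vec) of the linked entities."""
    # Skip linked entities that have no embedding in the KG model.
    found = [kg_embeddings[e] for e in linked_entities if e in kg_embeddings]
    kg_part = mean_vector(found, kg_dim)
    return list(text_embedding) + kg_part


# Toy usage with made-up identifiers and 3-dimensional KG vectors.
kg = {"Q_anorexia": [1.0, 0.0, 2.0], "Q_diet": [3.0, 2.0, 0.0]}
vec = post_representation(["Q_anorexia", "Q_diet", "Q_unknown"], kg, [0.5, -0.5], kg_dim=3)
print(vec)  # → [0.5, -0.5, 2.0, 1.0, 1.0]
```

The resulting vector can be passed to any existing classifier. A post with no linked entities still gets a fixed-length vector because its knowledge-graph part falls back to zeros.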
We apply the proposed approach in the health domain to detect whether a post is related to an eating disorder (e.g., anorexia or bulimia) and to uncover concepts within the discourse that could help healthcare providers prevent and diagnose this type of mental disorder. We evaluate our approach on a dataset of 2,000 short texts related to eating disorders. Our experimental results suggest that semantically enriching these messages by exploiting the knowledge graph increases the reliability of the resulting predictive models compared with models that do not use knowledge collected from Wikidata. Our aim is for the proposed method to support health domain experts in discovering patterns that may forecast a mental disorder, enabling earlier detection and more precise diagnosis towards personalised medicine.