Abstract:
The ubiquity of disinformation on digital platforms poses a threat to democracy and social cohesion. Despite significant advances in machine learning for disinformation detection and for more specific related tasks (such as fact-checking, check-worthiness detection, claim linking, and propaganda and rumor detection), applying empirical knowledge in a standardized and transparent way when training such models remains a challenge. In this paper, following semantic web principles, we propose TAXODIS, the first-of-its-kind, openly available Taxonomy of Online Disinformation. It structures an interdisciplinary set of well-defined and analyzed linguistic features of online disinformation discourse and is intended to support the annotation of training data for machine learning and computational models that address the above-mentioned tasks. The systematic clustering of linguistic features into a comprehensive and publicly available framework provides a basis for the empirically grounded training of models and enhances the understanding of disinformation at the textual and linguistic level. In demonstrating and evaluating the artifact, we find that it facilitates data labeling by offering annotators a compact yet empirically informed guide to identifying textual indicators of disinformation. This paper thus contributes to disinformation detection by mapping nuanced linguistic characteristics of disinformation content into a structured taxonomy that can serve as a valuable tool for automated detection systems.