Separability and Its Approximations in Ontology-based Data Management

Tracking #: 3391-4605

Gianluca Cima
Federico Croce
Maurizio Lenzerini

Responsible editor: Guest Editors Ontologies in XAI

Submission type: Full Paper
Given two datasets, i.e., two sets of tuples of constants, representing positive and negative examples, logical separability is the reasoning task of finding a formula in a certain target query language that separates them. As already pointed out in previous works, this task is relevant in several application scenarios, such as concept learning and generating referring expressions. Besides, if we think of the input datasets of positive and negative examples as composed of tuples of constants classified, respectively, positively and negatively by a black-box model, then the separating formula can be used to provide global post-hoc explanations of such a model. In this paper, we study the separability task in the context of Ontology-based Data Management (OBDM), in which a domain ontology provides a high-level, logic-based specification of a domain of interest, semantically linked through suitable mapping assertions to the data source layer of an information system. Since a formula that properly separates two input datasets (a proper separation) does not always exist, our first contribution is to propose (best) approximations of the proper separation, called (minimally) complete and (maximally) sound separations. We do this by presenting a general framework for separability in OBDM. Then, in a scenario that uses by far the most popular languages for the OBDM paradigm, our second contribution is a comprehensive study of three natural computational problems associated with the framework, namely Verification (check whether a given formula is a proper, complete, or sound separation of two given datasets), Existence (check whether a proper or best-approximated separation of two given datasets exists at all), and Computation (compute any proper or any best-approximated separation of two given datasets).
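
As a rough illustration of the Verification problem named in the abstract, the following minimal sketch classifies a candidate formula from its answer set alone. It rests on assumptions not stated in the abstract: that a complete separation returns every positive tuple, that a sound separation returns no negative tuple, and that a proper separation does both. The function name `classify_separation` and the precomputed `answers` set are hypothetical; in an actual OBDM setting the answers would be the certain answers obtained through the ontology and mappings, which is not modelled here.

```python
from typing import Dict, Iterable, Tuple

Row = Tuple[str, ...]  # a tuple of constants


def classify_separation(
    answers: Iterable[Row],
    positives: Iterable[Row],
    negatives: Iterable[Row],
) -> Dict[str, bool]:
    """Classify a candidate formula by its (precomputed) answer set.

    Assumed reading, not taken verbatim from the paper:
      complete: every positive example is among the answers
      sound:    no negative example is among the answers
      proper:   both complete and sound
    """
    answer_set = set(answers)
    complete = set(positives) <= answer_set
    sound = answer_set.isdisjoint(negatives)
    return {"complete": complete, "sound": sound, "proper": complete and sound}


if __name__ == "__main__":
    pos = {("a",), ("b",)}
    neg = {("c",)}
    # Answers cover all positives but also one negative example.
    print(classify_separation({("a",), ("b",), ("c",)}, pos, neg))
```

Under this reading, a formula that fails to be proper can still be a useful approximation when it is complete only or sound only, which is the situation the abstract's approximations address.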

Solicited Reviews:
Review #1
Anonymous submitted on 08/Mar/2023
Review Comment:

The authors revised their paper taking into account most of my comments. I think the paper has improved a lot.

Let me stress, however, that I still find the naming scheme VERY unfortunate. I agree that "separating query" is standard, but a sound/complete separation that is not a separation is misleading (even if the authors make an effort to explain it well from the beginning): if you add an adjective to a term, the resulting term should be more specific, e.g., a deterministic Turing machine is a Turing machine that satisfies an additional condition. In this case, it is especially confusing since the word "separation" involves by definition two entities (the two entities to be separated), an aspect that is lost when you go to proper/sound separation. I highly recommend looking for a different terminology in future papers on the topic; the present terminology really should not become the standard.