Ontology evolution from RDF streams using possibilistic axiom scoring

Tracking #: 3827-5041

Authors: 
Alda Canito
Jérôme David
Juan M. Corchado
Goreti Marreiros

Responsible editor: 
Aidan Hogan

Submission type: 
Full Paper
Abstract: 
Evolving an ontology involves re-learning, re-enriching and re-validating knowledge in the face of changes to the domain, and the techniques applied to these tasks can be adapted to ontology evolution. The possibilistic approach to axiom scoring has been applied over complete and large datasets in ontology learning. This paper presents an adaptation of the possibilistic approach to axiom scoring to the context of RDF data streams for ontology evolution, a scenario which necessarily deals with incomplete and time-dependent data. Possibilistic axiom scoring is used in two distinct scenarios: (1) with previously known property axioms, allowing for the exploration of the effectiveness of the approach in a scenario in which no incorrect data was present; and (2) in an evolving knowledge scenario, in which neither the properties nor the axioms were known and the dataset was obtained from publicly available sources, possibly both incomplete and with errors. Results show the effectiveness of the approach in accepting/rejecting axioms for the ontology’s properties. The different approaches to possibility and necessity proposed in the literature are recontextualized in terms of their bias towards selective confirmations or counterexamples – showing that some axioms benefit from a more lenient approach, while others present a lower risk of introducing inconsistencies by having harsher acceptance conditions.
Tags: 
Reviewed

Decision/Status: 
Minor Revision

Solicited Reviews:
Review #1
By Stephan Mennicke submitted on 28/Mar/2025
Suggestion:
Minor Revision
Review Comment:

I have read the answers to the reviewers and acknowledge the efforts of the authors in revising the paper. Also, I have read the changed parts of the paper as indicated in the authors' answer letter. While the definitional part appears much cleaner now, having a minimum of justified _real_ definitions, I also acknowledge the comments of the other reviewers.

Either I missed something important or the question of the feasibility of the possibilistic approach is not given, which may be seen as a research outcome, but also only touches the surface of the research question I interpreted from the contents of the paper. My main concern -- also here, maybe I missed something important (cf. bullet points regarding Section 3 below) -- is the handling of identical individuals in different time frames. This might be a limitation of TICO, which was well-justified in the setting TICO was made for, but because of UNA, there should not be two different individuals having the same identity. Doesn't this change the whole idea about the property axioms? I hope to just have missed the reconciliation step of identical individuals. If they are not reconciled, how do you argue for this omission? I guess identical individuals in different time frames do not occur in your experiments, but what happens if they do?

Finally, you state that inclusion/exclusion of property axioms must always adhere to the usefulness in reasoning _to allow the ontology/data to show its full potential_, but none of the experiments addresses this point. It is somewhat addressed in the background section, but could be mentioned, at least as a point for future work, along with how you plan to address it in experiments. Also, further experiments with actual real-world data are more than welcome in the future.

These points lead me to a decision of _weak accept_. I would like the authors to address the issues brought up here and, potentially, by the other reviewers as well.

=== Section 3

- what are _sequential RDF facts_, or sets thereof? I guess, in this context, you mean sequences and/or streams of RDF facts that you then combine into sets.
- Def. 2:
  - regarding the purpose and the descriptions provided upfront, a timeframe is always finite, so no need for having the word _finite_ in parentheses; I guess, if timeframes were not finite, they could never be used by any framework as input
  - although I understand timeframes as finite portions of the RDF stream, the definition also suggests that there are only finitely (i.e., $n$) many of them, contradicting the assumption that such frames are finite portions of a possibly infinite stream
  - while streams may be unbounded, meaning of infinite nature, a particular individual is _bound_ to a single time frame (i.e., not bounded); I don't get how you ensure this holds in your setting. Is there a maximum number of facts per time frame, or is a time frame just defined as the closure over one or more individuals? If so, in reality, there might be scenarios in which facts about a set of individuals do not arrive _in the expected order_, which lets me conclude that time frames need to be arbitrarily large objects. Is that true?
- the text after Def. 2 suggests that TICO handles the same individuals in different time frames as distinct elements; how does your framework cope with such situations? Is there a posterior alignment involved?
- steams $-->$ stems
- please, consistently use _real-time_; sometimes I see _real time_
- does not entail neither ... nor ... $-->$ does neither entail ... nor ...
- Table 1 is superfluous as the list descriptions are complete and easy to comprehend

Review #2
Anonymous submitted on 14/Apr/2025
Suggestion:
Minor Revision
Review Comment:

Thank you for the opportunity to check the revised paper. The authors have made major improvements to the paper.

I understand that a motivating example was presented in previous research [1], but I believe having a concrete example, simply mentioned in text or, even better, with visuals similar to what you did in [1], would greatly improve readability and anchor the readers in the context of the research. It is advisable for the paper to be self-contained, without requiring readers to check [1] for an example. Maybe the “sensor data” example referred to in problem definition 1? More details and context about the sensor scenario (type, scientists using it, how it’s changing/evolving, etc.) could make the paper stronger. Another example to add could be an evolution scenario like the Pokémon dataset that connects to experiment 2 and the other research problem you are targeting.

Review #3
By Andrea Tettamanzi submitted on 22/Apr/2025
Suggestion:
Accept
Review Comment:

With this revision of the manuscript, the authors responded to all my comments in a satisfactory way and fixed all the issues that I had pointed out. I think that the current version of the manuscript is suitable for publication, after a last thorough proofreading (a few typos are still there, e.g., Acception instead of Acceptance).