FIDES: An Ontology-based approach for making Machine Learning systems Accountable

Tracking #: 2962-4176

Authors: 
Iker Esnaola-Gonzalez
Jesús Bermúdez

Responsible editor: 
Guest Editors Ontologies in XAI

Submission type: 
Full Paper

Abstract: 
Although artificial intelligence technologies are rather mature nowadays, their adoption, deployment and application are not as wide as could be expected. This can be attributed to many barriers, among which the lack of user trust stands out. Accountability is a relevant factor for advancing this trustworthiness aspect, as it enables discovering the causes that led to a given decision or suggestion made by an artificial intelligence system. In this article, the use of ontologies is conceived as a way of making machine learning systems accountable, thanks to their conceptual modelling capabilities for describing a domain of interest, as well as their formality and reasoning capabilities. The feasibility of the proposed approach has been demonstrated in a real-world energy efficiency scenario, and it is expected to pave the way towards raising awareness of the possibilities of semantic technologies for the different factors that may be key to the trustworthiness of artificial intelligence-based systems.
Tags: 
Reviewed

Decision/Status: 
Major Revision

Solicited Reviews:
Review #1
Anonymous submitted on 30/Dec/2021
Suggestion:
Major Revision
Review Comment:

Summary:
In this work, the author(s) present a tool named FIDES which is designed to help keep ML systems accountable. FIDES uses an ontology to annotate the procedures used to develop a predictive ML system as well as the forecasts generated by the system. The resulting annotations are stored in an RDF store, which users can query through a dedicated GUI.
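
For illustration, a query against such an annotation store might look roughly like the following. This is a minimal sketch only: the prefix and the property names (e.g. fides:basedOnAlgorithm, fides:numberOfObservations) are invented for the example and are not taken from the paper.

    PREFIX fides: <http://example.org/fides#>

    # Hypothetical query: list each predictive model together with its base
    # algorithm and the size of its training dataset.
    SELECT ?model ?algorithm ?trainingSize
    WHERE {
      ?model a fides:PredictiveModel ;
             fides:basedOnAlgorithm ?algorithm ;
             fides:trainedWith ?dataset .
      ?dataset fides:numberOfObservations ?trainingSize .
    }
    ORDER BY ?model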

Review comments:
- The difference between accountability and explainability is not clearly defined. It would also be helpful to show how explainability could be used in the process of making ML models accountable.

- On page 2, lines 26-30 (left), an example should be given to support that claim.

- On page 3, lines 23-25 (left), mention some of the major limitations discussed in the cited paper.

- Discuss the adaptability of your tool to models developed in a language other than R.

- How could the tool be extended to cover different kinds of ML models, e.g. models that use different training procedures, models using different evaluation techniques, etc.?

- The SPARQL queries seem very limited: they don't cover all the kinds of questions that users may ask.

- What are the reasons for selecting (only) those CQs for defining the information requirements?

- And why (only) RMSE?

- For the evaluation, only two data scientists and one system manager participated. No justification is given for why three people are enough to evaluate the tool.

- Minor: there are some typos.

- In general, the CQs should be revised and more participants should be involved in the evaluation process.

Review #2
By Dagmar Gromann submitted on 31/Jan/2022
Suggestion:
Major Revision
Review Comment:

SUMMARY:
This article proposes an ontology-based approach to account for the training, implementation and performance details of statistical machine learning models in order to increase their accountability. It re-uses pre-existing ontologies to describe ML models and the predictions they produce, and the approach is evaluated in a use case on energy efficiency.

OVERALL EVALUATION:
The idea to provide more detailed metadata on machine learning models/procedures is highly interesting. However, the proposed approach seems to consider a very limited range of models/procedures, mostly applicable to statistical ML models, especially given the requirement of an implementation in R. One basic ontology re-used for accounting for models' predictions is derived from the domain of energy efficiency, but it is claimed to be generally applicable and easy to re-use. However, the chosen use case is then itself in the domain of energy efficiency, which fails to support this claim. Furthermore, the three people chosen for the evaluation are themselves developers/system managers, which says little about instilling trust in AI systems among general users. It is stated that the system had been evaluated with 120 units; however, no details of this evaluation are provided. In a nutshell, with a different use case and a rigorous evaluation procedure with end users this might be an interesting approach, provided that the tool is described in more detail and made publicly available.

SPECIFIC COMMENTS:
Introduction:
Accountability (page 2, par. 2) is first defined based on a source that addresses explainability, and in fact the definition fits the latter better than the former. Please delimit these two concepts more clearly and explicitly: first provide a definition of explainability and then, explicitly, a definition of accountability, including the distinctions between the two. As it stands, this distinction is not clearly marked. A stronger motivation for the proposed work, including explicit contributions, would also improve the readability of this article.

Related Work:
The related work section also suffers from a lack of clear delimitation of the concepts of explainability, accountability, and trustworthiness. This paper seems to address accountability only, which raises the question of why all these explainability approaches can be considered related work.

FIDES:
FIDES seems to be limited to models developed in R; however, the vast majority of available neural models (and traditional ML algorithms) are developed in Python, Java, Ruby, etc. Could you please explain and justify this choice further?

Ontology:
The question "Which is the frequency of a given predictive model’s training data?" is very unclear - training data have no frequency per se. Do you mean the frequency of individual items in the dataset? Also "Which is amount of observations used for training a given predictive model?" is not very clear - do you mean the size of the training dataset? The question "When was the last data point within a given predictive model’s training data collected?" presumes that training data are dynamically collected, which is not always the case. I propose splitting the questions further with respect to the actual training procedure and the datasets involved in training a model, and then structuring the competency questions accordingly, with different options for different types of training procedures/dataset collection procedures. For the question "Which is the base algorithm of the predictive model?" it would in many cases be unclear to me what to answer, e.g. if the model is not a traditional statistical machine learning procedure but a deep learning architecture. And what if RMSE had not been used as a loss function?

Overall, it should be stated who answers these questions and when/for what reason. Is it developers who seek to utilize FIDES who answer these questions? Or would the answers be semi-automatically derived from the model, given that hyperparameter settings, datasets, the loss function, etc. are generally available as explicit information?

While the EEPSA ontology might be a valid choice, even though the authors may be biased here, some of its elements, especially "Quality", should be specified more clearly (the current online definitions do not help much here either). What is the gap between the OntoDM core and the DMOP ontology that ML-Schema is meant to address? How about improving the ML-related issues for FIDES instead of asking the ML community to do so?

Especially with the mapping of the two ontologies, it is rather unclear to me where all the metadata highlighted as important in the introduction and competency questions (e.g. authorship, responsibilities) are represented or can be modelled for a specific implemented model. The mapping also raises the question of what exactly is gained by mapping these two ontologies. While the examples in the next section provide some ideas on this, it should be stated explicitly in this section.

The actual implementation of FIDES as a tool/system should be described in more detail, beyond the statement that there is a GUI and that specific pre-existing services are re-used for it. Is it publicly available?

FIDES in use:
To account for the actual generalizability of utilizing an "Energy Efficiency Prediction Semantic Assistant" ontology for general ML accountability purposes, it would be necessary to choose a use case that is not in the energy efficiency domain.

Why is the overall energy efficiency solution out of scope for this paper? How does FIDES account for privacy issues? While in some countries it might be legal to access such private information as the energy consumption of one's neighbours, in others it is not. How are data privacy concerns addressed in this approach?

Evaluation:
The queries to the datasets are insightful and interesting; however, for a true evaluation of how far this approach increases users' trust, it would be interesting to provide an evaluation with the users in this use case and to gather their views on how far such a system helps to instil trust in AI. Having two of the data scientists involved in training the system and a system manager carry out the evaluation fails to address this critical point. The selection of participants of course also strongly affects the validity of the evaluation. Separating the evaluation from the discussion would allow for a proper discussion of the proposed approach, which is currently intertwined with an evaluation that could benefit from improvements.

MINOR COMMENTS:
Please check SWJ guidelines on how to format your manuscript, e.g. how to refer to figures
Also a consistent and correct use of quotation marks would improve the manuscript.
p1.35 can be overcame => overcome
p1.43 including the explainability => including explainability OR the explainability of AI systems.
p1.Footnote => it extends into the text of the second column, please fix this
p2.3 The explainability => Explainability
p2.11 The accountability => Accountability
p2.34 as it would be needed to be an expert => as the person would have to be an expert
p2.36 the regular performance these accountancy tasks would be infeasible. => ??
p2.47 Since the AI is a field => Since AI is a field
p2.50 the machine learning (ML) => machine learning (ML)
=> please check the use of articles in the entire manuscript, too many to mention all here
p2.38 generation of ... into their AI-enabled systems => for?
p3.19 may act an effective way => may act as?
p3.35 To the extent of knowledge of author => the authors and shouldn't you check on what happened since 2020 in this regard?
p4.33 knowable topic => ??
p4 ff I recommend introducing acronyms, e.g. RMSE, SOSA, EEPSA, etc.
p5.32 may derive in => may result in
p8.42 Figure 5 => Fig. 5. (full-stop missing)
p12.21 would require from further functionalities => ??
p13.37 clicking in the ’Algorithm’ button => on

The overall language quality of the manuscript requires thorough revision.

Review #3
By Ernesto Jimenez-Ruiz submitted on 25/Apr/2022
Suggestion:
Reject
Review Comment:

The paper presents a framework called FIDES to make ML systems accountable. The topic is very interesting and more work in this line is welcome; however, the paper in its current state has several limitations:

- FIDES is in principle model agnostic, but it seems to be built on top of R. Could FIDES be easily extended to other languages? This looks like a limitation, although if FIDES is generic, it should not be a problem.

- I miss additional information about how FIDES works as a generic framework and how an external party could make use of it. How can a model create semantic data suitable for FIDES? In Section 3.1 this is briefly mentioned and it seems that it can be automated. Is this related to the dependency on R and Rserve?

- The core of the methods section focuses on the selection of the ontologies; however, the selected ontologies have been previously published, which affects the novelty of the paper, especially given that FIDES itself has not been described in detail.

- Section 4 sheds some light on FIDES, but, from the description in the paper, it is still unclear how a third party beyond the described use case could make use of the framework.

- Organising metadata (such as hyperparameters and the predicted values of the different prediction models) as an RDF graph is promising (and an important first step towards accountability), as it enables data access via SPARQL queries (so far four predefined queries); but I believe this is not enough of a contribution for a journal paper (see the illustrative query sketch after this list).

- I think that if FIDES were transformed into a generic framework that allows/helps/drives the semantic annotation of prediction models, it could become a promising systems-paper contribution for the journal.
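
To make the point about the predefined queries concrete, the kind of ad-hoc question I have in mind could be expressed roughly as follows. This is only an illustrative sketch: the prefix and all property names (e.g. fides:producedBy, fides:hasHyperparameterSetting) are invented here and are not taken from the paper.

    PREFIX fides: <http://example.org/fides#>

    # Retrieve, for every stored prediction, the model that produced it,
    # the model's hyperparameter settings and its reported error value,
    # i.e. a question that a fixed set of four predefined queries may not cover.
    SELECT ?prediction ?model ?hyperparameter ?value ?error
    WHERE {
      ?prediction fides:producedBy ?model .
      ?model fides:hasHyperparameterSetting ?setting ;
             fides:hasEvaluationError ?error .
      ?setting fides:hyperparameter ?hyperparameter ;
               fides:value ?value .
    }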

Additional:
- First footnote is out of margin.
- Instead of starting a sentence with "[x] provides" it is better to start with "Author et al. [X] provides..."
- Figure 5 is referenced before Figure 4
- Systems like OptiqueVQS (https://sws.ifi.uio.no/project/optique-vqs/) can help to visually formulate queries driven by an ontology.