A Neuro-Symbolic System over Knowledge Graphs for Link Prediction

Tracking #: 3203-4417

Authors: 
Ariam Rivas
Diego Collarana
Maria Torrente
Maria-Esther Vidal

Responsible editor: 
Guest Editors NeSy 2022

Submission type: 
Full Paper
Abstract: 
Neuro-Symbolic AI focuses on integrating symbolic and sub-symbolic systems. The aim is to provide a neural-symbolic implementation of logic, a logical characterization of a neural system, or a hybrid learning system that combines features of symbolic and sub-symbolic systems, which differ fundamentally in how they represent data and information. Neuro-symbolic systems have recently received significant attention in the scientific community. However, despite efforts in neural-symbolic integration, symbol processing currently has limited scope and applicability. This work leverages a symbolic system, independent of the application domain, to improve the predictive capability of Knowledge Graph Embeddings (KGE). We tackle the problem of Neuro-Symbolic AI integration, enabling expressive reasoning and robust learning to discover relationships over a knowledge graph (KG). We present a novel approach to integrating Neuro-Symbolic AI systems: deductive databases implement the symbolic system for an abstract target prediction over a KG, and this symbolic system enhances the predictive capacity of the sub-symbolic systems implemented by KGE models. Our approach builds the ego networks of the head and tail of the abstract target prediction, and the symbolic system deduces new relationships that enhance these ego networks. As a result, the sub-symbolic systems achieve a higher predictive capacity for the abstract target prediction. As a proof of concept, we have implemented our neuro-symbolic system on top of a KG for lung cancer to predict treatment effectiveness. Our empirical results put the deduction power of deductive databases into perspective; they suggest that enhancing the neighborhoods of the entities at the head or tail of a target prediction can improve the predictive capacity of existing KGE models.
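The central mechanism described in the abstract (build the ego network of an entity in the target prediction, then let a deductive database add inferred edges before KGE training) can be illustrated with a toy sketch. This is not the authors' implementation: the one-hop ego network, the predicate names `has_drug`, `interacts`, and `interacts_within`, the entities `T1`, `DrugA`, and `DrugB`, and the Datalog-style rule are all hypothetical, and naive bottom-up evaluation stands in for a real deductive database engine.

```python
# Toy sketch with hypothetical names and rule: enhance an ego network by deduction.


def ego_network(triples, ego):
    """One-hop ego network: every triple with `ego` as subject or object."""
    return {t for t in triples if ego in (t[0], t[2])}


def deduce(triples):
    """Naive bottom-up (fixpoint) evaluation of one hypothetical Datalog rule:

        interacts_within(T, D2) :- has_drug(T, D1), has_drug(T, D2), interacts(D1, D2).
    """
    facts = set(triples)
    while True:
        has_drug = {(s, o) for s, p, o in facts if p == "has_drug"}
        interacts = {(s, o) for s, p, o in facts if p == "interacts"}
        derived = {
            (t, "interacts_within", d2)
            for (t, d1) in has_drug
            for (t2, d2) in has_drug
            if t == t2 and (d1, d2) in interacts
        }
        new_facts = derived - facts
        if not new_facts:  # fixpoint reached: nothing new can be deduced
            return facts
        facts |= new_facts


# Hypothetical toy KG: treatment T1 combines two drugs that interact.
kg = {
    ("T1", "has_drug", "DrugA"),
    ("T1", "has_drug", "DrugB"),
    ("DrugA", "interacts", "DrugB"),
}

enhanced = deduce(kg)
print(sorted(ego_network(kg, "T1")))        # ego network before deduction
print(sorted(ego_network(enhanced, "T1")))  # ego network after deduction
```

In this toy graph, the ego network of `T1` grows from two triples to three; under these assumptions, a KGE model trained on the enhanced graph would see the deduced edge when predicting the target relation for `T1`.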
Tags: 
Reviewed

Decision/Status: 
Minor Revision

Solicited Reviews:
Review #1
Anonymous submitted on 10/Oct/2022
Suggestion:
Major Revision
Review Comment:

This paper is about link prediction using a neuro-symbolic system, with an application in the biomedical domain: predicting whether the application of certain treatments in healthcare has valuable effects.
The subject is interesting and the application is important.
However, the first main remark is that this paper is neither well written nor well organized. Many things are introduced without any explanation or justification. The paper lacks a strong research line and is confusing in many places.
Many elements are defined, but the reader does not actually understand what their utility is and why they are introduced, such as ego networks, abstract target prediction, and neighborhoods induced by ego systems.
Definitions are proposed on page 4, but are they really useful?
These notions seem to be totally ad hoc and do not bring any value in the paper (while authors underline that their approach is "agnostic").
Moreover, it is very hard to understand the authors' objectives and general contributions regarding link prediction.
This is mainly because many things in the paper seem to be tied to the pharmacology domain.
Therefore, the paper cannot be accepted in its present form and should be thoroughly revised.

Details.

Introduction.

The introduction is written in a very loose style especially at the beginning.
For example, "sub-symbolic systems are AI systems": do you think that AI is only related to systems working on numerical data?
In that case, are logic, KR, and reasoning no longer AI methods?

You write either "neural-symbolic" or "neuro-symbolic"; is it the same thing?
If so, use only one expression.
In addition, do you consider that neural networks are the only subsymbolic systems?

A definition or at least a more precise explanation about "neuro-symbolic integration" would be welcome.

What do you mean by "ego systems"?

What do you mean by DDI?

In the list of contributions, the authors should add at least one sentence explaining what each contribution means. Each sentence is very vague and open to several interpretations.

On page 3, the authors mention pharmacodynamics and pharmacokinetics, but the former is then totally ignored. Why mention it?

What do you mean by
"Notice that the phramocokinetic interactions can be encoded in a symbolic system."
This cannot be so simple and deserves more explanation.

What do you mean by "true relations"?

Are DDIs transitive relations? Explanations are needed.

The sentence on line 45 of page 3 is hard to understand.

What do you mean by "a subsymbolic system, e.g. implemented using a KGE model..."?
In addition, there is a problem with this sentence.

The "definitions" on page 4 are not very well written.
What are "properties"?
What is E?
What is a "unified ontology"?
What is an "ego entity"?

Why do you need "Ego Networks" and "Neighborhoods", which do not seem to be very useful in the rest of the paper?

The introduction of "abstract target prediction" seems to be very ad hoc.
Why do you need this construction?

Same thing with the definition of projection in a KG based on an ATP.
This "definition" is not understandable (actually this is not a definition).

How do IDB predicates allow new relationships to be deduced and the ego network to be "enhanced"?
The term "enhance" is used several times without any definition and it is hard to understand what is its meaning.

What do you mean by "true triples"?

What is the hyperplane introduced on page 5, line 50?

On page 6, again: "we aim to enhance the predictive capacity"... what do you mean? How do you measure this "predictive capacity"?

What do you mean by "ideal KG" and then by "complete KG"?
Again these expressions are not defined and should be.
Moreover concrete examples should be provided.

What do you mean by "each possible combination of entities in V"?

Figure 3 on page 7 cannot by itself define the whole process.
More explanations are needed and a working example would also be welcome.

On page 7, what do you mean by "minimal model"?
And why can it be computed in polynomial time?

In Section 4, it is quite hard to understand why the authors use such heavy vocabulary and why so many components are involved in link prediction.
It seems that such a system cannot be easy to build and maintain and that many things are totally ad hoc (even though the authors say that the system is agnostic, they do not explain why).
Moreover, one may have the impression that a simple reasoning procedure, using deduction, could perform the prediction task presented here.
But the paper is presented so poorly and so heavily that it is very hard to see anything and to understand where we are exactly.

On page 15 (discussion):
The authors introduce T_KGbasic and T_KGrandom without any definition.

In Section 5, the results are not well explained.
Figures 8 to 11 in particular should be better explained, and the authors should explicitly state how the reader should "understand" these figures.

In Section 6, the authors mainly discuss neuro-symbolic techniques and applications in the biomedical field.
In knowledge discovery, there is a whole line of research on link prediction using sub-symbolic methods. It would be interesting for the authors to discuss the originality of their work w.r.t. this line of research and to position their work w.r.t. it.

Review #2
By Abhilekha Dalal submitted on 25/Oct/2022
Suggestion:
Accept
Review Comment:

Summary:
The paper addresses Neuro-symbolic AI integration, facilitating expressive reasoning and rich learning to deduce relationships over knowledge graphs. The paper presents an approach that implements the symbolic system for an abstract target prediction over a knowledge graph; the symbolic system deduces new relationships that enhance the predictive capacity of the subsymbolic systems implemented by KGE models. The results of implementing the system over a lung cancer KG suggest that enhancing the neighborhoods of the entities at the head or tail of a target prediction can improve the predictive ability of existing KGE models.

Contribution:
The work extends the authors' previous work; building on those results, they present an approach that combines symbolic reasoning implemented by deductive Datalog databases with subsymbolic systems implemented as KGE models to reduce KG sparsity and enhance prediction accuracy.
The approach is domain-agnostic; it was implemented to predict the effectiveness of lung cancer treatments composed of multiple drugs (i.e., polypharmacy treatments). The solution captures knowledge in the KG by building ego networks of the entities corresponding to the head and tail of the abstract target prediction in order to deduce DDIs within a treatment. The KG is completed with the implicitly defined (deduced) relations, which enhances the ego networks of the abstract target prediction; all the knowledge in the graph is then embedded to predict treatment effectiveness.
The proposed approach was assessed on this same task. An extensive evaluation was carried out with eleven state-of-the-art KGE models, and results from 5-fold cross-validation show improved prediction accuracy for these models.

Discussion
Weak aspects:
The authors often refer to "symbol processing" in the paper; a sentence defining the term would be helpful.
The link for the evaluation metrics provided in the paper (https://github.com/SDM-TIB/Statistics_KnowledgeGraph) does not work.

Strengths:
The theoretical foundations are well described; the motivating example and the use case implementing the proposed approach strengthen the paper.
Although the work (link prediction by deducing new relationships in a KG) is not groundbreaking, it is relevant.
The work is independent of the application domain; it is implemented on top of a KG for lung cancer to predict treatment effectiveness.
The solution was evaluated on different models using different metrics. The results obtained are promising and well-presented.
The paper is well-organized, with an excellent readability score.
The source code and the dataset used for the experiments are fully usable and maintained, and they are publicly available on GitHub through the link given in the paper (https://github.com/arivasm/Neuro-Symbolic_Treatment-Response).