Prediction of Adverse Biological Effects of Chemicals Using Knowledge Graph Embeddings

Tracking #: 2804-4018

Erik Bryhn Myklebust
Ernesto Jimenez-Ruiz
Jiaoyan Chen
Raoul Wolf
Knut Erik Tollefsen

Responsible editor: Guest Editors DeepL4KGs 2021

Submission type: Full Paper

Abstract:
Semantic Web technologies enable the interoperability of disparate data sources. We have created a knowledge graph based on major data sources used in ecotoxicological risk assessment. This facilitates the use of the extensive library of Semantic Web tools. We have applied this knowledge graph to an important task in risk assessment, namely chemical effect prediction. We have evaluated nine knowledge graph embedding models from a selection of geometric, decomposition, and convolutional models on this prediction task. We show that using knowledge graph embeddings can increase the accuracy of effect prediction with neural networks. Furthermore, we have implemented a fine-tuning architecture which adapts the knowledge graph embeddings to the effect prediction task and leads to a better performance. Finally, we evaluate certain characteristics of the knowledge graph embedding models to shed light on the individual model performance.

Solicited Reviews:
Review #1
By Adrien Coulet submitted on 06/Jul/2021
Review Comment:

The authors have improved the article significantly. Many additions facilitate understanding of both the objectives (in Sections 1 and 3, but also 9) and the results (in Sections 5 and 7).
This reinforces my recommendation to accept the article, which I still consider to be of quality and interest.

Review #2
By Arif Yilmaz submitted on 26/Aug/2021
Review Comment:

(1) originality:
This manuscript details the improvements built upon the authors' previous publications on a knowledge graph in the ecotoxicology domain.

The paper starts with a definition of ecotoxicology and an explanation of its importance. The main challenge listed for datasets in the field is interoperability across various data sources.
Knowledge graphs (RDF) and Semantic Web technologies are suggested as a solution for orchestrating these datasets.
For the sake of completeness, the manuscript provides many details. On the other hand, this makes the paper difficult to follow, especially in the methods part. The general quality of the manuscript is appropriate.
The main contributions of the work are the investigation of KG embedding methods and the addition of new datasets to the previously published KG. The overall quality is acceptable.

(2) significance of the results:
The manuscript provides appropriate details on the KG embeddings of the proposed KG. The results are reported in sufficient detail.

(3) quality of writing:

The manuscript is acceptable in terms of writing quality.

Contributions of the work:
1. Consolidation of information relevant to the ecotoxicology domain as a knowledge graph. Integration includes tabular data, RDF files, SPARQL queries over public linked datasets such as Wikidata, and LogMap.
Biological: ECOTOX (1M experiments, 12K chemicals, and 13K species)
Chemical: ECOTOX, Wikidata, PubChem, ChEMBL, MeSH
Taxonomy: ECOTOX, NCBI
Species traits: Encyclopedia of Life

2. A prediction model using a multilayer perceptron (MLP) and KG embedding models is implemented and presented.

3. The manuscript investigates the prediction performance of various embedding models, namely:
Decomposition models: DistMult, ComplEx, HolE
Geometric models: TransE, RotatE, pRotatE, HAKE
Convolutional models: ConvKB, ConvE
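
To make the distinction between the model families more concrete, the following is a minimal sketch of the standard triple scoring functions of one geometric model (TransE) and two decomposition models (DistMult and ComplEx). It is an illustration only and not the authors' implementation. The function names and the toy vectors are assumptions, and the convolutional models are omitted because they require learned filters.

```python
import math

# Illustrative scoring functions; names and toy inputs are hypothetical.
# In every case a higher score means a more plausible triple (h, r, t).

def transe_score(h, r, t):
    """Geometric: negative L2 distance between the translated head h + r and the tail t."""
    return -math.sqrt(sum((hi + ri - ti) ** 2 for hi, ri, ti in zip(h, r, t)))

def distmult_score(h, r, t):
    """Decomposition: trilinear product sum_i h_i * r_i * t_i (symmetric in h and t)."""
    return sum(hi * ri * ti for hi, ri, ti in zip(h, r, t))

def complex_score(h, r, t):
    """Decomposition: real part of sum_i h_i * r_i * conj(t_i), with complex embeddings."""
    return sum(hi * ri * ti.conjugate() for hi, ri, ti in zip(h, r, t)).real

# Toy real-valued embeddings chosen so that h + r equals t exactly.
h, r, t = [1.0, 2.0], [0.5, -1.0], [1.5, 1.0]
print(transe_score(h, r, t))
print(distmult_score(h, r, t), distmult_score(t, r, h))

# Toy complex-valued embeddings: swapping head and tail changes the score.
hc, rc, tc = [1 + 1j], [1j], [2 + 0j]
print(complex_score(hc, rc, tc), complex_score(tc, rc, hc))
```

The toy inputs show one practical difference: DistMult gives the same score when head and tail are swapped, whereas ComplEx can score the two directions differently, which matters for directed relations.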

Review #3
Anonymous submitted on 25/Oct/2021
Review Comment:

In this new version of the paper, most of the comments have been addressed. The approach followed to build the TERA knowledge graph and implement the link prediction tasks is now more clearly described and better supported.

Despite the improvements, the writing needs to be carefully reviewed to enhance readability. In particular, there are long sentences that should be shortened, for example lines 38-44 on page 21, lines 31-35 on page 24, and lines 32-38 on page 29.

Given the amount of notation used throughout the paper, a table summarizing the most relevant symbols would considerably improve legibility.