Publishing DisGeNET as Nanopublications

Submitted by Laura I. Furlong on 10/15/2014 - 07:37

Tracking #: 879-2089

A new version of this paper is available

Authors:

Núria Queralt-Rosinach

Tobias Kuhn

Christine Chichester

Michel Dumontier

Ferran Sanz

Laura I. Furlong

Responsible editor:

Boyan Brodaric

Submission type:

Dataset Description

Abstract:

The increasing and unprecedented publication rate in the biomedical field is a major bottleneck for discovery in the Life Sciences. The scientific community cannot process assertions from biomedical publications and integrate them into current knowledge at the same rate. The automatic extraction of assertions about entities and their relationships by text-mining the scientific literature is a widespread approach to structuring up-to-date knowledge. For knowledge integration, the publication of assertions on the Semantic Web is gaining adoption, but it poses new challenges regarding provenance tracking and versioned data linking. Nanopublications are a new way of publishing structured data in which each unit consists of an assertion along with its provenance. Trusty URIs are a novel approach to making resources on the Web immutable and to ensuring unambiguous data linking on the (semantic) Web. We present the publication of DisGeNET nanopublications as a new Linked Dataset implemented in combination with the Trusty URIs approach. DisGeNET is a database of human gene-disease associations drawn from expert-curated databases and from text-mining the scientific literature. With a series of illustrative queries we demonstrate its utility.
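
As a rough illustration of the two ideas named in the abstract (an assertion bundled with its provenance, and a URI that embeds a hash of the content so that any change is detectable), the following minimal sketch may help. It is a simplification and not the actual Trusty URI algorithm or the DisGeNET schema: the `Nanopub` class, the `hashed_uri` and `verify` helpers, the `example.org` base URI, and the choice of SHA-256 with URL-safe Base64 are all assumptions made for this example.

```python
import base64
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class Nanopub:
    """Toy stand-in for a nanopublication: an assertion plus its provenance."""
    assertion: str   # e.g. a gene-disease association statement
    provenance: str  # e.g. the source the assertion was derived from


def content_hash(pub: Nanopub) -> str:
    """URL-safe SHA-256 digest of the content (simplified, not the real Trusty URI scheme)."""
    payload = f"{pub.assertion}\n{pub.provenance}".encode("utf-8")
    digest = hashlib.sha256(payload).digest()
    return base64.urlsafe_b64encode(digest).decode("ascii").rstrip("=")


def hashed_uri(base: str, pub: Nanopub) -> str:
    """Append the content hash to a (hypothetical) base URI."""
    return f"{base}.{content_hash(pub)}"


def verify(uri: str, pub: Nanopub) -> bool:
    """Check that the content still matches the hash embedded in the URI."""
    return uri.rsplit(".", 1)[-1] == content_hash(pub)


pub = Nanopub("geneX is-associated-with diseaseY", "text-mining")
uri = hashed_uri("http://example.org/np1", pub)
```

Because the hash is part of the identifier, a link to `uri` can only ever resolve to this exact content: `verify(uri, pub)` holds, while any edited copy of `pub` fails the check.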

Full PDF Version:

swj879.pdf

Revised Version:

Publishing DisGeNET as Nanopublications

Tags:

Reviewed

Decision/Status:

Minor revision

Solicited Reviews:

Review #1

By Amrapali Zaveri submitted on 11/Nov/2014

Suggestion:
Minor Revision

Review Comment:

The article “Publishing DisGeNET as Nanopublications” describes the publication of DisGeNET nanopublications as a new Linked Dataset implemented in combination with the Trusty URIs approach. DisGeNET contains information on human gene-disease associations from expert-curated databases and from text-mining the scientific literature.

The dataset has been published by re-using a rich set of vocabularies and is also interlinked with several other datasets, thus complying with the Linked Data principles. Other relevant information about the dataset has also been satisfactorily described. Thus, I recommend accepting the paper. However, I have a few queries/suggestions:
- Add a bit more information on the advantages of having a dataset in the form of nanopublications.
- According to Table 1, only 4% of the assertions are curated, which makes it a rather low-quality dataset. How do you plan to increase this? How accurate is this curation?
- Do you assess the accuracy/quality of the predicted and literature-extracted data?
- What is the GDA concept?
- How did you perform the interlinking? Did you assess the quality of the interlinks - accuracy and completeness?
- Figure 1 is illegible, please increase font.
- There is hardly any related work discussed.
- Did you come across any challenges during the conversion to nanopublications?
- How do you plan to update and maintain the dataset?
- There is not much evidence of third-party usage of this dataset.

The paper is well-written, however I encountered some formal errors:
1. Introduction
- IBI - provide full-form
- 7 - seven
2.1.1. GDA Content
- CUI - provide full-form
- de-referenceable - dereferenceable (also in 3.1)
- DisGeNET Nanopublication Dataset
- 4 - four
3.1 Ontologies
- Even though, the modeling - Even though the modeling
3.2 Schema
- e.g. examples - repetition
3.3. Metrics, Versioning, Licensing
- TriG syntax - provide reference
4.3. Linking with other LOD Resources
- Since in DisGeNET RDF is also represented the relation between gene and the protein/s that encodes, - please rephrase
6. Applications
- Open PHACTS Discovery platform - provide reference
- As a side note, I think section 3.3 and section 5 could be merged or put under one section.
- Also, I would prefer that you add the link to http://www.disgenet.org/web/DisGeNET/v2.1/rdf directly rather than link to the paper and have the reader look up the link there.

Review #2

By Eleni Mina submitted on 05/Dec/2014

Suggestion:
Minor Revision

Review Comment:

The paper “Publishing DisGeNET as Nanopublications” presents the release of a very valuable resource, gene-disease associations, using the nanopublication model and Trusty URIs. This paper is also indicative of the natural evolution of publishing information in science: adopting Semantic Web standards for publishing information together with provenance metadata and cryptographic hash values in the URIs. The paper is well written, well structured, and well motivated. The potential of such an effort is apparent, and I find it an excellent effort towards knowledge discovery. I definitely accept this paper, but I do have some minor comments that need to be addressed by the authors.

Minor revision comments
1. Virtuoso could be configured more appropriately so that it returns a message for the errors that result from a SPARQL query.

2. It is not very clear to me what exactly is already in nanopublication format and what is not. For example, substituting the disease ID in Section 4.1 with the Huntington's disease ID, C0020179, does not give any hits back. What percentage of the RDF data source has already been transformed into nanopublications?

3. I would personally find it very helpful to include a picture of the nanopublication schema. This can help the reader a lot in understanding the model, and it also saves a lot of time and effort when you want to perform SPARQL queries over this dataset.

4. It is not very clear to me (and maybe this also relates to the previous comment about the figure), how to retrieve with the current model, assertions that are talking about the same GDA (e.g. geneX is associated with diseaseY) but have different types of evidence e.g. literature and predicted.
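
To make the retrieval asked about in comment 4 concrete, here is a minimal sketch of the intended result: nanopublications grouped by gene-disease pair, keeping only the pairs supported by more than one type of evidence. It does not reflect the actual DisGeNET nanopublication schema or any query from the paper. The flat `records` tuples, the identifiers `np1` to `np3`, and the helper names are hypothetical; only the evidence labels (curated, predicted, literature) echo terms used in the reviews.

```python
from collections import defaultdict

# Hypothetical flattened records: (nanopub id, gene, disease, evidence type).
records = [
    ("np1", "geneX", "diseaseY", "literature"),
    ("np2", "geneX", "diseaseY", "predicted"),
    ("np3", "geneZ", "diseaseY", "curated"),
]


def group_by_gda(rows):
    """Map each (gene, disease) pair to its nanopub ids, keyed by evidence type."""
    grouped = defaultdict(lambda: defaultdict(list))
    for np_id, gene, disease, evidence in rows:
        grouped[(gene, disease)][evidence].append(np_id)
    return grouped


def multi_evidence(rows):
    """Keep only the pairs that are supported by more than one evidence type."""
    return {
        gda: dict(by_evidence)
        for gda, by_evidence in group_by_gda(rows).items()
        if len(by_evidence) > 1
    }


result = multi_evidence(records)
```

Here `result` holds a single entry for the pair `("geneX", "diseaseY")`, listing `np1` under literature evidence and `np2` under predicted evidence. Against the real dataset the same grouping would be expressed as a query over the assertion and provenance graphs, whose exact shape depends on the published schema.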

Comments

Link to access the nanopub query examples

Permalink Submitted by Núria Queralt-R... on 10/24/2014 - 08:52.

Apologies,

The link to see the nanopub queries in the paper is DisGeNET nanopub queries.

Best,
Núria

Hyperlink to the paper query

Permalink Submitted by Núria Queralt-R... on 10/27/2014 - 11:05.

Dear all,

Apologies, the hyperlink to the full nanopublication query example is missing from the submitted manuscript. Please follow this link to access it:

http://www.disgenet.org/web/DisGeNET/v2.1/rdf#nanoQueries

Kind regards,
Núria
