Translational research combining orthologous genes and human diseases with the OGOLOD dataset

Paper Title: 
Translational research combining orthologous genes and human diseases with the OGOLOD dataset
Authors: 
José Antonio Miñarro-Giménez, Mikel Egaña Aranguren, Boris Villazón Terrazas, Jesualdo Tomás Fernández-Breis
Abstract: 
OGOLOD is a Linked Data dataset derived from different biomedical resources by an automated pipeline, using a tailored ontology as a scaffold. The key contribution of OGOLOD is that it links, in new RDF triples, human genetic diseases and orthologous genes, paving the way for more efficient translational biomedical research on the Linked Open Data cloud.
Submission type: 
Dataset Description
Decision/Status: 
Accept
Reviews: 

Submission in response to http://www.semantic-web-journal.net/blog/semantic-web-journal-special-ca...

Resubmission after "reject and resubmit" in two rounds. First round reviews are beneath the second round reviews, which are beneath the third round reviews.

Solicited review by anonymous reviewer:

The article has been revised and substantially improved. Apart from one minor typo and one suggested small rewording, I have nothing else to add.

Typos:
straight forward -> straightforward
Suggestion:
datasets which publish -> datasets that publish

Solicited review by Sören Auer:

Since the last revision this article has improved quite a lot. I deem it acceptable in its current state pending a few minor revisions:

* Tables 2 and 4 are not actually tables - please use a listing or figure environment here
* in Section 2.6 should not be enclosed with <> - if you use brackets inspired by N3, then only for non-prefixed URIs
* formatting of the SPARQL queries could be done more consistently, e.g., indentation and the placement of closing brackets differ; maybe you should also use a monospace font for the queries

Second round reviews:

Solicited review by anonymous reviewer:

I did not find major improvements in this revision, but I found a couple more scenarios, which is good.

A minor typo:

related to genes involved -> related to the genes involved

and a sentence that needs to be rewritten:

Finally, the OGOLOD dataset provides information
of orthologous genes from repositories that are not currently
published using LD principles, so the OGOLOD
resources cannot be linked to such source repositories.
Consequently, if these repositories were published, the
utility of this dataset would improve.

I assume that "this dataset" refers to the OGOLOD dataset; therefore I suggest just replacing "this dataset" with "the OGOLOD dataset".

Solicited review by Sören Auer:

This article describes the OGOLOD dataset, which aims to facilitate translational research combining orthologous genes and human diseases. The dataset is potentially interesting, but the authors fail to convince the reader of its importance. Although the paper contains a use case section, it does not really describe the potential use of the dataset. What new questions can be answered with the dataset? How many people, researchers, and stakeholders are interested in these questions? Also, I miss a little more meat in the article. Currently, quite a lot of space is devoted to acknowledgements and references. Also, the presentation of the figures could be improved: the figures could be reduced in size (esp. Fig. 2 would fit in one column). The SPARQL query could be formatted for improved readability. I also miss some lessons learned: What problems did you face during the conversion, extraction, and linking? How were the links generated? What is their precision/recall? I also do not completely understand the part on OWL punning: from my perspective it is not two entities that share the same identifier, but one entity that appears in different roles (i.e., class and instance). Hence, I do not see a problem with this resource being published as a single LD resource. You sometimes use LD and sometimes LOD; please be consistent. For numbers with many digits I would recommend using a thousands separator.
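
The precision/recall question above can be made concrete with a small sketch. It assumes that a manually verified gold-standard set of links exists; the link pairs and the `ex:` identifiers below are hypothetical placeholders, not data from OGOLOD, and this is not the authors' evaluation method.

```python
def precision_recall(generated, gold):
    """Return (precision, recall) of generated links against a gold standard."""
    generated, gold = set(generated), set(gold)
    true_positives = len(generated & gold)
    # Precision: fraction of generated links that are correct.
    precision = true_positives / len(generated) if generated else 0.0
    # Recall: fraction of correct links that were actually generated.
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


# Hypothetical gene-disease links produced by an automated pipeline.
generated_links = {
    ("ex:geneA", "ex:disease1"),
    ("ex:geneB", "ex:disease2"),
    ("ex:geneC", "ex:disease3"),
    ("ex:geneD", "ex:disease4"),
}

# Hypothetical manually verified links (the assumed gold standard).
gold_links = {
    ("ex:geneA", "ex:disease1"),
    ("ex:geneB", "ex:disease2"),
    ("ex:geneE", "ex:disease5"),
}

precision, recall = precision_recall(generated_links, gold_links)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

Here two of the four generated links are in the gold standard, and two of the three gold links were found, so both measures fall well below 1.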

Solicited review by Erik Wilde:

Judging by the cover letter, it seems that the authors did a good job of addressing all the issues raised by all reviewers. The most important issue was the need to clarify how this submission compares to other publications, which has now been done. Also, there were some technical issues with the paper content, which were addressed as well. Given those modifications, the paper will be a valuable starting point for readers, and it links them to more complete descriptions of the work, should they be interested in the dataset in more detail.

First round reviews:

Solicited review by anonymous reviewer:

Below are the answers to specific points mentioned in the submission
guidelines, followed by more detailed comments.

Summary:

This paper describes a Linked Open Data dataset that combines information about orthologous genes and human genetic diseases. It is therefore quite "specialized." Here are some questions I have:

A) Is this paper totally contained in [6] (or [5], [6], [7])? If not, what are the new parts or insights?

B) Because it is rather "specialized" one would expect at least a reference to a technical paper in biomedical research that motivates this particular dataset and gives examples of queries that can be answered using OGOLOD. If one such paper is already referenced, this connection should be clearer.

C) Looking at Table 1, the dataset is actually quite large, a fact that is not sufficiently highlighted in the text proper. What difficulties in terms of modeling and scale were encountered, if any?

D) It appears that queries rely heavily on . Are such links established by the authors or by someone else? This should be explained as well as discussed. Can it be assumed that the two connected concepts are in fact the "same"?

1) Name, URL, version date and number, licensing, availability, etc.

This is given.

2) Topic coverage, source for the data, purpose and method of creation
and maintenance, reported usage etc.

Although the objective of the dataset is described, no usage by practitioners is reported.

3) Metrics and statistics on external and internal connectivity, use of
established vocabularies (e.g., RDF, OWL, SKOS, FOAF), language
expressivity, growth.

This could be expanded upon.

4) Examples and critical discussion of typical knowledge modeling
patterns used. Known shortcomings of the dataset.

This is not well discussed.

5) Quality of the dataset

Refer to D above and questions on .

6) Usefulness (or potential usefulness) of the dataset

References to the biomedical literature should be clearer.

7) Clarity and completeness of the descriptions

Details are scarce.

Typos:

instances6, -> instances,6 (footnote should follow the punctuation mark here and elsewhere)
i.e. -> i.e.,
e.g. -> e.g.,
was solved for OGOLOD -> was solved in OGOLOD
The table 2 -> Table 2 (capitalize Table, Section, Figure, etc., when referencing them).

Solicited review by Sören Auer:

This paper describes the OGOLOD dataset, which contains information relevant for translational medicine. The short description is well written and illustrated, but it is a little difficult to understand and appreciate its value when one is not a bio-informatician. The authors should consider illustrating the use and the benefits better.

Information about the availability is not complete: while the SPARQL endpoint and individual resources seem to be available, no information is given about the availability of a bulk data dump for download. Also, the authors should note that the CC BY-NC 2.0 license is not compatible with the Open Definition (which does not allow restrictions on commercial use), so strictly speaking the dataset is not Open Data.

For self-containment, Section "2.3. Refactoring OWL Punning" should be extended instead of just referencing the descriptions being employed.

Solicited review by Erik Wilde:

This seems to be intended as an extended abstract of a paper published elsewhere, which is a little bit odd, but may be a good way to make people aware of the dataset. The writing style is clear and makes it easy to understand what the dataset is about, but there are only very few details in the paper. One minor correction that should be made is about terminology: a "URI scheme" is the part of a URI before the colon, such as "http" or "https". This is not what the authors are talking about when they mention URI schemes, so they should choose a term that does not already have a well-defined meaning in the area of URIs.
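
The terminology point above can be checked with a minimal sketch using Python's standard `urllib.parse.urlsplit`. The URI below is a hypothetical example, not one taken from the dataset.

```python
from urllib.parse import urlsplit

# Hypothetical resource URI, used only for illustration (not from the paper).
uri = "http://example.org/resource/gene/123"

parts = urlsplit(uri)

# In URI terminology, the "scheme" is only the component before the first colon.
scheme = parts.scheme

# The path is the kind of structure a dataset's naming convention governs.
path = parts.path

print(scheme)
print(path)
```

Here `scheme` is just `http`, while the naming structure the authors describe lives in the path, which is why a different term (for example "URI pattern") avoids the clash.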
