Ontobee: A Linked Data Server that publishes RDF and HTML data simultaneously

Paper Title: 
Ontobee: A Linked Data Server that publishes RDF and HTML data simultaneously
Authors: 
Zuoshuang Xiang, Chris Mungall, Alan Ruttenberg, Yongqun He
Abstract: 
The Semantic Web allows machines to understand the meaning of information on the World Wide Web. The Linking Open Data (LOD) community aims to publish various open datasets as RDF on the Web. To support Semantic Web and LOD, one basic requirement is to identify individual ontology terms as HTTP URIs and dereference the URIs as RDF files through the Web. However, RDF files are not as good as HTML for web visualization. We propose a novel “RDF/HTML 2in1” model that aims to report the RDF and HTML results in one integrated system. Based on this design, we developed Ontobee (http://www.ontobee.org/), a web server aimed to dereference ontology term URIs with RDF file source code output and HTML visualization on the Web and to support ontology term browsing and querying. Using SPARQL queries of RDF triple stores, Ontobee first provides RDF/XML source code for a particular HTTP URI referring to an ontology term. The RDF source code provides a link using XSLT technology to HTML code that generates the HTML visualization. This design allows a web browser user to read the HTML display and simultaneously allows a web application to access the RDF document. The HTML display supports dereferencing of the ontology term information in user-friendly HTML format. The RDF output supports remote query of the ontology term and the Semantic Web. The contents of the HTML and RDF output files can be different. Ontobee provides a user-friendly web interface for displaying and querying the details and hierarchy of a specific ontology term. Ontobee currently supports 101 ontologies with over 1,300,000 ontology terms. It has become the default linked data server for publishing and browsing biomedical ontologies in the OBO Foundry library. In summary, Ontobee provides an efficient and publicly available method to promote ontology term URI dereferencing, web visualization and query, and to further facilitate the Semantic Web and LOD.
Full PDF Version: 
Submission type: 
Full Paper
Responsible editor: 
Decision/Status: 
Reject and Resubmit
Reviews: 

Submission in response to http://www.semantic-web-journal.net/blog/special-issue-linked-data-healt...

Review 1 by Richard Boyce

SWJ #2227-12-202

Narrative:

The authors describe OntoBee - the default linked data server for
publishing and browsing biomedical ontologies in the OBO foundry
library. Emphasis is placed on what they report to be a novel software
design pattern and architecture for providing RDF and HTML results in
a single system. The pattern and OntoBee architecture are described and
some results are presented, including some very rudimentary
characterization of query performance.

OntoBee is emerging as a very important resource within the community
because it is the default linked data server of the OBO foundry and
because of the scalable and open processes used by the developers. In
spite of this fact, the current manuscript should not be accepted
until all claims as to the novelty of the so-called "RDF/HTML 2in1"
architecture are either clarified in much more detail relative to
other existing approaches that have similar results, or simply removed
from the paper. RDF data servers have existed for years that provide
both HTML and RDF views depending on the client's request. For
example, D2R Server,
apart from providing a method for presenting RDF views of relational
databases (which is obviously out of scope for the OntoBee), provides
views of the data in both HTML and RDF using content negotiation. This
content negotiation approach is described in detail within the 2008
W3C Working Group Note (http://www.w3.org/TR/swbp-vocab-pub/). What
might be new with OntoBee is the use of an XSLT link within an XML/RDF
document to push resolution of the HTML view to the client. But whether
this is novel or not is suspect without contrasting the OntoBee method
with that of the other architectures. This has to be thoroughly
addressed for the paper to have credibility within the semantic web
readership community.
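
To make the contrast concrete, the two approaches can be sketched as follows. This is a minimal illustration only, not the implementation of D2R Server or OntoBee: the function names, the supported media types, and the assumption that the XSLT link is an xml-stylesheet processing instruction are all hypothetical choices made for the sketch.

```python
SUPPORTED = ("application/rdf+xml", "text/html")  # assumed server capabilities


def parse_accept(header: str) -> list[tuple[str, float]]:
    """Parse an HTTP Accept header into (media_type, q) pairs, best first."""
    prefs = []
    for part in header.split(","):
        fields = [f.strip() for f in part.split(";")]
        media_type = fields[0].lower()
        if not media_type:
            continue
        q = 1.0  # q defaults to 1 when the client gives no weight
        for param in fields[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0
        prefs.append((media_type, q))
    # sorted() is stable, so the client's order is kept among equal q-values.
    return sorted(prefs, key=lambda p: -p[1])


def negotiate(accept_header: str, default: str = "text/html") -> str:
    """Approach 1 (server side): pick the representation from the Accept header.

    Simplification: falls back to `default` instead of answering 406.
    """
    for media_type, q in parse_accept(accept_header):
        if q <= 0:
            continue
        if media_type in SUPPORTED:
            return media_type
        if media_type == "*/*":
            return default
    return default


def attach_stylesheet(rdf_xml: str, xslt_href: str) -> str:
    """Approach 2 (client side): always return RDF/XML, plus a stylesheet link.

    The processing instruction is placed after the XML declaration, if any,
    so that a browser can apply the XSLT and render HTML itself.
    """
    pi = f'<?xml-stylesheet type="text/xsl" href="{xslt_href}"?>\n'
    if rdf_xml.startswith("<?xml"):
        end = rdf_xml.index("?>") + 2
        return rdf_xml[:end] + "\n" + pi + rdf_xml[end:].lstrip("\n")
    return pi + rdf_xml
```

In the first approach the server decides which document to send; in the second every client receives the same RDF/XML and only XSLT-capable browsers turn it into HTML. A comparison along these lines is what the manuscript currently lacks.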

The manuscript needs to be re-organized to follow typical scientific
writing conventions and avoid duplication of information. For
example, page 5, second column, second paragraph reads:

"An RDF file can be parsed down to a list of triples. A triple
consists of a subject, a predicate, and an object. The subject
identifies what object the triple is describing. The predicate defines
the piece of data in the object to be given a value. The object is the
actual value. OWL extends RDF with additional predicates to
characterize"

-- why is the reader getting an introduction to RDF here in the Results
section? This is confusing. Please present *results* in the results
section, not basic concepts. Also, page 7, first paragraph of Sec 3.6
- why repeat the information about jQuery that was already stated in
the methods section? This reviewer found it annoying that more
information on jQuery was provided here in what is supposed to be the
Results section than in the Methods section. Finally, the performance
evaluation appears as ancillary results that are not foreshadowed in
the intro or methods section. These results certainly should not be
mentioned in the Results section without discussing the methods.

With respect to the performance evaluation, it is very weak compared
to standard approaches to benchmarking and, in this reviewer's
opinion, is not publishable. There is no information on how the
"random" URIs (shouldn't this be IRIs?) were chosen, how
representative these IRIs are of the population of IRIs (or the
ontologies from which they come), or the server load at the time
of benchmarking (e.g., the busiest or least busy time of day for the
server).
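
As an illustration of the kind of detail that is missing, a reproducible sampling and timing procedure could look like the sketch below. It is an assumption-laden example, not the authors' method: the function names, the per-ontology stratification, and the injected `fetch` callable are all hypothetical.

```python
import random
import time
from datetime import datetime, timezone


def sample_iris(iris_by_ontology, per_ontology, seed=0):
    """Draw the same number of IRIs from every ontology, reproducibly.

    Stratifying by ontology keeps large ontologies from dominating the
    sample; the fixed seed lets others repeat the exact selection.
    """
    rng = random.Random(seed)
    sample = {}
    for ontology in sorted(iris_by_ontology):
        iris = sorted(iris_by_ontology[ontology])
        k = min(per_ontology, len(iris))
        sample[ontology] = rng.sample(iris, k)
    return sample


def time_requests(iris, fetch):
    """Time fetch(iri) for each IRI and record when each request started.

    The UTC start time allows results to be related to server load
    (for example, busy versus quiet periods of the day).
    """
    results = []
    for iri in iris:
        started_at = datetime.now(timezone.utc).isoformat()
        t0 = time.perf_counter()
        fetch(iri)
        elapsed = time.perf_counter() - t0
        results.append({"iri": iri, "started_at": started_at, "seconds": elapsed})
    return results
```

Reporting the seed, the sample size per ontology, and the request timestamps alongside the timings would answer most of the questions raised above.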

Other critiques:

- "these ontology browsers do not provide the output of ontology
annotation in the RDF/XML format, an important feature for various web
applications and LOD development." - this statement needs to be
written with more technical accuracy - while the HTTP-based ontology
browser for the NCBO may not return results in RDF/XML, the NCBO has
provided a terminology service for some time that can do so (see "RDF
Term Service"). Also,
the authors will need to update the paper to discuss the new NCBO
SPARQL endpoint and browsing methods.

- "Many LOD browsers are available, for example," - the authors'
presentation of LOD browsers within the discussion of ontology browsers
is a bit confusing because they do not make the distinction clear. A
simple conceptual model might help make this clearer to the
readers. Much, if not most, of LOD is currently instance data, and a
fairly large proportion of the resources within the LOD map to other
resources that are also instances. Browsing LOD is an activity that
spans browsing instance properties across distributed resources, and
browsing ontologies when LOD resources are derived from
them. Clarifying this would be helpful for biomedical researchers
interested in producing linked data.

- The paper would be much improved by a more careful use of common
terminology from the ontology and SW research domains. The paper is
a bit sloppy in its use of jargon. One example of this issue is that
OntoBee uses IRIs but URIs are mentioned all the way through the
paper. Other instances where this could be improved are suggested
below (see "Minor points")

- Please move all URLs within text to footnotes. This is both easier
to read, and will help fix the spacing issues now so painfully
evident throughout the document.

- The figures need to be larger, or show less information using a
larger font. This is more true for Fig 4 than Fig 3, but both
contain content that is tough to read. Also, Fig 5 is not annotated
with a,b, and c sub-figures but these are referenced in the text.

Minor points
------------

- p1, paragraph 1: This sentence needs to be written with better
grammar : Bio-medical ontologies play a critical role in the process
of achieving the Semantic Web and the Linking Open Data (LOD).

- p1, paragraph 1: "Most ontology URIs do not point to real web
pages." - do the authors mean that the URIs are not
"dereferenceable"? Also, "In both cases these pages do not
efficiently support the Semantic Web." ... Please explain why!

- p2, paragraph 1: "All of the sources on these LOD diagrams are open
data." - what are the "diagrams"? not clear from the text

- p2, paragraph 1: "The basic element of LOD data exchange is the
definitions of individual terms and logical relations among these
terms." - shouldn't this say "resources" instead of "terms"? Terms or
entities are from ontologies but RDF, by definition, is a *resource*
description framework - this should be more clear.

- p2: "Many LOD browsers are available, for" should probably start a new
paragraph with the text in the paragraph that follows.

- p2: "The nature of a Linked Data Server requires that such a server
processes RDF efficient." --> efficiently

- p3, paragraph 2: "The `transitive` and the `CBD` (i.e., Concise
Bounded Description) option in the Virtuoso SPARQL engine were used
to minimize the number of SPARQL queries." - this is a repeat of most of a
previous sentence in the same paragraph

- This reviewer thinks that it is poor practice to cite Wikipedia when
there are excellent, and less dynamic, resources. For example, why
not cite a W3C resource when discussing XSLT?

- Page 4, second column, paragraph 1: "This leads to the ignorance of
all imports when the base ontology is queried from the triple
store." - Please rewrite to be more clear. Also, this paragraph
shifts from present to past tense, confusing the reader as to what is
and what was.

- Page 5, first paragraph - strange to define URI so late in the
paper; just say URI and define URI and IRI earlier, then use IRI
throughout the rest of the paper. Anyone using Virtuoso or TopBraid
is going to have to learn the distinction sooner or later so why not
help them early on in the manuscript!

- Page 7 column 2, paragraph 2: "Historically, Ontobee was first
developed" --> "Historically, Ontobee was developed"

- Page 7 column 2, paragraph 2: "The internal redirection through the
PURL system is performed by the OBO Foundry library administrators"
--> I think you mean that the library administrators maintain an
OBO-specific PURL server and thus, administer the PURLs.

Review 2 by Simon Jupp

This paper describes the Ontobee linked data server. Ontobee is currently being used to serve up a range of OBO ontologies following some basic linked data principles. The underlying service provided by Ontobee is to support rendering RDF data as HTML, so that it is presented in a more user friendly form when viewed with a web browser.

Ontobee provides a useful service, but is not particularly novel. There are other tools out there that provide similar services. Ontobee tries to distinguish itself from other linked data servers by providing views tailored for looking at OWL ontologies. Things like rendering ancestral classes in a tree, distinguishing various OWL constructs (e.g. annotations from classes), and providing user friendly renderings of logical axioms using Manchester syntax are all good things.

The fact that many OBO ontologies now have URIs that are dereferenceable turns out (from my own experience) to be quite useful. This is in part thanks to Ontobee, but this could have also been achieved with other tools. It's hard to say if there is anything special about Ontobee that makes it more suitable for this task than some of the other tools out there. The authors make some claims in the discussion about user-friendliness, and suggest that Ontobee is "easier" to work with than other resources. These claims are not really evaluated in any way and no specific details are provided. Despite this, the application itself seems well implemented and the paper is technically sound.

The authors only briefly mention how Ontobee is being used in other applications and the details are rather vague. The main use case they present is for a human searching or browsing the RDF data in a web browser. What about other applications? I would have liked to see a stronger use case showing where Ontobee is being used by external applications to consume the RDF data they are serving up.

This article is an extension of a poster abstract published at the ICBO workshop in 2011. It describes some additional functionality, but doesn't add a significant amount more for the reader. The article needs a considerable amount of proofreading by all the authors. There are quite a few errors in the English that need to be fixed, so my recommendation is accept, provided the English is significantly improved.

Review 3 by Jesualdo Tomás Fernández Breis

This paper describes the Ontobee platform, which has been developed to be used as the Linked Data Server of the OBO Foundry ontologies. The paper explains the rationale for the design of the different components of the system and the functionality offered by the Ontobee system, which is available at http://www.ontobee.org/.

The system does not seem to be very innovative on the technical side, since it uses state-of-the-art technologies. As mentioned, Ontobee is said to be the Linked Data Server of the OBO Foundry, which is generating more and more ontologies. Hence, it seems that it will be providing a very important service to this very active community, which is a significant contribution of the system.

However, it does not seem that they provide features specifically developed for this kind of ontologies. It seems that the system has been developed for querying the ontologies individually. However, given the theoretical orthogonality of the collection of OBO ontologies, it seems there might be an opportunity for providing other capabilities.

One of the results is the "RDF/HTML 2in1" model for publication of the data. There are a few existing tools that support publishing and visualizing Linked Data in HTML which have not been used or reported in this project, and it would be interesting to know why. Generally speaking, more related work and discussion about other similar tools produced by the semantic web community should be included.

The paper contains a brief evaluation of the speed of Ontobee queries. I would suggest that the authors provide more information about this evaluation, since it is not very useful in its current form.

Finally, the paper should be revised since it contains a few linguistic errors.
