Building an effective Semantic Web for Health Care and the Life Sciences

Michel Dumontier
Health Care and the Life Sciences (HCLS) are at the leading edge of applying advanced information technologies for the purpose of knowledge management and knowledge discovery. To realize the promise of the Semantic Web as a framework for large-scale, distributed knowledge management for biomedical informatics, substantial investments must be made in technological innovation and social agreement. Building an effective Biomedical Semantic Web will be a long, hard and tedious process. First, domain requirements are still driving new technology development, particularly to address issues of scalability in light of demands for increased expressive capability in increasingly massive and distributed knowledge bases. Second, significant challenges remain in the development and adoption of a well founded, intuitive and coherent knowledge representation for general use. Support for semantic interoperability across a large number of sub-domains (from molecular to medical) requires that rich, machine-understandable descriptions are consistently represented by well formulated vocabularies drawn from formal ontology, and that they can be easily composed and published by domain experts. While current focus has been on data, the provisioning of semantic web services, such that they may be automatically discovered to answer a question, will be an essential component of deploying Semantic Web technologies as part of academic or commercial cyberinfrastructure.
Review 1 by Rinke Hoekstra:
The paper sets out requirements for enabling life science knowledge to be shared on the Semantic Web. The paper is clearly written, and the author makes his point effectively. Underlying the paper are a couple of assumptions that could be made more explicit, or at least better substantiated. For instance, the author mentions that semantic interoperability requires that descriptions "are consistently represented by well formulated vocabularies drawn from formal ontology". Neither requirement (consistency and a basis in formal ontology) is substantiated in the paper.

In section 3.1, the author mentions that one way to enable large-scale reasoning is to use incomplete algorithms. How does this incompleteness relate to the rather strict requirements of the preceding sections? A related note is that extremely large-scale reasoning over OWL Horst (roughly OWL 2 RL) is now a reality (cf. Jacopo Urbani, Spyros Kotoulas, Jason Maassen et al. (2010), "OWL reasoning with WebPIE: calculating the closure of 100 billion triples", in Proceedings of ESWC '10). The main problem is how to query the enormous amount of data produced by these reasoners.

I would also be interested in hearing the author's vision of how (and if) this community effort will meet the requirements enumerated here.

Lastly, the paper seems a bit verbose in explaining SW technology (RDF, OWL). I would expect the audience of this journal to be familiar with these.

Review 2 by Kunal Verma:
The author does a great job of covering current SW activity in the Health Care space. The paper reads well, though there are some typos that can be easily corrected.

1. The author seems to identify the following challenges: 1) scalability, 2) creation of upper ontologies, 3) need for formal representation, 4) better provenance and 5) better UI. All of these are pretty well aligned with general SW challenges. I wonder if there are specific versions of all of these for health care. For example, is there any kind of reasoning or representation needed specifically for health care that may not be covered by the de facto standards?
2. One thing that I found to be missing in the paper was a vision of what could be achieved if the challenges could be solved. Maybe a small section on how the field of health care can be improved with the help of SW technologies is in order.