Scientific Discourse on the Semantic Web: A Survey of Models and Enabling Technologies

Simon Buckingham Shum, Tim Clark, Anita de Waard, Tudor Groza, Siegfried Handschuh, Agnes Sandor
The desired outcome of all scientific endeavour is to advance the body of accumulated knowledge in a materially verifiable way. This knowledge is communicated through the research literature, which presents scientific claims and their justifications through forms of discourse, expressed in document genres legitimated by a given research community. The study of the rhetorical and argumentative characteristics of such discourse has long-standing traditions, the results of which also provide insights into how scientific publishing, search and debate might take new forms on the social-semantic web. This article surveys, for a general readership, the growing body of work that models scientific discourse for social-semantic web applications, and offers a framework highlighting key features to help compare the various models. Second, we present examples of tools based on discourse models, which facilitate semantic navigation, structured debate, human and machine annotation of scientific texts, and literature analysis/alerting services. Finally, we identify some of the open research challenges confronting the field, and summarise the ways in which they are being tackled.
Submission type: 
Survey Article
Decision: Major Revision

Solicited review by Delphine Battistelli:

This paper gives an interesting and original survey of approaches dealing with the rhetorical/argumentative analysis of scientific literature: first, by stressing the necessity of such analysis for improving the ways of exploring large collections of research papers (for comparing results and research data, and for supporting debate and confrontation), and thus for improving connections between scientists (authors, readers, and users); second, by stressing the difficulty of automating such a task, at least partially, from "real" texts. From this point of view, a summary list of the main types of linguistic markers involved would have been informative.

Solicited review by anonymous reviewer:

The authors present a review paper addressing the question "What does scientific publishing and discourse look like on the social, semantic web?". The authors have organized their review around 3 axes: i) modeling scientific discourse (rhetorical structure of publications and discourse), ii) machine annotation of scientific discourse, iii) semantic tools for scientific discourse.

Summary of the review:

This review has been written considering the journal's guidelines. The reviewer has also considered how well this review provides the reader with a critical analysis.


From the Journal's guidelines:

(1) Suitability as introductory text, targeted at researchers, PhD students, or practitioners, to get started on the covered topic:

Reviewer: The paper does not cover previous work fully enough to have the potential to become an introductory text. The review tends to reference the authors' previous work; about 30% of the references point to their own publications. The analysis needs to be improved and a more thoughtful discussion presented; the discussion is essentially absent from this paper (assuming section 5 is the discussion, the authors devote less than a full page to it). For a review, this paper does not really present a comparison framework: it does not consider how the models have been evaluated (have they been evaluated at all?), nor how the models have been used.

(2) How comprehensive and how balanced is the presentation and coverage.

Reviewer: The paper does not present a balanced review, nor does it comprehensively cover the field related to the topic the authors are addressing. There are too many self-references; the three axes of this review correspond to the authors' own previous work, and there are published papers and deployed tools that the authors simply omit. They do not define a coherent and significant criterion for comparing the approaches they present.

(3) Readability and clarity of the presentation.

Reviewer: There are problems with the architecture of the paper as well as with the English. On the one hand, the authors do not clearly explain why and how the axes (those upon which the review is built) are related to one another and to the overall research issue they are addressing. In addition, some paragraphs are too long while others are too short, and there are problems with the references. Furthermore, the paper should be harmonized; it mixes too many different writing styles. This lack of uniformity is also reflected in the English: some sections are well written, while others are poorly written. On the other hand, there are several grammatical problems that make it difficult for the reader to understand the paper.

(4) Importance of the covered material to the broader Semantic Web community

Reviewer: The paper addresses various important issues, which are particularly key for the Semantic Web community. It is precisely because of the importance and relevance of the topic at hand that the reviewer strongly advises the authors to improve their work and resubmit.

Specific comments and issues to be addressed.

Below are listed some of the issues and comments the authors should consider when re-writing the paper.

From the paper: "approaches that are being taken to answer"
Reviewer: improve/re-write.

From the paper: "to deliver sensible answers to queries which any scholar would consider fundamental to a critical perspective"
Reviewer: quite confusing, what are they talking about? Improve

From the paper: "Who disagrees with this theory?"
Reviewer: Which theory? This appears in the introduction, yet no theories were presented in the first paragraph of the introduction. What theories are the authors referring to?

From the paper: "Was this prediction ever fulfilled?"
Reviewer: Which prediction? Also, "prediction" is probably not the right word, as this is not a fortune-telling matter.

From the paper: "What assumptions does this approach depend on?"
Reviewer: which approach?

From the paper: "Are there different schools of thought around this problem?"
Reviewer: Which problem? Actually, I would like to know about work done in domains such as legal repositories, for which identifying claims, arguments, counter-arguments, etc. is very important. The authors do not mention any work related to domains other than some specific biomedical fields. What about the work done by The Guardian, The NY Times, Reuters, etc.?

From the paper: "Claims to contribute to the literature in a given field are made using carefully constructed arguments,"
Reviewer: improve/re-write ("Claims contributing to…"). Also, very often in the text the authors make assertions with no concrete examples or supporting evidence.

From the paper: "a manner which will bear peer review"
Reviewer: improve/re-write

From the paper: "in order to deliver services and user interfaces which treat scientific knowledge"
Reviewer: improve/re-write. "treat"?

From the paper: "We teach our doctoral students how to construct knowledge-level claims"
Reviewer: should only doctoral students bear this in mind?

From the paper: "The techniques and tools described in this article seek to formalize some of these patterns"
Reviewer: Which patterns? No patterns are mentioned earlier in the text, so this passage does not make sense.

From the paper: "What does scientific publishing and discourse look like on the social, semantic web", "…knowledge not so much as a set of documents, but rather, as a network of meaningful, conversational moves which can be modelled (REVIEWER: TYPO) as semantic relationships between nodes.", "Beyond digital replicas of this paper artifact, our challenge is to invent a future in which the internet more radically improves the effectiveness in the ways in which we make, disseminate and contest knowledge level claims."
Reviewer: the typo: "modelled" should be "modeled".

From the paper: "§2 reviews approaches"
Reviewer: anthropomorphism; also, why are the authors using "§" instead of something along the lines of "In section 2 we present…"?

Reviewer: There is no transition between sections one and two. Also, how are the approaches described by the authors related to the research question? How is the rhetoric related to the argumentation? How do the authors understand the rhetorical structure; what is rhetoric?

From the paper: "Although the corpus used as a foundation for the analysis was about experimental molecular dynamics, the resulted model is uniformly valid for any scientific domain."
Reviewer: This relates to the work of Harmsze (2000). A very strong claim, and completely unsupported: F.A.P. Harmsze does not suggest that the work presented in her PhD thesis is universally applicable, yet the authors present it as "valid for any scientific domain".

From the paper: "From the evaluation perspective, the authors performed a preliminary evaluation of the model, which showed that the model satisfies the purpose for what it was designed, but in reality, to our knowledge, it was not deployed in an actual application and consequently it failed to be adopted."
Reviewer: this is, as far as I can tell, related to Harmsze's work. Funny: here the authors acknowledge the lack of evaluation and adoption of the work, yet earlier they presented it as "valid for any scientific domain".

Reviewer: How were "Harmsze", "ABCDE", "SALT", etc. selected? Have these proposals been evaluated? If so, how? The comparison provided by the authors does not include evaluation in the matrix, nor does it address applicability (as in: have these models been used? Is there a prototype allowing us to "see" them in action?).

From the paper: "This proposal for conference proceedings in computer science was to develop LaTeX-stylesheet"
Reviewer: improve/re-write

From the paper: "but the model was too cumbersome to apply wholesale to a collection of papers."
Reviewer: improve/re-write

From the paper: "For an overview of these considerations, comparisons with Harmsze's work and to a modularly authored reference work,"
Reviewer: which considerations? Improve/re-write

From the paper: "Current collaborations are underway to unite this segment-centered view with work done on the meta-annotation of biological events (Nawaz et al, 2010) and the annotation of Core Scientific Concepts (CoreSCs) which are at the level of a sentence, and include Hypothesis, Motivation, Goal, Object, Background, Method, Experiment, Model, Observation, Result and Conclusion (Liakata et. Al, 2010)"
Reviewer: how is this relevant? Also, is there a reference for CoreSCs?

From the paper: "Hyp-ER (2009). Hypotheses/Evidence/Relationships Workshop, Elsevier Science, Amsterdam. 11-12 May 2009:"
Reviewer: for how long does a URL last? This URL now takes me elsewhere. The authors should make sure they are, at the very least, getting their references right.

From the paper: "Das, S. 2010. Scientific Discourse: Discourse, Data and Experiment. Presentation available on Slideshare:"
Reviewer: Please keep consistency throughout the paper. Most of the time the authors give URLs as footnotes, not as references; this one in particular, however, is a reference.

From the paper: "Buckingham Shum, S., Sándor, A., De Liddo, A. and Bachler, M. (2010). Integrating Human & Machine Document Annotation for Sensemaking. Knowledge Media Institute Seminar, The Open University, UK (11 Nov. 2010). Replay available at:"

Reviewer: This is pointing to:
Is this the right URL? This seems to be a webcast; shouldn't it be a footnote? What is the journal's policy on referencing webcasts, slideshares, etc.?

From the paper: "W3C Health Care and Life Sciences Interest Group:"
Reviewer: This is the scientific discourse task page. The right URL, the one for the W3C Health Care and Life Sciences Interest Group is:

Reviewer: How is the work reviewed by the authors related to

Reviewer: this paper is somewhat similar. The authors should read it carefully; it is similar and better structured.

Suggested literature

Calling International Rescue: knowledge lost in literature and data landslide! Biochem. J. (2009) 424, 317–333. doi:10.1042/BJ20091474

Shotton, D., Portwin, K., Klyne, G. and Miles, A. (2009) Adventures in semantic publishing: exemplar semantic enhancements of a research article. PLoS Comput. Biol. 5.

Bourne, P. E. and Fink, J. L. (2008) I am not a scientist, I am a number. PLoS Comput. Biol.

N. Noy, BioPortal: a community-based repository for biomedical ontologies and data resources, in: AAAI 2009 Spring Symposium on Social Semantic Web: Where Web 2.0 Meets Web 3.0, 2009.

D. Rebholz-Schuhmann, et al., Text processing through Web Services: Calling Whatizit, Bioinformatics 24(2) (2007) 296–298.

C. Jonquet, N.H. Shah, M.A. Musen, The Open Biomedical Annotator, in: AMIA Summit on Translational Bioinformatics, San Francisco, 2009.

Clement Jonquet, Paea LePendu, Sean M. Falconer, Adrien Coulet, Natalya F. Noy, Mark A. Musen, and Nigam H. Shah, NCBO Resource Index: Ontology-Based Search and Mining of Biomedical Resources. Winner of the Semantic Web Challenge Open Track, ISWC 2010.

At DERI, they do this (and more): Lange (Jacobs University), Annotating Rhetorical and Argumentative Structures in Mathematical Knowledge ..., 2008.

Collaborative and Argumentative Models of Meeting Discussions

Mann, W. and Thompson, S. (1988) Rhetorical structure theory: Toward a functional theory of text organization.

Niekrasz, J., Purver, M., Dowding, J. and Peters, S. (2005) Ontology-based discourse understanding for a persistent meeting assistant. In Proceedings of the 2005 AAAI

There is also work in CS (AI) on discourse analysis that the authors could check.