Abstract:
The availability in machine-readable form of descriptions of the structure of documents, as well as of the document discourse (e.g. the scientific discourse within scholarly articles), is crucial for facilitating semantic publishing and the overall comprehension of documents by both users and machines. In this paper we introduce DoCO, the Document Components Ontology, an OWL 2 DL ontology that provides a general-purpose structured vocabulary of document elements to describe both structural and rhetorical document components in RDF. In addition to describing the formal description of the ontology, this paper showcases its utility in practice in a variety of our own applications and other activities of the Semantic Publishing community that rely on DoCO to annotate and retrieve document components of scholarly articles.