Bridging legal documents, external entities and heterogeneous KBs: from meta-model to implementation

Gioele Barabucci, Angelo Di Iorio, Francesco Poggi
Every legal document contains references to external entities: people, organizations, concepts and so on. In this paper we present ALLOT, an ontology to describe non-documental entities referenced in Akoma Ntoso documents and legal documents in general. We also discuss how to develop new ontologies, XML and DB schemas that follow the current best practices and avoid the common pitfalls found in the legal domain.
Submission type: Full Paper
Responsible editor:
Decision: Reject and Resubmit


Solicited review by Rinke Hoekstra:

This paper introduces an ontology (ALLOT) for modeling the content of regulations marked up using the Akoma Ntoso XML format for legal documents. It discusses a number of (legal and para-legal) ontologies and sorts them into three categories: content-centric, document-centric and integration-centric ontologies. It describes the Akoma Ntoso top level concepts (TLC, in the "non-ontology"), introduces ALLOT as a formalization of the Akoma Ntoso TLC that is linked to other existing ontologies, and concludes by formulating best practices.

It is quite a challenge to motivate and describe an ontology in writing (as I have experienced many times before), but apart from demonstrating the intellectual challenges of designing the ontology, it is crucial that the paper shows that the ontology solves a real problem. In the case of ALLOT, the paper does not really convince me that ALLOT is not just yet another ontology. This may or may not be a problem with ALLOT itself, but it is hard to tell from the paper.

* Section two distinguishes three categories of ontologies without referring to the large number of ontology categorizations discussed in the nineties and early tens. What is the need for this categorization? Rather than spending five pages discussing ontologies (most of which are already covered by Núria Casellas's dissertation), the paper should better motivate the need for the categorization itself. The ontologies described in section 2 are clearly selected for a reason, but this reason is not made explicit. In short, rather than presenting the categorization at a general level (i.e. as applying to legal ontologies in general), it is probably best presented as part of an inventory of ontologies to align with the Akoma Ntoso TLC.

* Section three describes Akoma Ntoso, but it contains a rather absurd passage (in 3.1) arguing that committing to a widely shared, stable family of standards for modeling semantics (RDF, OWL) is bad for reuse in the long run. I don't get it. Why are RDF and OWL unsafe bets for reusability in 10 years, while Unicode and XML are not? XML is only a year and 12 days older than RDF (Feb 10, 1998 vs Feb 22, 1999), and XML is the standard serialization of RDF. Additionally, the 'long-term' usability argument is moot; David Rosenthal's blog, in particular the post at [1], is a good entry point for discussions of format preservation over time.

Overall, this section is overly conservative with respect to the use of semantic technologies. Serializing graphs is indeed tricky, but there have been proposals for canonical (XML) serializations of subsets of RDF in the past (subsets, because BNodes create problems for these syntaxes). Also, the argument against updating standards based on the 'chain of trust' is weighed too heavily; surely there is a trade-off between usability and 'trust'. Lastly, the authors should find a different 'victim' for the DAML+OIL example: DAML+OIL was never a real standard with wide adoption.

Section four describes the naming convention of Akoma Ntoso, how it is related to the TLC, and how URIs based on it can be used both to identify the actual entities they represent and to dereference them. There have been plenty of publications about the Akoma Ntoso naming convention and dereferencing scheme (indeed, it has largely found its way into the CEN MetaLex work as well). I do not see what this section contributes here. (NB: the URI dereferencing scheme is not a contribution of this paper either, but of the community.)

Section five then introduces ALLOT. Apart from the ontology itself, it is unclear how its development is based on the arguments of the preceding sections. In particular, all of section three seems to contradict the choices made in this section: adoption of Semantic Web technologies, linking to other existing ontologies (only a few of which are standards), etc. Secondly, the section does not motivate the *need* for ALLOT. Looking at the TLC categories: why would LKIF Core not be sufficient? It covers all 10 main top level classes of TLC, including the FRBR levels of CEN MetaLex. Am I wrong in thinking that we (in the ESTRELLA project) already 'solved' the issue back in 2008/2009? Please explain.

More specifically:
* How does allot:Concept relate to e.g. owl:Class or even rdfs:Class? These seem to serve the same purpose.
* The types of links between ALLOT concepts and other ontologies are described, but they are not always motivated.
* Vocabulary: OWL 'does not allow the use of data properties in property chains'... please use OWL 2 DL, or at least OWL DL.

Section five does not describe what problem the ALLOT ontology solves, nor does it discuss any evaluation of its quality (e.g. query performance using ALLOT compared to using other ontologies).

Section 7 presents conclusions and best practices. The point about the contextuality and time-dependence of statements is an interesting one; it is a pity the authors do not dwell longer on the issue in other parts of the paper. Also, I do not see how ALLOT addresses this differently from other approaches (e.g. LKIF Core, DOLCE and many others). I agree that these issues should be addressed at a more fundamental technological level (e.g. the use of RDF graphs to express context).

The other issues raised in this section are not really a contribution compared to other work in the area of ontology engineering. They also need more 'body' (literature study, broader use cases, etc.) to justify the title 'best practices'.


Solicited review by Miriam Fernandez:

After reading the paper a couple of times, it is still not clear to me what problem the authors are trying to address and what their contribution to that problem is.

The authors talk about several different things: (i) making explicit the references that legal documents make to external entities, (ii) enhancing the temporal description of events, (iii) avoiding the use of one particular modelling language/technology, (iv) proposing best practices for ontology construction, (v) proposing an implementation of the Akoma Ntoso "non-ontology" (I still do not understand what the authors mean by the "non-ontology" terminology), (vi) enhancing the interlinking among entities within legal documents, and (vii) proposing a new ontology, ALLOT. The authors should make their research problem/goal clear.

Regarding the state of the art, the authors provide a deep discussion of the different ontologies available within the legal domain. However, they do not clearly describe what these ontologies are missing such that a new ontology, ALLOT, needs to be created. The only criticism of these ontologies that I can see in the paper is that "although different ontologies have widely different focuses, none of the ontologies we mentioned falls exclusively in one category"... and why is this a bad thing? Different ontologies serve different purposes. The question here is what scenario the authors aim to address and why none of the presented ontologies fits the requirements of this scenario.

Regarding the proposed ontology/approach, I do not understand the criticism of current representation languages such as RDF and OWL, nor the authors' decision to implement their ontology in unformatted text (with some XML references to URIs). Firstly, RDF and OWL are not technologies but description languages; secondly, it seems a bit contradictory to decide not to use RDF or OWL but still use XML.

Another element of the authors' discussion I do not understand is the breaking of the digital signature. Content and metadata are different elements, and changes to the metadata format should be completely independent of changes to the content.

The work also does not present any evaluation. At the very least, I would expect this ontology to be presented to ontology engineers/domain experts for evaluation, so that they can assess its advantages and disadvantages with respect to current state-of-the-art ontologies.

To conclude, the authors state that "current semantic technologies like RDF or OWL lack the features needed to express in a simple way the kind of statements these details require". When making such a strong claim, the authors should provide concrete examples of what they are modeling that cannot be modeled using RDF or OWL. Following the examples provided in the paper, I still don't see which features the authors aim to model that are not representable using RDF or OWL.

Solicited review by anonymous reviewer:

The goal of this article, "Bridging legal documents, external entities and heterogeneous KBs: from meta-model to implementation", is to present the ALLOT ontology. This ontology is presented as a proof of concept to describe non-documental entities referenced in Akoma Ntoso documents and legal documents in general. The paper also aims to discuss how to develop "new ontologies, XML and DB schemas that follow the current best practices".

First, the paper devotes 4 sections to the introduction and the description of current legal ontologies, their classification, and the Akoma Ntoso standard. Then, sections 5, 6 and 7 describe the ALLOT conceptualization, its mapping to other ontologies and best practices for legal ontology development. In general, the first part is too lengthy and detailed, while the second part is too short and poorly documented. The description and comparison of legal ontologies and the Akoma Ntoso specification receive a thorough, detailed analysis, while the description of the ALLOT ontology and its modeling process is rather scarce. In particular, best practices for legal ontology modeling are presented within the conclusions. Further information is needed on the conceptualization of the ALLOT ontology and its application (tests). The paper should be further developed in this direction for journal publication.


- Section 2: "It is not a case that such ontologies usually exploit bibliographic models already established outside the legal domain". Clarify the meaning of this sentence in the current context.
An explanation of the choice of the domain ontologies to be compared (and of the two ontologies to be mapped) should be provided. Also, a generic reference to up-to-date reviews of legal ontologies or collections of legal ontology descriptions (either in articles or in books, e.g. "Law and the Semantic Web", "Approaches to Legal Ontologies", "Legal Ontology Engineering", etc.) could point readers to further information and references on legal ontologies beyond the ones explored in this article.
Using the term "lightweight ontology" to refer only to synset-based or lexical ontologies might be confusing.
Introducing the discussion in section 2 with the conclusions from the last paragraph of section 2.3 might make the assumptions behind the classification clearer. For example, the Legal Case Ontology is described as "document-centric" in the discussion, while in Table 1 it is classified under different types.
- Section 3: There is a general lack of references in section 3.1, in particular for the statements regarding the choices made during the Akoma Ntoso project.
The description of technical specifications is intertwined with that of modeling decisions and conceptualizations, in section 4 as well.
- Section 5: While a large portion of the paper is dedicated to the description of legal ontologies and the Akoma Ntoso specifications, very little is provided with regard to the ALLOT ontology and its modeling decisions. Section 5.2, the description of the implementation layer, is particularly scarce. An example is provided, but not a full specification of the modeling strategy or evaluation results.
This also applies to Sections 6.1 and 6.2. For example, tables showing the mappings, beyond the examples provided, would be appreciated. The description of the ontologies being mapped should also be improved.
- Section 7: The legal ontology "flaws" should be specified (section 6 deals with two ontologies, so the generalization should be explained). Along the same lines, the recommendation of OntoClean and legal design patterns as methodological approaches should be further explained. Best practices generalized from the different sections should not be presented as conclusions without proper explanation.

Some typos:

p. 2: "advices" - "advice"
p. 8: "many of the existing ontology" - "many of the existing ontologies"
p. 12: "the resources they references" - "the resources they reference"
p. 21: "deal with memberships" - "deals with memberships"
p. 21: "First of all, the ontology and the mixes the concept of person with that of role" - "First of all, the ontology mixes the concept of person with that of role"

Solicited review by anonymous reviewer:

The paper is well written and the presented work is justified. In order to improve the paper, I would recommend:
- To better explain the relevance of the work; i.e., although the authors do justify the context of their work, it would be better to explain why this is a step forward and for what purpose.
- In section 6, the presented alignments could be better justified; i.e., why exactly these specific alignments were chosen.

There are some English-language typos.