LegalNERo: A linked corpus for named entity recognition in the Romanian legal domain

Vasile Pais
Maria Mitrofan
Carol Luca Gasan
Alexandru Ianov
Corvin Ghiță
Vlad Silviu Coneschi
Andrei Onuț

Harald Sack

Dataset Description
LegalNERo is a manually annotated corpus for named entity recognition in the Romanian legal domain. It provides gold annotations for organizations, locations, persons, time expressions and legal resources mentioned in legal documents. Furthermore, GeoNames identifiers are provided. The resource is available in multiple formats, including span-based, token-based and RDF. The Linked Open Data version is available for both download and querying using SPARQL.
Thank you for submitting the revised version and for shortening the paper to a more reasonable length. Also thank you for taking care of the figures and other aspects I suggested in my previous reviews.

I only have a few minor remarks, which have to do with the presentation and length of the paper.

You appear to have followed my suggested cuts exactly, which has left quite a few very short lines of text consisting of only a single word or syllable, for example:

- Page 1, right column, line 36 ("tion 8.")
- Page 2, left column, line 7 ("processes.")
- Page 2, left column, line 19 ("gal entities.")
- Page 2, right column, line 17 ("NEs.")
- Page 3, left column, line 13 ("correct these mistakes")
- Page 3, left column, line 20 ("tators.")


For these and all other short or very short lines, I strongly suggest editing the corresponding paragraph so that the whole paragraph becomes slightly shorter and the short line disappears. There are two reasons: such short lines add to the length of the paper, and they should be avoided from a typographic point of view. This comment also applies to items in itemize/enumerate environments and to all other such cases. In almost all cases, a simple reformulation of the corresponding paragraph will avoid the short line.

A related comment: footnotes take up an enormous amount of space (in the LaTeX class of the Semantic Web Journal, one footnote takes up 2-3 lines of text), so I'd suggest going through all footnotes one more time and deciding for each one whether it is really needed. If not, please delete it.

My second comment relates to Figures 1, 2, 4, 5, 6 and 7 and Appendix A: instead of the basic font (Times New Roman), which is a variable-width font, please use a fixed-width font – Courier is the obvious choice. The current versions of the figures/listings are very difficult to read and decipher; using Courier will considerably improve their readability and usability. Please also consider applying syntax highlighting (using colours etc.) as provided by the listings package.
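
To illustrate the suggestion about Courier and the listings package, here is a minimal sketch of one possible setup. It assumes the standard article class rather than the journal's own class, and the colour choices, font size and listing content are placeholders for illustration only, not taken from the manuscript:

```latex
\documentclass{article}      % assumption: replace with the journal's class
\usepackage{courier}         % sets Courier as the typewriter (\ttfamily) font
\usepackage{xcolor}          % colours for syntax highlighting
\usepackage{listings}

\lstset{
  basicstyle=\ttfamily\footnotesize, % fixed-width font for all listings
  keywordstyle=\color{blue},         % illustrative colour choices
  commentstyle=\color{gray},
  stringstyle=\color{teal},
  breaklines=true,                   % wrap long lines instead of overflowing
  frame=single
}

\begin{document}

% Hypothetical listing content, for illustration only
\begin{lstlisting}[language=XML, caption={Example listing in Courier}]
<!-- placeholder annotation -->
<entity type="ORG">placeholder text</entity>
\end{lstlisting}

\end{document}
```

With such a setup, each figure or appendix listing can be typeset in an lstlisting environment instead of plain text in the body font.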

Thanks to the authors for submitting an updated version of the manuscript. The authors have addressed most of the major concerns I had. I would therefore like to accept the paper, as it would be a significant contribution to future research in the field of legal data.