Review Comment:
"(1) Quality, importance, and impact of the described tool or system (convincing evidence must be provided). "
------------------
This paper presents TermItUp, a generic architecture integrating multiple state-of-the-art tools with the purpose of providing a one-stop shop for all terminology extraction needs.
The tool has been developed following FAIR and open science principles, using standard LLOD and LOD formats, and guided by a set of requirements derived both from observations of the state of the art and from discussions with terminology experts.
The tool will be extremely useful to the community, as systematically integrating its different components for each individual project would otherwise be incredibly time-consuming and ad hoc.
There are already high-impact uses of the tool in H2020 and other collaborative projects, showcasing its potential.
"(2) Clarity, illustration, and readability of the describing paper, which shall convey to the reader both the capabilities and the limitations of the tool. Please also assess the data file provided by the authors under “Long-term stable URL for resources”."
------------------
The paper is well written and well organized, and clearly tells the story of TermItUp in terms of both its capabilities and its shortcomings.
There are small language corrections to be made for the camera-ready version; they are listed at the end of this review.
The literature review is complete, although I would have preferred to see at least a mention of the state-of-the-art terminology extraction efforts around TermEval2020/ACTER, and an explanation as to why such systems, although theoretically very accurate, would be very difficult to integrate into a production system.
There is a very interesting multilingual extraction system in Findings of ACL 2021: https://aclanthology.org/2021.findings-acl.316.pdf
I mention it because it is interesting, but the positioning of the paper does not necessarily require going into this particular literature.
Perhaps the paper's mention of ongoing efforts in multilingual terminology extraction actually refers to this work.
Regarding future perspectives, I would love to see a knowledge-graph-aware association rule mining approach integrated in the future, in addition to the extraction of hierarchical relations.
I am not asking for this to be mentioned in the paper; it is just an interesting thought.
"In particular, assess (A) whether the data file is well organized and in particular contains a README file which makes it easy for you to assess the data,
(B) whether the provided resources appear to be complete for replication of experiments, and if not, why,
(C) whether the chosen repository, if it is not GitHub, Figshare or Zenodo, is appropriate for long-term repository discoverability, and
(4) whether the provided data artifacts are complete. Please refer to the reviewer instructions and the FAQ for further information."
------------------
The GitHub repository is easily accessible, and substantial information is given in the README or on the main website of the tool.
The overall explanations are clear, and the API documentation provided through Swagger is very helpful.
The main directions for improvement would be: 1/ adding more technical documentation that explains how to deploy the service; 2/ adding docstrings for developers;
3/ potentially refactoring the code base if its complexity increases further. I would personally favour a generic class structure relying on polymorphism rather than dynamically loading code modules that all define the same functions, even though the latter tends to create less overhead.
As the tool is still a prototype under active development, those are not significant issues with regard to the publication of the paper.
**I recommend acceptance of the paper**; the corrections are mostly cosmetic and can be made for the camera-ready version.
***Detailed corrections***
P2 l12 left: the part of the sentence about DBpedia is confusing; I do not understand what it means.
P2 l19 left: is to find -> is finding
P2 l25: different backgrounds and expertise levels to face language and related needs [...]
P2 l30 right: discussions that have arisen
P2 l37 right: This section covers
P2 l38 right: different processes mobilized in our system
P3 l42-43 right: Combining Wikipedia and other resources, BabelNet constitutes a multilingual [...]
P4 l9 left: domains, with half being closely related to [...]
P4 l10 left: Several scientific works are devoted
P4 l12 left: a SPARQL
P4 l28 left: can be of great help
P4 l6 right: corpus -> corpora
P4 l36 right: These can correspond to different [...]
P4 l43-45 right: The meaning of a unit is to be discovered in text and constructed through relations to other terminological units.
P5 l22 right: can significantly contribute to improving performance [...]
P5 l22 right: to developing
P5 l45 right: this translates to a necessity [...]
There are additional small corrections like these to be made; I can transmit this feedback to the authors later as time permits, which seemed preferable to delaying the submission of this review.