Review Comment:
Overall the paper does not address issues related to multilingual LOD. The authors mention the existence of English names in the datasets they are using (mainly in Japanese) but do not discuss or detail how they have dealt with multilingualism during the process of integrating those 3 datasets and publishing them as LOD.
The main issue with the paper is its failure to position the approach (i.e., based on names as the central element rather than taxa) against other models/initiatives, which the authors have nevertheless properly identified (DwC & TaxonConcept). In addition, the paper lacks examples throughout.
However, the contribution in terms of published data to the LOD is clear and valuable.
Major comments by sections:
- Abstract: You do not clearly mention your contribution in the abstract. You should explicitly mention the 3 datasets that have been integrated, the data model designed to support such integration, provide some statistics as well as a link to the available LOD. Also, you do not mention here the specificity and advantages of your approach, which is to use names as primary objects rather than taxa.
- If “data hub” is the name of your system/result, then identify it clearly and reuse that name throughout the paper. Briefly describe that term at first use in the abstract or introduction.
- S2: The related work section is a bit light. Biodiversity Informatics is not just databases about biodiversity. The final links in the section to DwC and TaxonConcept are interesting, and more discussion about these (and others) would be appreciated here. You should bring the reader to understand the limits of related work to date and announce here the drawbacks/limits that your system addresses.
- S3: “fulfill the requirements”. You do not mention any “requirements” in S1. You gave a review of 3 diversity issues, but no requirements for a system or potential solution to meet.
- S3: “first class entities”. Your use of this expression is very ambiguous. The famous SICP book has a formal definition for this (from Strachey), but I am not sure this is what you mean.
- S3: You should clarify and provide examples which compare your work to DwC and TaxonConcept. This aspect (considering the name as the primary element/identifier) seems to be what distinguishes your approach the most; therefore, you should devote appropriate space to detailing it. Taxon concepts cannot always be consensual, less so than names; however, we all know names can be very ambiguous, and you yourselves acknowledge the homonymy problem, so could you please better justify your choice and discuss this aspect?
- S3: What do you mean by “identification (…) can be postponed”? How is this positive?
- S4: What is a “source”? Please give examples.
- S6: Would you say this data model is a “common data model” for the 3 described resources?
- S6: Could you provide an explicit comparison of your data model presented in Fig. 1 with alternative approaches such as DwC or TaxonConcept? This is indeed needed to evaluate the contribution and relevance of your choices. What exactly is the node “species” compared to other nodes? Is this an owl:Class?
- S7: As in the previous remark on homonymy, you mention finding 1797 issues, but do not say what you did to deal with them.
- S8: “translated the data to linked data”: could you be more specific, and could you refer to LOD publishing best practices (i.e., demonstrate that your dataset deserves a 5-star mug)? In particular, could you specify how you generated the “out-going links” to other datasets, which ones (in addition to DBpedia), and why you chose those?
- S9: “can be linked to each other”: this is a feature that approaches based on taxa (rather than names) would also have enabled. You should conclude on the added value of your approach.
- References: The citation numbers in the text are shifted relative to your reference list, and [7] does not exist. It seems that [1] concatenates 2 references.
Minor comments:
- Expand the LOD acronym in the title & abstract
- Abstract: “which is a key… fields” repeats the phrase just before.
- Abstract: What do you mean by “adopted”? At that point in the reading it is not clear.
- Abstract: “relationship” -> “relationships”
- Abstract: “Japanese & English names”
- S1: “Biodiversity becomes a (…) problem”. I would not say “biodiversity” is the problem. This is a concept, a measure, a science. The “scientific & social” problems are actually mass extinctions, climate change, etc. and their effects on biodiversity.
- S1: Data lacks information about relationships as well as semantically rich descriptions (i.e., with ontologies).
- S1: Expand the DDBJ and NCBI acronyms here at first mention.
- S1: “their relationships”, do you mean interconnections, interoperability?
- S1: “from each other” + also less visible to the community.
- S1: “build a data hub” + to address the three diversity issues mentioned before
- S2: You should provide a reference for “under discussion”.
- S2: You should provide a reference for TaxonConcept. When possible, you should prefer scientific references to links.
- S3: “It provides”, what is “it”? The policy?
- S3: “It also provides”, what is “it”?
- S3: The NCBI acronym must be expanded earlier in the paper.
- S3: “The second”-> the second policy
- S3: “Processing on the network”: what do you mean?
- S3, last sentence: English
- S4: “two main parts”, -> two main parts in BDLS
- S4: What is the “dictionary for terminology”?
- S4: “It contains”, what is “it”? (two occurrences)
- S4: “provenance” -> provenance information
- S4: “The analysis”: what kind of analysis, and what does it say?
- S4: “Species2000”, reference?
- S5: The enumerated item and the following paragraph start with the same sentence/expression.
- S5: Missing reference for Bryophytes DB.
- S6: “expressed in named graph” + and formalized in OWL, no?
- S6: Footnote #5 is not clear.
- S7: LOD acronym should be defined at first use.
- S7: “but we can just a set”: English
- S7: “Though (…) cases”; English
- S8: “the whole (…) yet”: could you clarify, and throughout the paper make sure it is clear what the “Data Hub” is and what the “dataset” is?
- References: bad left-side alignment