Review Comment:
The paper presents an ontology model for representing changes in taxonomic concepts and for linking the related temporal taxonomic concepts together. The motivation is to facilitate the integration of taxon-related data across heterogeneous repositories. Such data contains multiple, differing names for a taxon, and the related taxonomic concepts are usually not linked to each other. Taxonomic information represented using the model helps users to correctly interpret the relations between taxonomic concepts in various datasets. The model is an extension of the authors' previous work on modeling changes in digital archives (CKA), and it utilizes taxonomic terms from their work on publishing biodiversity data as Linked Data (LODAC). To test the model, the authors have created prototype software for entering the changes into the system and examining them via a browser user interface. The paper also reports a (smallish) performance evaluation of the model w.r.t. the response times of SPARQL queries to example data.
Originality:
Though based on the authors' previous, more general work and sharing similarities with other approaches to modeling taxonomies as ontologies, the paper makes a contribution by providing a practical and usable framework for managing changes in taxonomic concepts. The model's support for chaining change events together (cause and effect) and its rules for inferring links between concepts based on changes are novel.
Significance of the results:
The model seems capable of handling the basic change operations in a taxonomy reasonably well, and it appears to be a nice, usable solution for linking related taxonomic concepts to each other. However, there are some potential issues in the model/paper that need to be clarified, especially the distinction between a change in a taxonomic concept and a change in a scientific name (see the major remarks concerning page 9 below).
The related work section (in the Introduction) is quite brief and should be extended substantially, as many relevant references are missing.
The evaluation reported in the paper essentially tests the scalability of the SPARQL engine / triple store in terms of the number of triples. From this, it may be possible to conclude that the complexity of the model (number of triples) is not an issue for current SPARQL engines. It does not, however, evaluate the model in terms of its usability, added benefits, etc. The outcome of the discussions with experts who use taxonomic information in their research confirms that there is a genuine need for this kind of work. However, a more thorough and formal study would be needed for a proper evaluation of the model.
Quality of writing:
The language is mostly understandable, though a bit obscure here and there. The readability of the paper would strongly benefit from proof-reading by a native English speaker.
Major remarks
---------------------------------
Abstract (also in other sections) - The term "evolutionary relationship" is a bit dangerous and ambiguous in this context, as the term "evolution" has a specific meaning in biology. To my understanding, in this paper "evolutionary" does not refer to evolution but rather to changes in the scientific understanding of how taxa are defined (circumscription, position in taxonomy, rank). It would be helpful if this could be clarified.
Page 2, paragraph 4 - In the discussion of the TaxMeOn model: "However, the model does not support the view that an underlying knowledge of the changes is required for the correct interpretation of taxon concepts." - This claim is a bit vague and not entirely correct. TaxMeOn supports modeling the changes of taxonomic concepts (e.g., split, lump, change in classification, change in circumscription). The taxonomic concepts before and after the change event are linked to the change event instance with the relations taxmeon:before and taxmeon:after. However, there is no support for linking changes together (cause and effect), though it would be possible by introducing a single new property. Please clarify this.
Figure 1 - What about a change in circumscription? Is it somehow included in other change types or is it not covered here at all?
1. Introduction, related work - Consider adding the following references:
Berendsohn WG: A taxonomic information model for botanical databases: the IOPI Model. Taxon 1997, 46:283-309.
Page RDM: Taxonomic names, metadata, and the Semantic Web. Biodiversity Informatics 2006, 3:1-15.
Jones AC, White RJ, Orme ER: Identifying and relating biological concepts in the Catalogue of Life. Journal of Biomedical Semantics 2011, 2:7.
Sarkar IN: Biodiversity informatics: organizing and linking information across the spectrum of life. Briefings in Bioinformatics 2007, 8(5):347-357.
Schulz S, Stenzhorn H, Boeker M: The ontology of biological taxa. Bioinformatics 2008, 24(13):i313-i321.
Kennedy J, Kukla R, Paterson T: Scientific Names Are Ambiguous as Identifiers for Biological Taxa: Their Context and Definition Are Required for Accurate Data Integration. In Proceedings of the 2nd International Conference on Data Integration in the Life Sciences (DILS): 20–22 July 2005; San Diego, California. Edited by Ludascher B, Raschid L, Springer-Verlag 2005:80-95.
Also, the relevant TDWG and GBIF standards should be referenced properly (currently just one property from Darwin Core is mentioned).
Pages 5-6 - Related to the discussion of the rule for linking taxon concepts (e.g., after a merge), I was wondering whether a specific time point should be given as input to the function (as in the next case - a change in relationship), or whether you could mention that the triples should be filtered to contain only the changes relevant to the specific time point. This way, the user would get the taxonomic information relevant at a specific time point. Consider, e.g., the example case of merging Icterus galbula and Icterus bullockii into I. galbula and then splitting it again into I. galbula and I. bullockii - after executing the rules for this data, the user gets triples representing both the merge and the split, but is not able to examine the situation (merged or split) at a specific time point.
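To illustrate what I mean, here is a minimal Python sketch (my own, not the authors' rules): change events are replayed in time order, and only those that took effect on or before the requested time point are applied, so the merged and the split states can be told apart. The Change class, the ex: concept URIs, and the dates are hypothetical placeholders, not taken from the paper.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Change:
    """One change event; the field names are hypothetical, not LTK/CKA terms."""
    kind: str                # e.g. "merge" or "split"
    before: tuple[str, ...]  # concepts that cease to be valid
    after: tuple[str, ...]   # concepts that become valid
    at: date                 # time point at which the change takes effect


def concepts_valid_at(initial, changes, t):
    """Replay only the changes that took effect on or before t."""
    valid = set(initial)
    for ch in sorted(changes, key=lambda c: c.at):
        if ch.at > t:
            break  # later changes are irrelevant for time point t
        valid -= set(ch.before)
        valid |= set(ch.after)
    return valid


# Illustrative merge/split of the two orioles; the dates are placeholders.
initial = {"ex:galbula_0", "ex:bullockii_0"}
changes = [
    Change("merge", ("ex:galbula_0", "ex:bullockii_0"), ("ex:galbula_1",), date(1983, 1, 1)),
    Change("split", ("ex:galbula_1",), ("ex:galbula_2", "ex:bullockii_2"), date(1995, 1, 1)),
]
```

With these placeholder dates, concepts_valid_at(initial, changes, date(1990, 1, 1)) returns {"ex:galbula_1"} (the merged state), while date(2000, 1, 1) yields {"ex:galbula_2", "ex:bullockii_2"} (the split state).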
Page 9, RDF listing 1 - Why is the change of the genus of species:Nyctea_scandiaca represented as ltk:TaxonReplacement and ltk:HigherTaxonAddition, and not as ltk:HigherTaxonChange as in the similar case in Fig. 4? Is it because the species name also changes (scandiaca -> scandiacus)? In general, if only the name of a species changes, there is no need to create a new taxon concept, because the concept itself (circumscription) has not changed.
Page 9, paragraph 4: "Moreover, LTK provides more operations that describes the attributes of a taxon concept such as dwc:scientificName" - As dwc:scientificName should contain the full scientific name (e.g., the binomial name for a species) according to the specification, how do you handle the representation of a changed name? According to Fig. 4, the URI of a concept stays the same when the genus of a species changes; then the species URI must have both the new and the old name as dwc:scientificName. How does the model then keep track of which name was valid at a certain time? (I assume that the value of dwc:scientificName is a literal.)
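One possible approach, sketched here in Python under my own assumptions, is to attach each name to the concept together with a validity interval (in RDF this could be an n-ary relation instead of a plain literal), so that the name valid at a given time can be looked up. The NameAssertion class and the dates are hypothetical; only the two names come from the paper's example.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class NameAssertion:
    """A scientific name attached to a concept for the interval [valid_from, valid_to)."""
    concept: str
    scientific_name: str
    valid_from: date
    valid_to: Optional[date] = None  # None means the name is still in use


def name_at(assertions, concept, t):
    """Return the name that was valid for `concept` at time t, or None."""
    for a in assertions:
        if a.concept != concept:
            continue
        if a.valid_from <= t and (a.valid_to is None or t < a.valid_to):
            return a.scientific_name
    return None


# The concept URI stays the same; only the name assertion changes (placeholder dates).
snowy_owl = "species:Nyctea_scandiaca"
assertions = [
    NameAssertion(snowy_owl, "Nyctea scandiaca", date(1900, 1, 1), date(1999, 1, 1)),
    NameAssertion(snowy_owl, "Bubo scandiacus", date(1999, 1, 1)),
]
```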
4.1 Performance Analysis - Did you consider testing the execution of multiple parallel queries to simulate a multi-user scenario? Is a single data point in the graph (Fig. 11) produced by a single query execution, or did you calculate a mean value from multiple query executions?
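For reference, a measurement protocol along these lines could look like the following Python sketch: each data point is the mean of several executions, and a thread pool keeps several queries in flight to approximate concurrent users. run_query is a hypothetical stand-in; in a real test it would send a SPARQL query to the endpoint under test.

```python
import statistics
import time
from concurrent.futures import ThreadPoolExecutor


def timed(query_fn):
    """Wall-clock latency of a single query execution, in seconds."""
    start = time.perf_counter()
    query_fn()
    return time.perf_counter() - start


def measure(query_fn, repeats=10, clients=1):
    """Execute query_fn `repeats` times per simulated client, with up to
    `clients` executions in flight at once, and return all latencies."""
    with ThreadPoolExecutor(max_workers=clients) as pool:
        futures = [pool.submit(timed, query_fn) for _ in range(repeats * clients)]
        return [f.result() for f in futures]


def summarize(latencies):
    """Mean and spread for one data point of a response-time plot."""
    return {
        "n": len(latencies),
        "mean": statistics.mean(latencies),
        "stdev": statistics.stdev(latencies) if len(latencies) > 1 else 0.0,
    }


# Hypothetical stand-in for sending one SPARQL query to the endpoint.
def run_query():
    sum(range(10_000))
```

Reporting the standard deviation alongside the mean would also show how stable the response times are under load.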
Page 15, paragraph 1: "we implemented a prototype that utilizes the proposed model in order to publish the taxonomic information to LOD Cloud" - I did not see much discussion about Linked Data publishing, e.g., about dereferenceable URIs, in the context of your prototype (apart from the mention of the SPARQL endpoint). It seems that the taxon URIs mentioned in the paper are dereferenceable, but they lead to an older service (LODAC), which does not contain the information about taxonomic changes in the form described in this paper.
Page 15, paragraph 1: "The result of our prototype demonstrates that our approach is feasible and suitable for satisfying the need to link the large amount of taxonomic data across repositories in order to discover a broader knowledge of biology." - This is a rather bold statement given the preliminary nature of the evaluation, and because the model has not yet actually been used to link data across repositories (or at least this is not reported here). The claim should be toned down.
Appendix, ltk:SynonymLink (Example result) - Though the symmetry of the property ltk:synonym might be justified in zoology, in botany it certainly is not. In botany, a synonym is a name that is not correct for the taxon, i.e., it is a synonym of the correct (valid) scientific name. The valid name is not a synonym of the incorrect name.
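A small Python sketch of the problem, using a hypothetical directed property ex:synonymOf (not an LTK term): following the link from the incorrect name gives the correct one, but once the property is treated as symmetric, the correct name also points back to the incorrect one.

```python
def symmetric_closure(triples, prop):
    """Triples entailed if `prop` were declared an owl:SymmetricProperty."""
    inferred = {(o, p, s) for (s, p, o) in triples if p == prop}
    return set(triples) | inferred


def accepted_name(triples, name, prop="ex:synonymOf"):
    """Follow a directed synonym-of link from a synonym to the accepted name."""
    for s, p, o in triples:
        if s == name and p == prop:
            return o
    return name  # no outgoing link: the name itself is the accepted one


# Hypothetical directed link: the incorrect name points to the correct one.
graph = {("ex:incorrectName", "ex:synonymOf", "ex:correctName")}
```

On the directed graph, accepted_name resolves both names to "ex:correctName"; on symmetric_closure(graph, "ex:synonymOf"), the correct name is wrongly resolved to "ex:incorrectName".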
Minor remarks
---------------------------------
Page 2, paragraph 2: "For example, the Baltimore oriole (Icerus galbula Linnaeus, 1758) and the Bullock’s oriole (I. bullockii Swainson, 1827)." - The sentence is missing a predicate.
Page 2, paragraph 2 (2 times): "I. gulbula" -> "I. galbula"
Page 2, paragraph 2: "with time" -> "over time"
Page 2, paragraph 4: "a semantic web" -> "semantic web"
Page 3, paragraph 1: "SKOS [16] vocabularies" - If this refers to the properties of SKOS model (and not to vocabularies modeled using SKOS), singular "vocabulary" should be used.
Page 3, paragraph 4: "a change from a taxon concept" -> "a change in a taxon concept"
Page 3, paragraph 4: "Kempf" vs. "Kampf" - Check which one is correct and use it consistently.
Page 4, paragraph 2: "Flouris's theory" -> "Flouris' theory"
Page 4, paragraph 2: "Flouris's theory" - Add a reference to the theory.
Page 4, paragraph 3: "we formally propose a model" -> "we propose a formal model"
Page 5, paragraph 1: "tl:beginAtDateTime" - Namespace prefixes should be introduced when (or before) they are used for the first time. The prefix "tl" is not introduced until Table 1 in subsection 2.4.
Page 5, paragraph 2 (also in other sections): "dynamic description" and "static RDF statements" - I understand the point here (after reading further), but the terms "dynamic" and "static" are a bit vague in this context. Maybe this could be clarified somehow?
Figure 4 - What is the rdf:type of ex:theChange1? Could this be added to the figure?
Page 6 - If the notation p(c1,c2) means a triple (c1 p c2), then the following corrections should be made:
"subClassOf(ConceptEvolution,?opr)" -> "subClassOf(?opr,ConceptEvolution)"
"type(?opr,?chg)" -> "type(?chg,?opr)" (3 times)
"subClassOf(RelationshipEvolution,?opr)" -> "subClassOf(?opr,RelationshipEvolution)" (2 times)
Page 6, paragraph 2: "In addition to the rule for linking taxon concepts, we also introduce a rule to transform the dynamic information into a list of static triples." - The rule for linking taxon concepts also transforms the dynamic information into a list of static triples (if I understand correctly), so this should be clarified. You could add something like "[dynamic information] of a change in relationship..." because the rules presented after the sentence are related to changes in relationships.
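To make the argument-order issue in the page 6 rule notation concrete: under the reading p(c1,c2) = (c1 p c2), the corrected rule body can be checked mechanically, as in the Python sketch below for a toy graph. The statement that ltk:TaxonMerger is a subclass of ConceptEvolution and the instance ex:change1 are hypothetical, chosen only to exercise the rule.

```python
def holds(graph, p, s, o):
    """p(s, o) is read as the triple (s p o)."""
    return (s, p, o) in graph


def concept_changes(graph):
    """Bindings of (?chg, ?opr) for the corrected rule body
    subClassOf(?opr, ConceptEvolution), type(?chg, ?opr)."""
    oprs = {s for (s, p, o) in graph if p == "subClassOf" and o == "ConceptEvolution"}
    return {(s, o) for (s, p, o) in graph if p == "type" and o in oprs}


# Hypothetical data: a change-operation class and one change instance.
graph = {
    ("ltk:TaxonMerger", "subClassOf", "ConceptEvolution"),
    ("ex:change1", "type", "ltk:TaxonMerger"),
}
```

With the original argument order, subClassOf(ConceptEvolution, ?opr) would only match a triple stating that ConceptEvolution is a subclass of some operation, which is not what the data says.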
Page 6, paragraph 2: "Before executing the following rule, it is necessary to filter only some changes so that the input time point exists within its time range." - It is not clear what "its" refers to.
Page 6, paragraph 3: "For changes appearing before the specific time point, a relationship between a subject and an object after the change did not exist." - This sentence is hard to understand, please clarify.
Page 7, paragraph 2: "the RDF statement below" -> "the RDF statements below"
Page 7, paragraph 3: "relationships between genus:Columba and its allies" - What do you mean by allies? Please clarify.
Page 7, paragraph 4: "particular proposes" -> "particular purposes"
Page 7, paragraph 4: "are descend from" -> "are descended from"
Table 2 - "skos:relatedMatch" as superproperty of ltk:mergedInto, would "skos:broadMatch" be more accurate?
Table 2 - "skos:relatedMatch" as superproperty of ltk:splitInto, would "skos:narrowMatch" be more accurate?
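The reasoning behind these two suggestions, sketched in Python under the assumption that a circumscription can be modeled as a set of specimen identifiers: in a merge, each original circumscription is strictly contained in the merged one, which is what skos:broadMatch expresses, and a split yields the inverse situation. The specimen IDs are hypothetical, and mapping partial overlap to skos:relatedMatch is my own choice.

```python
def skos_match(before, after):
    """Suggest a SKOS mapping property for `before` -> `after`, comparing
    circumscriptions modelled as sets of specimen identifiers."""
    if before == after:
        return "skos:exactMatch"
    if before < after:  # `after` strictly includes `before`, as after a merge
        return "skos:broadMatch"
    if before > after:  # `after` is strictly included in `before`, as after a split
        return "skos:narrowMatch"
    if before & after:  # partial overlap
        return "skos:relatedMatch"
    return None  # disjoint circumscriptions: no mapping suggested


# Hypothetical specimen IDs for the oriole example.
galbula = {"s1", "s2"}
bullockii = {"s3", "s4"}
merged = galbula | bullockii
```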
Page 9, RDF listing 1 - The change type "ltk:HigherTaxonAddition" and property "cka:detail" should be introduced in the text before using them in the RDF example.
Page 9, RDF listing 2: "genus:Bubo ltk:majorMergedInto genus:Bubo_1999 ." - I don't see how this triple can be inferred from the original RDF statements describing the changes. Either change the property to ltk:mergedInto or add this "major" information to the original RDF statements.
Page 9, paragraph 4: "ltk:higerTaxon" -> "ltk:higherTaxon"
Page 9, paragraph 4: "operations that describes" -> "operations that describe"
Page 9, paragraph 5: "the formal model is described by the temporal change in taxonomic knowledge, and rules for executing the dynamic descriptions." - Something is off in this sentence; perhaps "described by" should be replaced with a more suitable verb.
Page 9, paragraph 5: "for specific purpose" -> "for a specific purpose"
Page 10, paragraph 4 (also on page 13): "XSD:DateTime" -> "xsd:dateTime"
Page 10, paragraph 5: "business layer" -> "business logic layer"
Page 12, paragraph 2: "then assign a concept" -> "then assigning a concept"
Page 12, paragraph 3 - In URL "http://rc.lodac.nii.ac.jp/ltk/concept.php?conept=http://lod.ac/species/B... -01-01T00:00:00Z", "conept" -> "concept", and remove the space after "1998"
Figure 10 - The instance of ltk:ReplaceTaxonConcept is both in the "Detail of change" and "Caused by" sections. Is this appropriate?
Page 13, paragraph 1 - In URL "http://rc.lodac.nii.ac.jp/ltk/concept.php?conept=http://lod.ac/species/B...", "conept" -> "concept"
Page 13, paragraph 2 - In URL "http://rc.lodac.nii.ac.jp/ltk-service/context/?concept=[taxonconcept]&ti...", "time" -> "date", and remove the slash "/" after "context" (otherwise the server replies HTTP 404)
Page 13, paragraph 2 - In URL "http://rc.lodac.nii.ac.jp/ltk-service/reason/?concept1=[taxonconcept1]&c...", remove the slash "/" after "reason" (otherwise the server replies HTTP 404)
Page 13, paragraph 2 - I tested the web service "reason" to get the background knowledge of the change of two concepts, but I couldn't get any sensible responses. The service always returns the same fixed set of triples regardless of the values of parameters concept1 and concept2. Please check that the service works as intended.
Appendix, ltk:TaxonMerger - In "ex:mb1 ltk:majorMergedInto ex:af1.", "ex:mb1" -> "ex:mb0", and add space after "ex:af1"
Appendix, ltk:TaxonMerger (also in ltk:TaxonSplitter) - In "ex:mb0 skos:closedMatch ex:af1.", "skos:closedMatch" -> "skos:closeMatch", and add space after "ex:af1"
Appendix, ltk:TaxonSplitter - In "cka:majorConceptBefore ex:ma0 ;", "cka:majorConceptBefore" -> "cka:majorConceptAfter"
Appendix, ltk:ChangeHIgherTaxon - In "ex:p2 skos:narrowTransitive ex:c1 .", "skos:narrowTransitive" -> "skos:narrowerTransitive"
References: "[7] Health T" -> "[7] Heath T"
URLs should not be hyphenated (as some of them are in the text and in References), please correct those.