Review Comment:
This manuscript was submitted as 'full paper' and should be reviewed along the usual dimensions for research contributions which include (1) originality, (2) significance of the results, and (3) quality of writing. Please also assess the data file provided by the authors under “Long-term stable URL for resources”. In particular, assess (A) whether the data file is well organized and in particular contains a README file which makes it easy for you to assess the data, (B) whether the provided resources appear to be complete for replication of experiments, and if not, why, (C) whether the chosen repository, if it is not GitHub, Figshare or Zenodo, is appropriate for long-term repository discoverability, and (D) whether the provided data artifacts are complete. Please refer to the reviewer instructions and the FAQ for further information.
The paper describes a knowledge graph related to silk textiles in museums. At the end of the paper, the authors apply automatic image and text analysis for data enrichment. The paper was submitted as a "full paper". However, its focus seems to be split between a "research paper" and a "dataset description paper"; different parts of the paper seem to be written from different perspectives.
First, a short introduction is presented. I found it a bit generic: the authors should formulate more clearly what problems they are addressing. If this is a "full paper", as submitted, then an introduction to the automatic enrichment methods is completely missing. This section contains no references that would contextualize the paper, which is a bit odd.
Then related work is discussed, focusing on general data models used for harmonizing data in museums. At this point it is not said how these models actually relate to the paper, or how the paper contributes to the state of the art presented. This could be clarified. I would also have expected references to related work on representing textiles. For example, here is one I found by just googling "linked data textiles":
https://doi.org/10.1080/19322909.2017.1359135
Related work on the methodological part of the paper (automatic image and text analysis) is missing.
Section 3 presents the SILKNOW ontology. I liked the idea of first collecting research questions from expected users, and outlining several usage scenarios, and then designing the model to match these.
The model presented in Section 3.1 is not described in sufficient detail. A reference to GitHub is fine, but the authors should also explain the model in more detail in the paper itself, and explain and motivate the modelling choices made, not only document the outcome. The motivation for using CIDOC CRM in general is fine, but it is not enough. Table 1 and Table 2 are not understandable to the reader. Figure 1 and Table 2 are not even referred to in the text, and the tiny font of Figure 1 is not readable.
Section 3.2 presents a SKOS vocabulary for silk-related terms in several languages, which is a nice result of the paper.
Table 3 shows how many thesaurus terms were found in the museum collection data. It would be even more important to know how many silk terms that are used in museums are not found in the thesaurus, as this will affect recall dramatically. Some tag clouds related to this are presented later, but the most frequent terms there are not related to silk. The fonts in the tag clouds are also too small to be readable in many places. Some rigorous numerical data would be nice, too.
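To make the request for numerical data concrete, here is a minimal sketch of the kind of figure I have in mind: the share of silk-related term occurrences in the collection data that the thesaurus covers, together with a ranked list of the terms it misses. The function name, the exact-match comparison after lowercasing, and the example term lists are my own assumptions for illustration; they are not taken from the paper, and the authors would need to decide how silk-related terms are identified in the records in the first place.

```python
from collections import Counter


def thesaurus_coverage(collection_terms, thesaurus_terms):
    """Return (coverage, missing) for a list of term occurrences.

    coverage: fraction of term occurrences found in the thesaurus.
    missing:  list of (term, count) pairs not found, most frequent first.
    Matching is a simple case-insensitive exact match (an assumption).
    """
    thesaurus = {t.strip().lower() for t in thesaurus_terms}
    counts = Counter(t.strip().lower() for t in collection_terms)

    total = sum(counts.values())
    matched = sum(n for term, n in counts.items() if term in thesaurus)

    missing = Counter(
        {term: n for term, n in counts.items() if term not in thesaurus}
    )
    coverage = matched / total if total else 0.0
    return coverage, missing.most_common()


# Hypothetical example data, for illustration only.
collection = ["damask", "Damask", "brocade", "lampas", "velvet"]
thesaurus = ["damask", "brocade"]

coverage, missing = thesaurus_coverage(collection, thesaurus)
print(f"coverage: {coverage:.2f}")
print("missing terms:", missing)
```

Reporting such a coverage value per museum, next to the counts in Table 3, would show directly how much recall is lost to terms that the thesaurus does not contain.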
Section 4 considers building the knowledge graph. Table 4, which lists what data has been aggregated, from where, and how much, is neither referred to nor explained in the text. The same problem occurs with Fig. 7. "MET coding" is not explained in the caption.
Section 5 discusses enriching the data with some machine learning techniques. This section looks a bit odd in a knowledge graph description paper; it should be stated clearly what the outcome is from a knowledge graph perspective. On the other hand, as this paper was submitted as a full research paper, the text before this section reads more like the content of a dataset description paper. Figure 9 is not understandable.
Summary
Originality. The focus on silk-related knowledge is a novelty. Some methodological work is presented, but the work is partly still ongoing. If this is a research paper, the scientific contributions need clarification, even if some evaluations are shown.
Significance of the results. At the current state of the paper and project, the significance of the results needs clarification; there is potential for this.
Quality of writing. More references are needed here and there. For example, the Introduction contains no references at all, and OntoMe is first mentioned without a reference. Figures and tables are not explained in the text in many places. The authors need to decide what type of paper this actually is. I would recommend splitting the paper in the future into a separate dataset paper and a methodological paper on text and image analysis.
See also the SWJ criteria listed at the beginning of this review.