The SILKNOW Knowledge Graph

Tracking #: 2776-3990

Authors: 
Thomas Schleider
Raphael Troncy
Mar Gaitan
Jorge Sebastian
Dunja Mladenic
Avgustin Kastelic
Besher Massri
Arabella Leon
Marie Puren
Pierre Vernus
Dominic Clermont
Franz Rottensteiner
Maurizio Vitella
Georgia Lo Cicero

Responsible editor: 
Special Issue Cultural Heritage 2021

Submission type: 
Full Paper
Abstract: 
SILKNOW is a research project that aims at improving the understanding, conservation and dissemination of the European silk heritage from the 15th to the 19th century. This paper presents the SILKNOW knowledge graph (KG), which lies at the center of the application of Semantic Web technologies and computing research to the needs of museums and every other user of this knowledge. The underlying data model is based on CIDOC-CRM and on data mappings that are realised and implemented with conversion tools developed for SILKNOW. The full integration pipeline also includes our own crawling software, which retrieves the original data from both public sources and project partners. We developed an API for accessing the KG and created the exploratory search engine ADASilk on top of it. Finally, we present how we apply automatic image and text analysis to predict missing metadata in the knowledge graph.
Tags: 
Reviewed

Decision/Status: 
Major Revision

Solicited Reviews:
Review #1
Anonymous submitted on 05/Jul/2021
Suggestion:
Reject
Review Comment:

This manuscript was submitted as 'full paper' and should be reviewed along the usual dimensions for research contributions which include (1) originality, (2) significance of the results, and (3) quality of writing. Please also assess the data file provided by the authors under “Long-term stable URL for resources”. In particular, assess (A) whether the data file is well organized and in particular contains a README file which makes it easy for you to assess the data, (B) whether the provided resources appear to be complete for replication of experiments, and if not, why, (C) whether the chosen repository, if it is not GitHub, Figshare or Zenodo, is appropriate for long-term repository discoverability, and (D) whether the provided data artifacts are complete. Please refer to the reviewer instructions and the FAQ for further information.

The paper describes a knowledge graph related to silk textiles in museums. At the end of the paper, the authors apply automatic image and text analysis for data enrichment. The paper was submitted as a "full paper". However, the paper's focus seems split between a "research paper" and a "dataset description paper"; different parts of the paper seem to be written from different perspectives.

First, a short introduction is presented. I found this a bit generic: the authors should formulate more clearly what problems they are addressing. If this is a "full paper", as submitted, then an introduction to the automatic enrichment methods is completely missing. No references are made in this section to contextualize the paper, which is a bit odd.

Then related work is discussed, focusing on general harmonizing data models used in museums. The section does not explain how these models are actually related to the paper, or how the paper might contribute to the state of the art presented. This could be clarified. I would also have expected references to related work on representing textiles. For example, here is one I found just by googling "linked data textiles":

https://doi.org/10.1080/19322909.2017.1359135

Related work on the methodological part of the paper (automatic image and text analysis) is missing.

Section 3 presents the SILKNOW ontology. I liked the idea of first collecting research questions from expected users, and outlining several usage scenarios, and then designing the model to match these.

The model presented in Section 3.1 is not described in sufficient detail. The reference to GitHub is fine, but the authors should also explain the model in more detail in the paper itself, and explain and motivate the modelling choices made there rather than only documenting the outcome. The motivation for using CIDOC CRM in general is fine, but it is not sufficient. Table 1 and Table 2 are not understandable to the reader. Figure 1 and Table 2 are not even referred to in the text, and the tiny font of Figure 1 is not readable.

Section 3.2 presents a SKOS vocabulary for silk-related terms in several languages, which is a nice result of the paper.

Table 3 shows how many thesaurus terms were found in the museum collection data. It would be even more important to know how many silk terms used in museums are not found in the thesaurus, as this will affect recall dramatically. Some tag clouds related to this are presented later, but the most frequent terms there are not related to silk. The tag cloud fonts are also too small to be readable in many places. Some rigorous numerical data would be nice, too.

Section 4 considers building the knowledge graph. Table 4, which shows what data has been aggregated, from which sources, and in what quantity, is not referred to or explained in the text. The same problem occurs with Fig. 7, and "MET coding" is not explained in its caption.

Section 5 discusses enriching the data with some machine learning techniques. This section looks a bit odd in a knowledge graph description paper: it should be stated clearly what the outcome is from a knowledge graph perspective. On the other hand, as this paper was submitted as a full research paper, the text preceding it reads rather like the content of a dataset description paper. Figure 9 is not understandable.

Summary

Originality. The focus on silk-related knowledge is a novelty. Some methodological work is presented, but the work is partly still ongoing. If this is a research paper, the scientific contributions need clarification, even though some evaluations are shown.

Significance of the results. At the current state of the paper and project, the significance of the results needs clarification; there is potential for this.

Quality of writing. More references would be needed here and there. For example, the Introduction contains no references at all, OntoMe is first mentioned without a reference, etc. Figures and tables are not explained in the text in many places. The authors need to decide what type of paper this actually is. I would recommend splitting the paper into a separate dataset paper and a methodological paper on text and image analysis in the future.

See also SWJ criteria listed in the beginning of this review.

Review #2
Anonymous submitted on 22/Jul/2021
Suggestion:
Major Revision
Review Comment:

This manuscript was submitted as 'full paper' and should be reviewed along the usual dimensions for research contributions which include (1) originality, (2) significance of the results, and (3) quality of writing. Please also assess the data file provided by the authors under “Long-term stable URL for resources”. In particular, assess (A) whether the data file is well organized and in particular contains a README file which makes it easy for you to assess the data, (B) whether the provided resources appear to be complete for replication of experiments, and if not, why, (C) whether the chosen repository, if it is not GitHub, Figshare or Zenodo, is appropriate for long-term repository discoverability, and (D) whether the provided data artifacts are complete. Please refer to the reviewer instructions and the FAQ for further information.

The authors present the SILKNOW Knowledge Graph, which aims to make knowledge related to European silk heritage from the 15th to the 19th century available and interoperable.

In the introduction, there should be a reference for the claim in lines 48-49, as well as for the Google Cultural Institute and Europeana.

In the second section, related work is simply dumped and never compared to the authors' work. This slightly undermines the contribution of the paper.

The competency questions are very clearly described.

The data model is based on CIDOC-CRM; however, although there is a link to the GitHub page, the data model itself is not actually discussed in the paper. How this ontology was populated is not discussed either.

Section 5 discusses the prediction of missing metadata, which is actually a classification approach. The authors use off-the-shelf classification algorithms such as CNNs; however, no justification of the chosen methodology is provided. Why do the authors need multi-task learning? This is not clearly motivated. How much training data was available? Did the authors use a pre-trained model and perform transfer learning? The description of the image classification is very vague.

The same is the case with textual information, where the authors arbitrarily choose Linear SVM, Random Forest, etc. Why not classify directly using a BiLSTM? Again, the choice of models should be justified.

It should really be emphasized here that the authors submitted this paper as a full paper, yet I fail to see a research contribution. It is an introduction to the resource followed by a selection of off-the-shelf algorithms for achieving a particular task.

Review #3
By Alessio Antonini submitted on 05/Oct/2021
Suggestion:
Major Revision
Review Comment:

"The SILKNOW Knowledge Graph" presents the construction of a thesaurus, knowledge graph and enrichment pipeline for textile data.

Overall, the contribution presents an impressive engineering work. The paper reads well, and it is competently structured around the different steps and components of SILKNOW.

My main objection about a publication "as it is" concerns two critical weaknesses of the current formulation.

Firstly, the authors do not address the textile domain. As a reader, I am left with no new understanding of its specificities, e.g., the textile properties, how they had been represented in the "custom" representations, the ontological assumptions behind these different representations, etc.

Secondly, the authors argue that this work addresses multiple stakeholders (I appreciated the scenario description). However, the paper does not further address the scenarios or stakeholder needs. In my opinion, SILKNOW should be evaluated in light of the presented scenarios. For instance, the presented results of the CNN methods for feature recognition from textile images are hard to evaluate. While the authors claim improved results (and room for further improvement), it is not clear whether these results are good enough and, hence, whether the proposed method fits the scenarios.

The paper is missing a discussion section. While the amount of work and its complexity are well reflected, the paper does not report on the process and the knowledge acquired along the way. The only detailed explanation concerns the use of CNNs for image feature extraction (which can arguably be considered a novel approach).

I encourage the authors to share their insights, highlighting their decision points and discussing the limits of SILKNOW. How is the textile domain different? What is it comparable to? Do the different approaches to the material study of textiles raise specific challenges that surface in the different modelling approaches you aim to bridge?

Here is a list of topics I hoped to see addressed:

a) CNN results should be discussed in light of scenarios.

b) Differences between models for textile features should be briefly described in a background section (not why there are differences but why and how these differences are reflected in the data and if these differences are an issue in comparing/combining datasets). If relevant, the highlighted issues should also be used to discuss the construction of the thesaurus.

c) Discuss the inductive/deductive approach adopted in the construction of the thesaurus. E.g., why do you identify but not construct abductive generalisations?

d) Discuss the use of the scientific observation in the context of qualitative analysis of textiles -> what are the different qualitative frameworks used by other repositories?

e) About the CNN architecture: why did you not split the tasks in the first place? Can you explain the hypothesis behind this choice? Did you hypothesise about causations or strong correlations concerning the manufacture or style that you were hoping to capture with your modelling approach?

I believe that these additions will make it an outstanding contribution to both the special issue and SWJ.

Of minor note:
Figure 1 should span both columns for readability.
About the dataset:

Review #4
Anonymous submitted on 29/Oct/2021
Suggestion:
Major Revision
Review Comment:

The paper presents the SILKNOW ontology and its applications. The work itself is interesting and promising. The authors present a novel ontology, data model, and applications that allow the user to dive into historical collections about silk. To present the work, the article includes sections describing the data model, creating and enriching the dataset, and building semantic applications to explore the data. After the introduction, the work is presented by detailing the related work, data model, data transformation, applications, enrichments, and lastly the conclusions.

The related work is mostly presented in the related work section, but parts of it are also included in the corresponding sections. However, it seems that more related articles and work could have been presented to give context and to highlight the contributions of this work. For example, the work seems to be largely inspired by CultureSampo, but there are more recent developments and Sampos (e.g., FindSampo) that present cultural heritage objects as Linked Data. There was also very little related work on metadata prediction or enrichment. I would recommend adding more related work to better contextualize the work and its contributions.

After the related work, the data model is presented. The design decisions used to create the model are clearly stated and described. However, in the descriptions of the ontologies used, the correct written form of names should be checked (e.g., schema.org, Ontome). Also, I would recommend adding URLs in footnotes for external resources, such as vocabularies, homepages of institutions or organizations, and tools. This would make it easier to access them, read about them, and assess their usability here.

The building and publishing of the SILKNOW ontology and the created applications are described in the next chapter. The processes for building the ontology are clearly described. This is logically followed by the publishing of the data and finally by the application side. This chapter is then followed by the enriching and validating of the dataset. I would recommend that the authors consider changing the order of the chapters here. To me, it would be more logical if the sections covered building, enriching, and validating the dataset first, and then publishing it and presenting the applications. Currently, the section about enriching and validating was a little surprising, considering that the applications had already been presented.

The last chapter of the document is the conclusions, which also discuss future work related to the project. The section summarizes the key contributions of the paper; however, it would be interesting if the authors could reflect on how the modelling choices impacted the application building. Also, a comparison to CultureSampo or other systems, highlighting the differences in applications and modelling choices, would be an interesting addition.

Overall, I think the submission was well written and presents interesting work, and I can’t find any other issues, hence I would suggest its acceptance with the suggested modifications above.