Linked Open Images: Visual Similarity for the Semantic Web

Tracking #: 2762-3976

Authors: 
Lukas Klic

Responsible editor: 
Special Issue Cultural Heritage 2021

Submission type: 
Application Report
Abstract: 
This paper presents ArtVision, a Semantic Web application that integrates computer vision APIs with the ResearchSpace platform, allowing similar artworks and photographs to be matched across cultural heritage image collections. The field of Digital Art History stands to benefit a great deal from computer vision, as numerous projects have already made good progress on visual similarity, artwork classification, style detection, and gesture analysis, among other tasks. Pharos, the International Consortium of Photo Archives, is building its platform on the ResearchSpace knowledge system, an open-source Semantic Web platform that allows heritage institutions to publish and enrich collections as Linked Open Data through CIDOC-CRM and other ontologies. Using the images and artwork data of the Pharos collections, this paper outlines the methodologies used to integrate visual similarity data from a number of computer vision APIs, allowing users to discover similar artworks and generating canonical URIs for each artwork.

Decision/Status: 
Major Revision

Solicited Reviews:
Review #1
Anonymous submitted on 26/Apr/2021
Suggestion:
Major Revision
Review Comment:

This paper presents a novel platform to match similar artworks across open digital image libraries, through a combination of Semantic Web technologies and Computer Vision methods. As such, the proposed application contributes to the broader research area of Cultural Heritage, which is becoming increasingly popular within the Semantic Web community. Moreover, the qualitative comparison of off-the-shelf CV APIs contributed in Section 3 is likely to be of value to many experts in the field, who can benefit from a comparison of the image-matching features offered by existing platforms.

Overall, the motivation of this work is clear and substantiated. However, further changes are needed to improve the clarity and organisation of this paper, as suggested in the following.

One of the main arguments underlying this work is the claim that textual descriptions of images are biased, calling for autonomous (CV-based) methodologies to objectively compare images by visual similarity (page 2, Section 2). However, this claim overlooks the fact that CV methods are also inherently biased. Thus, it should be toned down and rephrased accordingly. By contrast, the requirement of resolving inconsistencies in the image metadata is stronger and more compelling, although it is not discussed until the concluding section.

In general, more technical language should be favoured when describing the Computer Vision elements of this work. For example, Section 3 refers to “a more “fuzzy” similarity matching”, but further technical details should be provided on the type of similarity metrics under comparison. Similarly, the expressions “allowing images to dialogue with one another” (Section 2) and “Google search image type lookup” (Section 7) are overly informal and ambiguous. One option would be to define all the relevant terminology in the Background section. This addition would also help to contextualise the role of the similarity ontology of Section 5.
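
To illustrate the level of precision expected: a minimal sketch of one concrete metric the paper could name, cosine similarity over image feature embeddings. The vectors below are random stand-ins, not the paper's actual features.

    # Illustrative only: cosine similarity between two image feature vectors.
    # The random vectors stand in for CNN embeddings; this is not the paper's metric.
    import numpy as np

    def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
        """Cosine similarity between two feature vectors, in [-1, 1]."""
        return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

    rng = np.random.default_rng(0)
    img_a, img_b = rng.random(2048), rng.random(2048)  # e.g. pooled CNN features
    print(f"cosine similarity: {cosine_similarity(img_a, img_b):.3f}")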

The last paragraph of Section 2 very usefully summarises the motivation and intended contribution of this work, so I suggest moving it earlier in the text, to the Introduction section.

The results in Table 1 and the conclusions drawn in Section 3 about the most suitable API seem to contradict one another: from Table 1, one would gain the impression that methods such as Inception V3 are preferable because they cover the highest number of evaluated features. On the contrary, the text illustrates why Inception V3 was the least useful API among the tested ones. Table 1 should be revised to (i) convey how the different features/attributes are weighted/prioritised in the evaluation, and (ii) include the metrics of robustness to angle, colour, and crop variations, which are currently only described in the text.
A precise description of the data sample used to manually inspect the matching results should be provided as well.
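
One way to make the prioritisation in point (i) explicit is a weighted aggregate score per API. The sketch below is purely illustrative: the feature names and weights are invented, not taken from the paper.

    # Purely illustrative: the feature names and weights below are invented.
    WEIGHTS = {"visual_search": 0.4, "angle_robustness": 0.2,
               "colour_robustness": 0.2, "crop_robustness": 0.2}

    def weighted_score(ratings: dict[str, float]) -> float:
        """Aggregate per-API score: weighted sum of feature ratings in [0, 1]."""
        return sum(WEIGHTS[name] * value for name, value in ratings.items())

    # e.g. a hypothetical API that handles colour shifts well but fails on crops
    print(weighted_score({"visual_search": 1.0, "angle_robustness": 0.5,
                          "colour_robustness": 1.0, "crop_robustness": 0.0}))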

A more detailed explanation of the building blocks of Figure 2 should be added, to make the paper more self-contained and more accessible to readers who are unfamiliar with the Similarity Ontology.

The author should also clarify in which contexts bidirectional and non-bidirectional similarity metrics are used. In Section 5, it is mentioned that, in the data model, “similarity is always bidirectional and a search for one image in a pair should generally yield the same score as searching for the other”. However, in Section 8, specific cases are discussed where an asymmetric indicator of similarity is preferred, for instance to differentiate between a copy and the original.
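
For instance, the two cases could be modelled with distinct predicates. The sketch below uses rdflib with invented predicate names (ex:visuallySimilarTo, ex:copyOf) purely to illustrate the distinction; it does not claim to reflect the paper's actual similarity ontology.

    # Invented predicates; not claimed to match the paper's similarity ontology.
    from rdflib import Graph, Namespace

    EX = Namespace("http://example.org/")
    g = Graph()
    g.bind("ex", EX)
    a, b = EX.artworkA, EX.artworkB

    # Symmetric similarity: both directions are asserted with the same semantics,
    # so a search starting from either image yields the same score.
    g.add((a, EX.visuallySimilarTo, b))
    g.add((b, EX.visuallySimilarTo, a))

    # Asymmetric relation: the copy points at the original, not vice versa.
    g.add((b, EX.copyOf, a))

    print(g.serialize(format="turtle"))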

In sum, this paper presents an interesting and timely project, which has the potential to impact many use-case scenarios. Thus, the author should also discuss any future evaluation plans with respect to assessing the utility of the proposed platform to expert and non-expert users, i.e., across the many described use-cases.

Review #2
By Ronald Siebes submitted on 14/Jul/2021
Suggestion:
Minor Revision
Review Comment:

(1) Quality, importance, and impact of the described application (convincing evidence must be provided).

The quality, importance, and impact of the work described in the paper are more than sufficient for this journal.
Firstly, it connects two fields, computer vision and Linked Data, in such a way that a lot of effort has been put into making it intuitive and easy to use and extend, both for end users (e.g. art historians) and for IT specialists.
Secondly, the modular architecture allows other similarity algorithms, RDF templates, and vocabularies to be incorporated easily.
Thirdly, by providing RESTful and SPARQL access to the graph database containing the similarity information, integration with relevant projects like Europeana becomes relatively straightforward (illustrated in the sketch after this paragraph).
As the authors mention in the discussion, beyond research in art (history), a much wider range of applications is possible: plagiarism detection, copyright detection, meme-origin tracing, and the like would benefit greatly from the research described in this paper.
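
To make the third point concrete, here is a minimal sketch of programmatic SPARQL access; the endpoint URL and the predicate are assumptions on my part, since the paper does not yet document the public endpoint.

    # The endpoint URL and the predicate are assumptions, not documented facts.
    from SPARQLWrapper import SPARQLWrapper, JSON

    sparql = SPARQLWrapper("https://vision.artresearch.net/sparql")  # assumed URL
    sparql.setQuery("""
        SELECT ?artwork ?similar WHERE {
            ?artwork <http://example.org/visuallySimilarTo> ?similar .
        } LIMIT 10
    """)
    sparql.setReturnFormat(JSON)
    for row in sparql.query().convert()["results"]["bindings"]:
        print(row["artwork"]["value"], "~", row["similar"]["value"])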

(2) Clarity and readability of the describing paper, which shall convey to the reader the key ideas regarding the application of Semantic Web technologies in the application.
Please also assess the data file provided by the authors under “Long-term stable URL for resources”. In particular, assess (A) whether the data file is well organized and in particular contains a README file which makes it easy for you to assess the data, (B) whether the provided resources appear to be complete for replication of experiments, and if not, why, (C) whether the chosen repository, if it is not GitHub, Figshare or Zenodo, is appropriate for long-term repository discoverability, and (D) whether the provided data artifacts are complete.

The paper has a good structure and is easy to read. The code on GitHub looks mature, but:
1: the GitHub README files and the instructions for the website (https://vision.artresearch.net/resource/start) are too minimal.
2: the publicly available SPARQL endpoint and the REST API are missing from the paper.
These two points make it a 'minor revision' instead of an accept.

Review #3
Anonymous submitted on 22/Jul/2021
Suggestion:
Major Revision
Review Comment:

The paper focuses on the tool ArtVision, which allows similar artworks to be matched across cultural heritage image collections. It uses the Pharos repository for this purpose.

In Section 1 the author introduces the problem but gives no reference in the first two lines of the introduction.

Moreover, the explanation of the project's funding is given in the introduction (lines 36-38), whereas it should be provided in the acknowledgements section.

Section 2 lists the existing work, but the comparison to the authors' own work is very shallow. For example, references 3 and 4 would be very close and much more semantically enriched, since they recognize the style of a painting. I would like to know what the added advantage of this API is, given that the authors say the data is openly available but it sits behind a "login wall". Alternatively, could this API incorporate further off-the-shelf style-recognition algorithms? Would it be possible for the authors to provide the dataset on Zenodo or another such platform?

There is a login functionality on the website; does this allow SPARQL queries to be performed over the dataset collections? Since no log-in is available for the reviewers, this is hard to assess.

Section 3 starts with the following sentence:

"In order to develop the ArtVision platform, the author tested a range of publicly available Computer Vision and Machine Learning..." where "the author" seems to be a weird construction of the sentence.

Section 4 discusses many engineering details, such as the timeout issues, which may not be necessary to report in a journal paper.

As described above, Section 7 talks about federated SPARQL queries, but unfortunately I do not see the link on the website and cannot test the query; the sketch below shows the kind of query I would have expected to run.
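
For reference, this is the shape of a federated query combining ArtVision similarity data with an external endpoint via SERVICE; all URLs and the predicate are placeholders, since the real endpoint is not documented.

    # All URLs and the predicate below are placeholders, not documented values.
    FEDERATED_QUERY = """
    SELECT ?artwork ?label ?similar WHERE {
        ?artwork <http://example.org/visuallySimilarTo> ?similar .
        SERVICE <https://example.org/external/sparql> {
            ?artwork <http://www.w3.org/2000/01/rdf-schema#label> ?label .
        }
    } LIMIT 10
    """
    # Once the public endpoint is documented, this string could be sent to it,
    # e.g. via SPARQLWrapper as in the sketch under Review #2.
    print(FEDERATED_QUERY)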