Review Comment:
This work presents a domain-specific workflow for semantic enrichment using Wikidata. The enrichment is performed on SDBM, an open-access RDF dataset of premodern manuscripts. Relying on a specific identifier (VIAF) and on OpenRefine, the authors locate SDBM entities in Wikidata and enrich the corresponding Wikidata items by appending the SDBM IDs to the aligned items through a dedicated new property.
Strengths: (3) quality of writing
The paper is very well written, in particular:
1. The background knowledge is well covered, making it easy for a reader with a different background to get on board.
2. It provides an extensive overview of related work.
3. The approach and the contributions are well organized and precisely explained.
Weaknesses: (1) originality, (2) significance of the results
(1) The contribution of the paper consists of applying an existing tool (OpenRefine) and relies on the availability of an existing identifier (VIAF) to align the entities between SDBM and Wikidata. Since one of the main obstacles in entity alignment is precisely the lack of a shared or unified identifier in one of the compared data/knowledge bases, I do not consider the contribution generic enough to qualify as a "research paper".
(2) The authors demonstrate the significance of the achieved result, i.e., the added links, through a set of SPARQL queries, which I personally appreciate very much. The queries are also clearly presented and accessible. However, since the paper is submitted as a research paper and not as an "application" or "tool" report, a stronger experimental section addressing a set of concrete research questions is expected. For instance, experiments could investigate: a) how the added links improve the connectivity of the nodes (entities) in the graph in terms of network analysis measures; b) what the accuracy/precision of the aligned entities is, and how it is assessed.
Furthermore, some statements in Section 4.3 regarding the choice of strategy for the entity alignment process deserve further explanation. For instance, in "the most successful strategy to secure automatic matches was to use the SDBM name, the corresponding VIAF ID recorded in the SDBM, and the Wikidata item type. Any additional information did not improve matching and ranking", what is the definition of "successful"? Does it mean an exact match? Was any statistical evaluation applied? It would also help to give some examples of the additional information that failed to improve matching and ranking.
Overall, I see the contribution of this paper as very valuable and significant for the community, specifically for domain knowledge engineers, and I consider the paper a good candidate for the "Application reports" or "Reports on tools and systems" tracks. As a candidate for "Full papers", I recommend that the authors strengthen their empirical studies.