Review Comment:
(1) originality,
The paper addresses a relevant topic, automated link maintenance in the web of data, and brings novelty to the problem by including automated checks for semantic drift, although the basis and effectiveness of these checks are not thoroughly explored. It is unusual that no aspects of machine-learning-based NLP or vector representations of semantics are mentioned in a paper published in 2022.
(2) significance of the results,
The authors outline a general method for performing automated link maintenance, but many aspects of the method rest on arbitrary thresholds, mean combinations of multiple scoring mechanisms, and opaque similarity measures that lack empirical evidence or justification. The effectiveness of the system is not evaluated in a field trial with a real, diverse linkset; instead, a single example is worked through. This is insufficient evidence to assess the effectiveness of the approach, and no comparison with similar methods is provided; even if those methods work through different mechanisms, their overall relative effectiveness on the link maintenance task could be reported.
(3) quality of writing.
The paper is well written and generally clear, although some additional editing could improve the clarity of expression. Structurally, it is missing a substantive evaluation section. The references seem appropriate; a couple of additional suggestions are provided below.
= Detailed Comments =
==Introduction
You say "Semantically broken links appear when the semantics of associated resources does not further express the meaning intended by the triple’s author."
This is not exactly clear: it is not precisely defined whether you mean the semantics of the link itself or the alignment of the semantics of the subject and object. I suggest re-phrasing to clarify.
==Related Work
You say "they [past work] do not focus on how to recover existing broken links."
The SUMMR mapping maintenance framework does address link repair, see
Meehan, Alan. The SPARQL Usage for Mapping Maintenance and Reuse Methodology. PhD thesis, Trinity College Dublin, School of Computer Science & Statistics, 2017.
http://www.tara.tcd.ie/handle/2262/81715
and
Meehan, A., Kontokostas, D., Freudenberg, M., Brennan, R., O'Sullivan, D. (2016). Validating Interlinks Between Linked Data Datasets with the SUMMR Methodology. In: On the Move to Meaningful Internet Systems: OTM 2016 Conferences, Lecture Notes in Computer Science, vol 10033. Springer, Cham.
https://doi.org/10.1007/978-3-319-48472-3_39
==Sec 3.1=
It would be good to explain how the linkset is determined for a dataset.
==Sec 3.2=
It would be useful to discuss whether your assumption that only one dataset evolves at a time is applicable in practice.
==Sec 3.5.1=
Top k Candidates
Why do you use a simple mean of the comparison types? Do you have any evidence this is the most appropriate?
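For instance, even a small ablation would show whether an unweighted mean loses ranking quality relative to a tuned combination. A minimal sketch of the contrast, in Python, with hypothetical score names rather than the paper's actual comparison types:

    def simple_mean(scores):
        # Unweighted mean over all comparison types, as the paper appears to use.
        return sum(scores.values()) / len(scores)

    def weighted_mean(scores, weights):
        # Weighted alternative; weights could be tuned on a held-out linkset.
        total = sum(weights[k] for k in scores)
        return sum(scores[k] * weights[k] for k in scores) / total

    # Hypothetical comparison scores for one replacement candidate.
    scores = {"label_similarity": 0.9, "type_overlap": 0.4, "property_overlap": 0.7}
    weights = {"label_similarity": 0.5, "type_overlap": 0.2, "property_overlap": 0.3}

    print(simple_mean(scores))             # ~0.667
    print(weighted_mean(scores, weights))  # 0.74

Reporting how candidate rankings change under a few such weightings would substantiate (or refute) the choice of the simple mean.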
==Sec 3.5.2=
Maintenance Actions
Do you have any evidence for how changing Beta from 0.5 changes system performance/behaviour?
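Even a coarse sensitivity sweep would answer this. A sketch of the kind of experiment meant, under the loud assumption (mine, not the paper's) that Beta linearly trades off two component scores when an action is chosen:

    # Illustrative only: assumes Beta linearly weights a replacement-candidate
    # score against a semantic-drift score when deciding the maintenance action,
    # which may not match the paper's exact definition of Beta.
    def decide_action(candidate_score, drift_score, beta):
        combined = beta * candidate_score + (1 - beta) * (1 - drift_score)
        return "replace" if combined >= 0.5 else "remove"

    # Hypothetical (candidate_score, drift_score) pairs for broken links.
    links = [(0.8, 0.3), (0.3, 0.2), (0.4, 0.7)]

    for beta in (0.3, 0.5, 0.7):
        print(beta, [decide_action(c, d, beta) for c, d in links])
    # The middle link flips from "replace" to "remove" as beta rises,
    # which is exactly the behavioural change worth reporting.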
In terms of generating all these candidate actions, do you have any evidence of how practical this is for large datasets? This is important because, as you say, automation is key when dealing with huge datasets, but if the overhead is too high then the process is unlikely to be practical.
=4=
This needs a detailed study of the performance of the system with a whole set of complex links between multiple datasets. For example, Meehan examined 1,673,634 interlink category mappings of the v.2015-10 DBpedia release.
=5=
You say "It should also be noted that there are no ready-to-use gold standard datasets for comparative analyses in the specific context of this study."
Please see the SUMMR mapping maintenance framework evaluations, which are based on maintaining the whole linkset of DBpedia releases.
You say "In addition, existing proposals found in the literature do not deal with the same issue investigated in our framework, which made side-by-side quantitative and statistical comparisons based on objective metrics with other studies impossible at the time."
It is possible to compare the effectiveness of different approaches in terms of the precision and recall with which they maintain a linkset, no matter what the mechanism.
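Concretely, any approach's output can be scored against a gold (correctly maintained) linkset regardless of its internal mechanism. A minimal sketch, representing links as simple (subject, object) pairs:

    def precision_recall(maintained, gold):
        # Mechanism-agnostic scoring of a maintained linkset against a gold standard.
        true_positives = len(maintained & gold)
        precision = true_positives / len(maintained) if maintained else 0.0
        recall = true_positives / len(gold) if gold else 0.0
        return precision, recall

    # Hypothetical gold standard and system output.
    gold = {("ex:a", "dbr:A"), ("ex:b", "dbr:B"), ("ex:c", "dbr:C")}
    maintained = {("ex:a", "dbr:A"), ("ex:b", "dbr:B_wrong")}

    p, r = precision_recall(maintained, gold)
    print(f"precision={p:.2f} recall={r:.2f}")  # precision=0.50 recall=0.33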
You say "We assume that, in our framework, the quality of the results is correlated with the right choice of background knowledge, which is used in the task of finding suitable candidates to replace the broken part of the link."
It would be very useful to quantify this assumption by experiment, even an initial experiment to see the difference between the two semantic similarity measures you use.
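Such an experiment could be as small as ranking the same replacement candidates under each measure and inspecting where the rankings diverge. A sketch using token Jaccard and character trigrams as generic stand-ins for the two measures actually used in the framework:

    def jaccard_tokens(a, b):
        # Word-level overlap between two labels.
        ta, tb = set(a.lower().split()), set(b.lower().split())
        return len(ta & tb) / len(ta | tb) if ta | tb else 0.0

    def trigram_overlap(a, b):
        # Character-level overlap between two labels.
        def grams(s):
            return {s[i:i + 3] for i in range(len(s) - 2)}
        ga, gb = grams(a.lower()), grams(b.lower())
        return len(ga & gb) / len(ga | gb) if ga | gb else 0.0

    broken_target = "Dublin City University"
    candidates = ["University of Dublin", "Dublin Institute of Technology",
                  "City University of London"]

    for measure in (jaccard_tokens, trigram_overlap):
        ranking = sorted(candidates, key=lambda c: measure(broken_target, c), reverse=True)
        print(measure.__name__, ranking)

Disagreement between the two rankings on real broken links would directly quantify how much the choice of measure matters.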
= Typos and Grammar Suggestions =
==Abstract
maintenance of RDF links consistency -> maintenance of RDF link_ consistency
LODMF -> expand the acronym on first use
operations on RDF triples deals with affected links to other repositories -> not clear what you mean by "deals"; is there a word missing?
users assistance -> users_'_ assistance
==Introduction
over time a challenging task -> over time _into_ a challenging task
Existing literature on link maintenance -> _The e_xisting literature on link maintenance