Review Comment:
I thank the authors for considering my feedback and addressing some of the mentioned aspects in their new version of the paper. In general, the new revision of this paper seems to contain only minor changes, and most of my original comments are still valid. The authors' response and new version sufficiently address questions 1-5, 9 and 12-14 in their response to reviewer 3. However, the other questions are not sufficiently addressed from my perspective. In the following, I would like to elaborate on each question and the provided answer:
Question 6
The authors state that they collected the dataset in compliance with the terms of service from Dotdash Meredith applicable at that time. The terms of service have changed in the meantime, which means that no updates from Allrecipes.com can be included in RecipeKG. Given that this is the only data source presented in the paper, the KG cannot be updated in a straightforward way, i.e. by following the presented pipeline. Personally, I would assess the likelihood of copyright-related issues as somewhat higher, especially given that the terms of service changed and already prohibit the authors from updating their KG, but I did not look into the mentioned precedent. For my review I will still consider this a small risk for the long-term availability. The authors could provide actual lawsuits or legal expert opinions on this matter. The mere existence of other datasets with potential copyright issues does not count as precedent in a legal sense.
Question 7
Section 3.3 now provides a better description of the modelling process in general. However, the added text explains how blank nodes are used, but not why they are used.
Question 8
I agree with the authors that outdated datasets can support research. But these datasets provide support despite being outdated; being outdated is still a drawback of such a dataset. Providing a single example of an outdated dataset that received significant attention does not change that.
Of course, the publication of a paper describing a KG can improve its visibility, but for papers in this journal the usefulness of the dataset should be shown by corresponding third-party uses. The git repository does not show any such third-party use.
Question 10
According to the authors, the novelty of their work is the modelling of health scores using SWRL rules and the extraction of the Allrecipes.com category system. Neither aspect is described or investigated in detail as part of this work.
Question 11
But what would such SWRL rules look like? Given that this is one of the claimed novelties of this paper, the authors should focus on this aspect in more detail. A user of RecipeKG would need to write these rules, but the paper does not show how these SWRL rules are created. Especially if the KG is to be used by domain experts, a detailed description of this is necessary.
Question 12
I overlooked the mention of the data provenance in the paper. Of course, this addresses my initial question. In order to really support such comparisons, it might be useful to model the data provenance in the KG, too.
Question 15
The extracted categories are defined by Allrecipes.com to support navigation and search on their website and are not designed for any kind of research. The authors seem to simply assume that any kind of category helps researchers. For me, this claim remains unsupported.
Overall, the new version of the paper and dataset still lacks a clear use case and is not maintained, and the linking to existing datasets in this area (only 89 of 6309 ingredients are mapped, without further explanation) has not been improved. The description is improved to some extent, but the paper still does not go beyond defining a few rules, and it still lacks any discussion of them. This work seems to be more suitable for a workshop or a short conference paper.