Review Comment:
(0) Importance of the covered material to the Broader Semantic Web Community.
I think the aim of this study is important for the Semantic Web Community.
(1) Suitability as introductory text, targeted at researchers, PhD students, or practitioners.
I find this to be overall correct. It could benefit from the improvements to the other aspects proposed below.
(2) How comprehensible and balanced the presentation coverage is
- In section 3, it is mentioned that in [13] "linked data publishing methodologies are elicited, mostly from the government domain". However, [13] is about publishing drug data. I am unsure whether this is just a typo, but it does not allow me to assess how the authors position themselves with respect to this previous work.
- In section 2.2, it is mentioned that Linked Data extends the concept of open data. This is arguable, as Linked Data can exist without open data and vice versa. I believe that, for the context of this work, Linked Open Data and Linked Open Government Data are what matter most. For the latter, I miss some specific references about what it is, why it is important, and what has been done about it aside from the methodologies that will be studied later. Papers like the following might be useful to strengthen this point:
[1] https://www.scitepress.org/papers/2014/51433/51433.pdf
[2] https://ieeexplore.ieee.org/abstract/document/6547636/
My main remarks are on the methodology section, mostly related to how comprehensive the survey is.
- I am missing the motivation behind the proposed research questions. How does the answer to each of them help to "describe what should be addressed to a better outcome of LOGD policies"? I also note the use of the word "policies". Why not other methodologies? What about tools and technical matters? RQ4 in particular seems a bit odd.
- I tried to reproduce the queries made available by the authors in the separate online spreadsheet on 30 January 2020. I had some issues with them:
ACM DL: returned no results
IEEE Xplore: OK
Scopus: 3 results instead of 39 reported
Science Direct: Query appears incomplete with respect to the others
ISI Web of Knowledge: returns the error "Invalid use of boolean operator"
Springer Link: Slightly more results than reported. I assume the extra papers were indexed in the second half of 2019 (after the query date reported in the paper), so I consider this OK
This needs to be clarified in a revised version.
A minor question: why was Google Scholar not considered?
- I consider it very important that the authors make available the full list of (469-18) papers they excluded and the corresponding reason for exclusion. This is important for reproducibility, and will also help reviewers (and future readers), as it will be easier to understand why a certain paper that I think should have been included is missing.
- A minor point on the inclusion criteria: please clarify whether you had institutional access to the listed search engines, and at what level of subscription. This would help quantify how paywalling affects your study.
- On the exclusion criteria, why have "focus on the application of LD in a specific domain"? Does this mean that papers that focus on a specific subset of open government data are ruled out? As an example, the following paper seems relevant but is not included:
[1] https://dl.acm.org/doi/abs/10.1145/2740908.2742133
I suspect this one in particular was filtered out due to the abstract not including the word "government" ("public administration" instead), but my general comment remains.
- It is unclear how reference [30] was used as a "control". Does this mean that you checked that the papers there also appear in your list?
- For a survey paper, I think the results and discussion section lacks some content. For example, for RQ3, only a small subset of the steps is discussed. For RQ1, I miss a discussion about steps that are considered in only one or two papers: Are they needed in general? Are they needed in certain contexts?
(3) Readability and clarity of the presentation.
- The results section per RQ is hard to read. I think there is enough space to make each description much more structured.
- What does it mean for a row to be empty in Figure 4?
- I missed how the process model of Figure 5 was developed. I assume there was a methodology and some work to derive it, so I was a bit surprised that the authors have not elaborated on this.
Overall, readability needs to be improved. There are many places where the article could benefit from proofreading to increase clarity.
Other minor comments:
- p5. "The study is from a peer-reviewed vehicle" -> probably "article" was meant
- Use of the word "artifact" in Table 3: what is an artifact?
- The resolution of Figure 5 is too low.
- p10. there are academic references describing the deployment of UK and US portals
- I think the Linked Data book by Bizer is a better reference for the Linked Data principles than the original website by Berners-Lee
In summary, what I would expect for a major revision is:
- Clarify issues with queries
- Make available dataset of excluded papers and reason why
- Clarify the specific domain exclusion criteria and why it was chosen
- Improve structure, and therefore readability of results and discussion section
- Improve results/discussion around tools, and what are the implications of
- Expand on the methodology for building the process model in Figure 5, which I think is a very valuable contribution and probably deserves a section of its own