Review Comment:
This manuscript was submitted as a 'Tools and Systems Report' and should be reviewed along the following dimensions: (1) Quality, importance, and impact of the described tool or system (convincing evidence must be provided). (2) Clarity, illustration, and readability of the describing paper, which shall convey to the reader both the capabilities and the limitations of the tool.
This submission describes a public service for monitoring the health, discoverability, and other key features of public SPARQL endpoints. The service has been running for over a year and appears to have received reasonable interest from the community, with ~500 unique visitors per month. As far as the reviewer is aware, this is a unique service providing comprehensive monitoring of public SPARQL endpoints.
The paper is generally well written and very easy to read. However, as a tool paper, its major issue is the lack of evaluation. The authors propose a set of dimensions for monitoring SPARQL services, but it remains unclear how sufficient these dimensions are, for which kinds of users, and for which needs. There was no UI screenshot in the submission, and the reviewer was unable to access the public service at the time of review. How easy is it for target users to use the UI and find the information they need? Are these limitations of the current development? Finally, how robust and scalable are the APIs provided by SPARQLES? At the time of review, the system (as well as the one hosted at the alternative URL) was not accessible. This naturally leads to the question of how robust the whole system is and what mechanisms are, or will be, in place to ensure its stability. All these questions put the quality of the presented tool in doubt.
Another major issue with the submission is that it does not provide sufficient detail for a tool/system paper. The reviewer found it hard to get a full grasp of how the system can be used or how one interacts with the UI. Are the example questions given in Section 3 the sort of questions that a user would expect the system to support? The paper still reads a bit like a mixture of a research paper and a tool paper. The reviewer thinks the content needs to be better balanced, and the paper may benefit from restructuring, particularly Section 3.
Apart from the major problems, the paper also has some minor issues:
1. If this is submitted as a tool or system paper, should there not be a justification of where the list of features came from? Were they based on a survey of users' needs or on an empirical study? The work could have been better motivated.
2. The topic of monitoring web services has been extensively studied by the Web service community. Although the authors can argue that some unique features of SPARQL services do require a new monitoring system, I think this could have been better described and argued in the paper. It could particularly affect the range of dimensions considered by the system.
More detailed comments below:
1. Page 2 (Section 1): The reviewer thinks that each factor (such as availability and discoverability) could benefit from a clear, explicit definition that shows the kind of computation used to measure it.
2. Page 3 (Section 3.1): Why was the availability of a service monitored on an hourly basis? Is this not too frequent, and would it not place too much stress on storage? (A back-of-envelope estimate follows after this list.)
3. Page 3 (Section 3.1): Why was a SPARQL query needed to test the availability of a service, rather than a simpler mechanism such as ping? (A sketch contrasting the two approaches follows after this list.)
4. Page 4 (Section 3.1): I found the purpose of the sets of research questions used in the Section 3.* subsections a bit confusing. Where do these questions come from? Can users find answers to them through the current public system? Why would users not have other sets of questions?
5. Sections 2 and 3 need some restructuring. Part of Section 3 also covers how the data was collected; would this not fit better in the description of the system implementation? Currently, Section 3 is a mixture of what the system managed to collect and what it can analyse. It is not just about analytics, as the section title suggests.
6. Page 4 (Section 3.2): I can understand why the authors chose VoID and SD as their monitoring targets. However, should this part of the work focus more on the discovery capability desired by target stakeholders rather than on two specific vocabularies? What is the justification for this approach? Again, a clear definition of discoverability would help. (A sketch of one possible check follows after this list.)
7. Page 7 (Section 3.3): In terms of performance evaluation, have the authors considered the throughput of services? If not, why not?
8. I found the description of the various interfaces provided by SPARQLES inadequate, in both Sections 4.2 and 4.3. It is hardly possible to understand how to use the tool or its UI from the current text. These two sections need to be expanded substantially.
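Regarding point 2 above, a back-of-envelope estimate (the figure of ~500 monitored endpoints is the reviewer's assumption for illustration, not a number taken from the manuscript): 500 endpoints × 24 probes/day × 365 days ≈ 4.4 million probe records per year, which at some hundred bytes per record is well under a gigabyte. The authors could clarify whether storage or the load imposed on the monitored endpoints is the real constraint.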
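Regarding point 3 above, a minimal sketch of the distinction the paper could spell out; the endpoint URL and the trivial ASK query are illustrative assumptions by the reviewer, not taken from the manuscript:

```python
# Illustrative only: an application-level SPARQL probe versus a network-level
# ping. A ping (ICMP) shows only that the host is reachable; a trivial ASK
# query exercises the HTTP stack and the query engine itself.
import urllib.parse
import urllib.request

def sparql_alive(endpoint, timeout=10):
    """Return True if the endpoint answers a trivial ASK query with HTTP 200."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": "ASK {}"})
    req = urllib.request.Request(
        url, headers={"Accept": "application/sparql-results+json"})
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.status == 200
    except Exception:
        return False

# A host may answer ping yet fail this probe, e.g. when the SPARQL engine
# behind the HTTP server is down or overloaded.
print(sparql_alive("http://dbpedia.org/sparql"))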
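Regarding point 6 above, a hedged sketch of one possible discoverability check, assuming the W3C SD convention that dereferencing the endpoint URL with an RDF Accept header returns a service description; this is the reviewer's illustration, not necessarily the authors' method, and the string heuristic is a deliberate simplification:

```python
# Illustrative only: a crude SD-based discoverability check.
import urllib.request

SD_NS = "http://www.w3.org/ns/sparql-service-description"

def has_service_description(endpoint, timeout=10):
    """Dereference the endpoint URL as RDF and look for the SD vocabulary."""
    req = urllib.request.Request(
        endpoint, headers={"Accept": "text/turtle, application/rdf+xml"})
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            body = resp.read().decode("utf-8", errors="replace")
            return SD_NS in body  # crude check that SD terms appear at all
    except Exception:
        return False
```

An explicit definition of discoverability in the paper would pin down exactly which such computation is performed and why it matches stakeholders' needs.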
Comments
SPARQLES temporary downtime
We would like to let the reviewers know that, due to abnormal traffic/attacks involving SPARQLES, the host, OKFN, has unfortunately had to disable the service. We think that some external researchers may have been accessing the service in an impolite way, causing overload on the OKFN servers. We are working hard to migrate the service and will report back once it is ready (hopefully early next week). We apologise for any inconvenience caused, but this was truly an unforeseeable event on our end.
SPARQLES back online at a new home
We just wish to note that the SPARQLES system is back online at a new home: http://sparqles.ai.wu.ac.at/. The old URL permanently redirects to the new location.
We apologise sincerely for the downtime, but it was outside of our control. The OKFN servers were hit by an Elasticsearch vulnerability [1] (in which we believe SPARQLES was not involved), and the admins decided it was best to cut external services. Hence we needed to organise, at short notice, a new server with sufficient memory and backups to comfortably host the service.
Finally, we want to note that the system may take a while to warm back up. We have no readings from the downtime period; hence, for example, current availability measures for specific endpoints will tend to be 0% or 100% until the system has been up for a few days. A brief sketch of why this happens follows.
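For concreteness, a minimal sketch of the effect, assuming availability is computed as the fraction of successful probes within a recent window (the exact computation in SPARQLES may differ):

```python
# With only one or two readings after a restart, a windowed success ratio
# can only be 0% or 100%; it smooths out as more probes accumulate.
def availability(probe_results):
    """probe_results: booleans, one per hourly probe in the window."""
    if not probe_results:
        return None  # no data yet
    return 100.0 * sum(probe_results) / len(probe_results)

print(availability([True]))                 # 100.0 after one successful probe
print(availability([False]))                # 0.0 after one failed probe
print(availability([True] * 20 + [False]))  # ~95.2 once readings accumulate
```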
We thank the reviewers for their time and apologise once again.
[1] http://bouk.co/blog/elasticsearch-rce/