Review Comment:
This manuscript was submitted as 'Tools and Systems Report' and should be reviewed along the following dimensions: (1) Quality, importance, and impact of the described tool or system (convincing evidence must be provided). (2) Clarity, illustration, and readability of the describing paper, which shall convey to the reader both the capabilities and the limitations of the tool.
I would like to thank the authors for clear answers to almost all of
my questions and concerns! For me only two points remain. I believe
these should be easy to address by the authors in a minor revision:
- I was unable to find the definitions of terms like ‘cleaning’ and
‘preparing’, etc. in the revised paper.
- IIUC, the authors claim that quantitative evaluation of the
platform requires the construction of a benchmark for cloud
services. While I do see the benefits of such a benchmark,
e.g., making it easy to compare systems with one another, I do not
believe that the absence of such a benchmark means researchers
should forgo quantitative evaluation of the systems they build. For
instance, the number of concurrent users per node, the number and
size of transformations that are supported within a given time
unit, etc. are all things that can be measured without a
benchmark. The authors must have these, or very similar, numbers
because they are needed to configure the load balancing and other
properties of the cloud-hosted solution. It may also be possible
to mention how many users are currently using the system. This at
least gives an inkling of the viability of DataGraft as a
sufficiently scalable multi-user platform.
Thank you for clarifying the impact / external (re)use of the
DataGraft platform. IIUC, the impact ATM mainly concerns reuse of
various software components that constitute DataGraft, but there are
also indications that the platform is being used outside of the
original development context.
The distinction between web- and cloud-hosted as well as the
distinction between cloud-based and cloud-hosted solutions was not so
clear to me before. This clarifies the delta WRT the LOD2 stack very
Thanks for clarifying the benefit of pure / side effect-less functions
for data transformations. I understand now some of the claims
BTW, I'm impressed by the ability of the DataGraft platform to
clean/transform the ‘volkstellingen’ CSV.