Review Comment:
The paper investigates how to generate spatio-temporal transport data with respect to population-distribution data. Data generation is an important task for fostering reproducible benchmarking and empirical research. The work focuses on generating public transport data encoded in RDF. The authors' choice of RDF and Linked Data best practices follows the intuition that transport data usually refer to shared entities.
The paper's main contribution is an algorithm for public transport data generation and its implementation. The authors built on the state of the art in public transport planning, designing an approach that creates a geospatial region, places stops, edges, and routes, and finally schedules the trips.
The authors claim the state of the art lacked a realistic dataset generator. Therefore, they employ several techniques to assess whether the generated data resemble realistic scenarios.
The paper's writing meets the SWJ standards. However, there are a few passages that require further clarification.
The work presents a significant engineering effort, yet it does not highlight all the scientific value it creates. The architectural structure of the generator is as important as the algorithm: it might inspire intuitions about the approach's scalability and clarify the design. Moreover, a requirements analysis is surprisingly missing, which would also have helped drive the evaluation. As far as I understood, there was one, which was removed after the previous round of review. The authors should consider re-adding it, possibly in a different form.
To this end, both Jim Gray and Karl Huppler provide good sets of principles to support the design of domain-specific benchmarks. Moreover, it is essential that the authors clarify which tasks users of the generator are expected to test, e.g., query answering.
My most significant concerns regard the work's motivation as well as the evaluation.
Regarding the former, as pointed out by another reviewer (reported in the author letter), it is not enough to claim the lack of such a data generator in the state of the art. A benchmark becomes obsolete when it is no longer able to distinguish between the different approaches that adopt it, i.e., all the solutions look good. Given that a benchmark consists of data, one or more tasks, and a set of KPIs for comparison, we can upgrade a benchmark by tuning any of them.
Is it true that existing spatio-temporal benchmarks and data generators are obsolete? Moreover, which tasks are PODIGG users going to test?
Regarding the latter, the use of Duan et al.'s coherence metric to assess whether a given dataset is realistic is not convincing. Indeed, Duan et al. define the coherence metric to highlight the structural differences between synthetic RDF datasets and real ones. High structuredness leads to poor evaluations of RDF stores because it makes the results less relevant in practice.
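To make this concern concrete, my recollection of Duan et al.'s definition (notation approximate and to be checked against the original paper) is that the coverage of a type T in a dataset D is

CV(T, D) = \frac{\sum_{p \in P(T)} |OC(p, I(T, D))|}{|P(T)| \cdot |I(T, D)|},

where P(T) is the set of properties of T, I(T, D) the set of instances of T in D, and OC(p, I(T, D)) the instances that actually set property p; the coherence of D is then, as I recall, the sum of CV(T, D) over all types, weighted by each type's relative size. The metric thus captures how uniformly instances populate their type's properties, not how plausible the data are for a given task, which is why I doubt it can certify "realism" on its own.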
Unfortunately, it is easier to identify a characteristic that makes a dataset a bad candidate for benchmarking; claiming the opposite requires a more complex study. The authors' more in-depth comparison is a step in the right direction.
Nevertheless, they did not fully identify which characteristics of the real datasets make them relevant samples to study. This again raises the problem of better positioning the work in the state of the art, which requires 1) identifying which tasks are to be solved over the generated data, and 2) surveying existing solutions to determine whether they can reach the level of observability that PODIGG enables.
Summarizing:
PROS
- the designed algorithm follows best practices
- the implemented tool is highly configurable
- the intuition about dataset "distance" goes in the right direction
CONS
- Motivation and comparison must be improved
- Tasks (e.g., queries) are essential elements of a benchmark but are not specified
- The evaluation is not convincing because of 1) the use of the coherence metric and 2) the lack of a term of comparison, i.e., what can one benchmark using PODIGG that she/he could not benchmark before?