Review Comment:
1. The abstract and the introduction overlap substantially, which makes the introduction less engaging (Page 1, line 40 to the end of Page 1).
2. A LaTeX error is observed on Page 2, line 30.
3. Although the presented work is an extension, the contribution of the authors compared to previous research is not adequately explained. It would be beneficial to briefly mention the previous architecture or provide a quantifiable measure to demonstrate the improvement achieved by the proposed approach.
3. Please review Page 3, line 11, as Figure 2.2 seems to be referenced incorrectly. Presumably, Figure 2.b is intended.
4. On Page 3, line 14, the term "internal representations" lacks clarification. It would be helpful to provide an explanation of what these representations entail.
5. For Figure 2.b, there is a reference to a numbering system, but the graph does not display any numbers.
6. The motivation for this paper appears to be the interest of the RDBMS community in machine learning within DBMS. It would be beneficial to provide an example in the introduction, illustrating a query that is challenging to predict accurately using traditional methods, while the proposed approach may offer improved performance.
7. On Page 3, lines 19-20, the meaning of "the nodes to send the query" is unclear. Please provide further clarification.
8. The example provided on Page 3, lines 26-36, requires additional elaboration. The focus is primarily on birthplace and birth date, along with triple patterns, rdf:type, filter operators, and the optional operator. Please provide a more detailed explanation of this example.
9. There seems to be a discrepancy between the statement on Page 3, line 3, which mentions a focus on SELECT join, and the example, which contains the aforementioned operators. This inconsistency is confusing.
10. On Page 3, line 35, it would be helpful to provide the full sentence or context regarding the mentioned SPARQL query.
11. Please explain the rationale behind choosing K-medoids instead of other clustering algorithms. Clarify this point on Page 7.
12. On Page 7, line 2, there is a capitalization error in the phrase "In the next Section."
13. The reference to Figure ?? on Page 7, line 21, requires correction to accurately refer to the figure in question.
14. Please explain the meaning of VaR_UrI_var or Var_UR_URI. Additionally, ensure that any abbreviations are properly introduced and defined for readers who may not be familiar with them.
15. Provide a clear definition of "bounded variable" as mentioned on Page 7, lines 45-47.
16. The sentence "The query engine by first select" on Page 7, line 43, needs revision for clarity and accuracy.
17. On Page 7, line 49, it is unclear which predicates have lower selectivity. Please revise the statement and state explicitly whether it refers to high or low selectivity.
18. Please provide a detailed explanation of how node semantic information is computed, as mentioned on Page 7, line 30.
19. The variables $P^{(i)}$ and $P^{(f)}$ are used on Page 8, line 2, without prior definition. It would be helpful to define them in the preliminary section.
20. The meaning of numbers from c1 to c4 in Figure 3 requires explanation.
21. Figure 4 closely resembles Figure 5 in the paper by Marcus et al. (Neo: A Learned Query Optimizer), differing only slightly in colors and in some additional detailed shapes. Provide a clearer figure that emphasizes the distinctions from the original paper.
22. While it is apparent that the authors aim to introduce learned query optimizers to the graph database realm, the paper lacks significant contributions. Can you add a discussion section that explains and addresses the challenges faced in applying learned optimizers to SPARQL compared to structured databases?
23. Section 4.2.1 resembles the corresponding section in the original paper (Neo). I found it difficult to follow and had to consult the Neo paper to understand the proposed architecture. This section needs to be more concise and clearer.
24. In Section 4.2.3, the authors mention the use of autoencoders for dimensionality reduction, which represents a novel aspect compared to the Neo paper. However, the authors do not explain their design choices in this regard. Additionally, it would be beneficial to cite papers that have employed autoencoders for dimensionality reduction.
25. Provide an explanation of the connection and order between Figure 4 and Figure 5.
26. The architecture appears to be the same as Neo, with the addition of an autoencoder to capture essential features of the numerous predicates in graph data. Is this understanding correct?
27. On Page 10, line 47, the reference to Bonifati et al.'s analysis of Wikidata query logs, which found that most queries are relatively simple and consist mostly of single triple patterns, requires clarification. How did this affect the training process and the results of the proposed model? Have you tried other logs, for example synthetic ones?
28. Page 10, line 47, lacks clarity regarding the number and nature of the queries used for model training. Provide statistics on the query features and their properties for a better understanding. How exactly were the queries selected?
29. The GitHub repository and the Hugging Face page linked in the paper do not contain the queries used for training. It is recommended to make the queries available and easily accessible.
30. The paper lacks an analysis of how the learned query optimizer compares to traditional query optimizers in terms of performance. A comprehensive discussion and comparison with traditional query optimizers are necessary.
31. Additional experiments are required to demonstrate the quality of the optimizer with queries of varying shapes and features.
Reference:
1) Angela Bonifati, Wim Martens, and Thomas Timm. 2017. An analytical study of large SPARQL query logs. Proceedings of the VLDB Endowment 11, 2 (2017), 149–161.
Review Summary:
---
Based on the comments provided, it is clear that the paper requires a major revision before it can be considered for acceptance. We have raised several important issues regarding clarity, organization, justification of contributions, and missing information. In order to address these concerns, the authors should consider the following:
1) Improve Abstract and Introduction: Remove repetition, ensure an engaging introduction, and provide a clear example illustrating the challenges of traditional query prediction.
2) Clarify and Elaborate: Address unclear terminology, explain the meaning of numbers in figures, provide additional clarification for examples and sentences, and define abbreviations.
3) Correct Errors and References: Fix LaTeX errors, correct figure references, and add missing numbering to Figure 2.b.
4) Explain Contribution and Rationale: Clearly explain the authors' contribution compared to previous research, quantify the improvement achieved, and provide the rationale for choosing specific algorithms.
5) Provide Additional Information: Elaborate on computation processes, clarify unclear statements, define variables, and explain the impact of query logs on training.
6) Conduct Comprehensive Analysis: Include a performance comparison with traditional query optimizers, conduct additional experiments with varying query features, and add a discussion section addressing challenges in applying learned optimizers to SPARQL.
7) Address Similarity to Neo Paper: The paper is highly similar to the Neo paper in approach, architecture, and methodology, with only slight modifications to apply the approach to SPARQL rather than SQL. The authors should clearly highlight the distinctions and unique aspects of their work compared to Neo, so that the paper demonstrates its originality and value rather than appearing to be a replication. To summarize, in many parts the paper seems incremental on top of the original Neo paper (see comments 21, 23, and 26). The authors should justify more convincingly how and why their work goes beyond a straightforward application of the Neo approach and merits a journal publication, and which different or additional challenges arise in the RDF context that do not apply to the original RDBMS setting.