Review Comment:
I would like to thank the authors for addressing most of my previous comments.
While the paper has further improved, some comments should still be addressed to avoid confusion:
Section 5:
I found it slightly confusing that the independent variables are introduced first, then their influence on the dependent variables is discussed, and only afterwards are the dependent variables themselves introduced. I would suggest restructuring Section 5.
p. 9, 3rd paragraph (right side): I do not understand why "[...] the dataset size cannot be fully explored in existing SPARQL query federation" benchmarks. Please provide more explanation.
Sections 5 and 6 contradict each other regarding the setup.
Section 5 states that there were no limits on answer size and query execution, while Section 6.1 states that Virtuoso was set up with a maximum of 100,000 rows and a maximum query execution time of 60 seconds.
Section 6:
The server descriptions in the text and in Table 5 do not match up and are still very confusing.
Also, and more critically, the servers have different specs, which might influence the experiment (e.g. server timeouts that cause empty result sets, or other errors). Did the authors use some method to guarantee that the different server specifications have no cross influence? If so, please explain this in the paper.
Overall, the evaluation in Section 6 is very hard to read and could benefit from a better structure (see also the minor comments).
Again, I would suggest merging Fig. 2 and Fig. 7 so that Fig. 4 is not needed.
More minor comments:
Abstract
It might not be entirely clear to every reader why an abundance of datasets directly leads to the development of SPARQL federation. An additional clarifying sentence would be very helpful.
Section 2 regarding/related to the result completeness:
I wonder whether the authors consider the correctness of returned results to be crucial for a SPARQL federation engine.
It might be interesting to add a comment on how the federation setup and join implementation might lead to wrong results.
Section 3:
Last paragraph of Section 3.1: please check the formatting of the keywords "SPARQL " and " query federation".
Maybe enumerate the three categories of engines rather than just having a paragraph. The same applies to the three sub-categories.
-> Provides a better structure
Section 5, first sentence:
The authors show *some* of the variables that may influence the behavior, which immediately raises the question of what other variables exist and which of them might have a significant influence.
Section 6.1:
I would explain already at this point of the paper why SP2Bench is not used in experiment 1.
Just to avoid confusion, please add that the Python time method reports the time in seconds as a float.
The current text might cause confusion if someone does not know the return value of the methods and assumes that you compare milliseconds (Java) vs (rounded) seconds (Python) runtime measures.
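To illustrate the point, here is a minimal sketch. It assumes the Python measurement uses `time.time()`; the paper does not name the exact call, so this is my assumption, and the helper `elapsed_ms` and the sleep-based workload are purely hypothetical. It shows that the reading is a float in seconds and how it could be converted to milliseconds so that it is directly comparable to a millisecond value from Java.

```python
import time


def elapsed_ms(start_s: float, end_s: float) -> float:
    """Convert two readings in float seconds into elapsed milliseconds."""
    return (end_s - start_s) * 1000.0


start = time.time()   # float, seconds since the epoch (sub-second precision)
time.sleep(0.05)      # hypothetical stand-in for executing a query
end = time.time()

print(type(start).__name__)                 # float
print(f"{elapsed_ms(start, end):.1f} ms")   # elapsed time in milliseconds
```

Stating the unit and the conversion explicitly in the paper would rule out the misreading described above.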
Section 6.1.1, paragraph 2: you mention three query categories but list four (including SP2Bench).
p. 12: formatting of the text (line spacing too large).
Section 6.2, para 1:
"We select four metrics" -> the authors list five metrics.
Table 11: what is the meaning of "-"? Does it mean that the approach returned the right results? Please indicate.
Section 6.3.6: formatting of paragraph 2 (line spacing).
"queires" -> "queries"