Review Comment:
This manuscript was submitted as a 'full paper' and should be reviewed
along the usual dimensions for research contributions, which include:
(1) originality: The paper is not outstandingly original, but it
provides some insights into benchmarking topics.
(2) significance of the results: The results presented in the article
are almost useless (explained below).
(3) quality of writing: Quite good, apart from a couple of typos.
General remarks: The paper proposes a new technology-agnostic
benchmark that tests fundamental data operations of interest in a
warehouse scenario, intended for evaluating different storage
solutions. The main feature of such a benchmark should therefore be
fairness, so that it actually helps developers select the most
suitable technology. To achieve that goal, the benchmark must not
favor any storage solution through a wrongly chosen performance
metric or through queries that are not equivalent across storage
solutions. As the benchmark is designed to evaluate relational DBMSs,
NoSQL systems, and triple stores, the queries have to be written in
different languages (SQL, SPARQL), but must still be equivalent in
semantics and complexity. This is not the case here, as explained in
detail below.
[Sec 1]:
The Introduction states that there is no benchmark combining the four
mentioned properties. Actually, there is: the LDBC Social Network
Benchmark. It tests fundamental data operations, it is
technology-agnostic, it evaluates relational DBMSs, NoSQL systems,
triple stores, graph database systems, etc., and it operates on
synthetic datasets that mimic real-world characteristics.
[Sec 2.8]:
If a computer has 16GB of RAM, it is not a good idea to allocate all
of it to the database system.
To start the Virtuoso server, the virtuoso.ini file must be present
in the current directory. If it is not, and you start the server in
the foreground (as the author does, with the +foreground option), it
is not true that there is no error message; you will see: "There is
no configuration file virtuoso.ini". Also, some of the parameters are
used with '+', but others are supposed to be used with '-', e.g. -f,
which is equivalent to +foreground.
[Sec 3]:
The performance metric does not make sense. I see no reason why the
preparation time should affect the performance score in the following
equation:
performance(database, queryscenario, testseries) =
    (prepare + execution1 [+ execution2 + execution3]) / 3
For example, in an RDBMS the preparation step includes the creation
of indices, and there is no use-case scenario in which an index is
dropped before each query execution and rebuilt over and over again.
Usually, these indices are built once, before or after loading the
data, and their cost should count toward loading time, not query
execution time. On the other side, the preparation phase for triple
stores is practically nonexistent in almost all query scenarios, so
all of these measurements for Fuseki and Virtuoso are almost 0, while
building the relational indices takes a lot of time (a couple of
seconds for the MEDIUM test series). This is not fair, and it is
biased toward triple stores. This is the reason why the author
declares Virtuoso "the best aggregation performer" in Section 4.5,
and it is not true at all that "Virtuoso already stores atomic field
information instead of complete records", as the author states. For
example, in AGGREGATE_PUBLICATIONS_PER_PUBLISHER_ALL, Test Series
MEDIUM, the query execution times are:
SQLite-Xerial 1112.13 ms
PostgreSQL 1592.18 ms
Virtuoso 3018.93 ms
but in Figure 4b PostgreSQL is presented as the best performer (1.0),
followed by Virtuoso (1.11) and then by SQLite-Xerial (2.18). The
reason for this is the preparation time. The situation is very
similar in all the other query scenarios. For example, in
AGGREGATE_PUBLICATIONS_PER_PUBLISHER_TOP10, Virtuoso was slightly
faster than SQLite-Xerial, and an order of magnitude faster than
ArangoDB, but that cannot be seen from the performance metric:
Virtuoso (1.0), SQLite-Xerial (3.63), and ArangoDB (7.05).
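To make the distortion concrete, here is a small Python sketch of the
metric, using the execution times quoted above for
AGGREGATE_PUBLICATIONS_PER_PUBLISHER_ALL and hypothetical preparation
times (the prepare values are my own illustration, not measurements
from the paper):

```python
# Sketch of how the preparation term skews the metric.
# Execution times are the ones quoted above; preparation times are
# HYPOTHETICAL, chosen only to illustrate the effect (near-zero for the
# triple store, seconds of index building for the relational systems).
def performance(prepare, executions):
    # performance = (prepare + execution1 [+ execution2 + execution3]) / 3
    return (prepare + sum(executions)) / 3

times = {  # (hypothetical prepare [ms], measured execution1 [ms])
    "SQLite-Xerial": (5000.0, 1112.13),
    "PostgreSQL":    (300.0,  1592.18),
    "Virtuoso":      (0.0,    3018.93),  # no preparation phase at all
}
scores = {db: performance(p, [e]) for db, (p, e) in times.items()}
best = min(scores.values())
normalized = {db: round(s / best, 2) for db, s in scores.items()}
print(normalized)
# SQLite-Xerial has the fastest execution, yet the preparation term
# gives it the worst normalized score, flattering Virtuoso.
```

Under these assumptions the ranking inverts exactly as in Figure 4b:
the fastest executor ends up looking slowest once preparation is
folded into the score.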
[Sec 4]:
Many of the observations in this section cannot be valid because of
the wrongly chosen performance metric.
[Sec 4.1]:
Errors_Virtuoso_SMALL.txt: This is not a bug in Virtuoso; it is a
configuration issue. You should increase the max vector length
setting in the virtuoso.ini file. The same problem is reported in
Errors_Virtuoso_MEDIUM.txt. Virtuoso is well known for its
scalability, so the issue reported in Errors_Virtuoso_LARGE.txt keeps
it out of the competition at this scale factor. It would be better to
fix the syntax of the RDF file and repeat the experiment than to
exclude Virtuoso from this part of the game.
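For reference, a sketch of the configuration change (key names taken
from the Virtuoso 7 default virtuoso.ini; verify them against the
installed release, and the concrete values below are illustrative
only):

```ini
; virtuoso.ini -- raise the vector length limits so that large
; intermediate result vectors do not abort the query
[Parameters]
VectorSize       = 1000     ; initial vector length
MaxVectorSize    = 3000000  ; increase to avoid the 'max vector length' error
AdjustVectorSize = 1        ; let the engine grow vectors adaptively
```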
[Sec 4.3]:
In the entity retrieval query scenario, there are two main problems.
The first is that the SQL queries executed against the relational
DBMSs are not equivalent to the SPARQL queries; the second is the use
of the DESCRIBE query form, which is not strictly specified by the
W3C: DESCRIBE may produce quite different results depending on the
describe-mode. I would not recommend using constructs that are not
strictly defined by the standard. The author uses the following
query:
describe * where {
  ?s ?p ?o .
  ?s <http://purl.org/dc/terms/identifier> ?identifier .
  FILTER( ?identifier IN ( ##ids## ) )
}
This is similar to:
select ?s ?p ?o where
{
  {
    ?s ?p ?o .
    ?s <http://purl.org/dc/terms/identifier> ?identifier .
    FILTER( ?identifier IN ( ##ids## ) )
  }
  UNION
  {
    ?s ?p ?o .
    ?o <http://purl.org/dc/terms/identifier> ?identifier .
    FILTER( ?identifier IN ( ##ids## ) )
  }
}
which is much more complicated than the relational query:
select * from justatable where dcterms_identifier in (?);
So this is unfair to the triple stores and favors the relational
DBMSs. The equivalent query should be:
select ?s ?p ?o
where {
  ?s ?p ?o .
  ?s <http://purl.org/dc/terms/identifier> ?identifier .
  FILTER( ?identifier IN ( "011363517" ) )
}
All of these queries are executed by Virtuoso (on my computer, which
has power similar to the machine used, with the same configuration,
Test Series MEDIUM) in 1-2 ms, while the SELECT statement the author
proposes in Listing 1 takes about 7 s. So this is very unfair to
Virtuoso. In this query scenario, ordering is not mentioned anywhere,
so the Virtuoso bug referenced in [9] does not affect this query at
all.
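For comparison, the relational side of this scenario really is a
single table lookup. A minimal sketch with Python's sqlite3, reusing
the table and column names quoted above (justatable,
dcterms_identifier); the data rows are invented for illustration:

```python
import sqlite3

# In-memory database standing in for the benchmark's SQLite setup.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE justatable (dcterms_identifier TEXT, dcterms_title TEXT)"
)
conn.executemany(
    "INSERT INTO justatable VALUES (?, ?)",
    [("011363517", "Some record"), ("999999999", "Another record")],
)
# A single parameterized lookup -- one table access, no UNION of
# graph patterns as in the DESCRIBE rewriting above.
rows = conn.execute(
    "SELECT * FROM justatable WHERE dcterms_identifier IN (?)",
    ("011363517",),
).fetchall()
print(rows)  # [('011363517', 'Some record')]
```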
[Sec 4.4]:
In the Conditional Table Scan scenario, the relational DBMSs are
favored in the same way as in the previous section. The needed query
should be:
select ?s ?p ?o
where {
  ?s a <...> .
  ?s ?p ?o
}
instead of:
describe *
where
{
  ?s ?p ?o .
  optional { ?s a ?type . }
  ?s a <...> .
}
The first query runs on Virtuoso in 300s (on my computer, as
explained before), which is comparable to the relational systems.
The second conditional query should be:
select ?s ?p ?o
where {
  ?s <http://purl.org/dc/terms/title> ?title .
  filter regex(?title, 'stud(ie|y)', 'i') .
  ?s ?p ?o .
}
which runs much faster than the query the author executes against
Virtuoso.
The queries executed against Fuseki are not correct either. The
pattern:
optional { ?s a ?type . }
is not needed at all, while the pattern
optional { ?s <http://purl.org/dc/terms/title> ?title . }
should not be optional, because of the following filter:
filter regex(?title, 'stud(ie|y)', 'i') .
Similar remarks apply to the third conditional query.
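As a side note on the filter itself: the regex dialect used here
matches Python's, so its behavior can be sketched directly (the
titles below are invented for illustration):

```python
import re

# The case-insensitive filter from the conditional scan queries:
#   filter regex(?title, 'stud(ie|y)', 'i')
pattern = re.compile(r"stud(ie|y)", re.IGNORECASE)

titles = [
    "A Study of Benchmarks",      # matches 'stud' + 'y'
    "Case Studies in Storage",    # matches 'stud' + 'ie'
    "Unrelated work",             # no match
]
matches = [t for t in titles if pattern.search(t)]
print(matches)  # first two titles only
```

Since the filter always binds ?title, a row whose optional title
pattern did not match could never pass it, which is why making the
title pattern optional is pointless.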
[Sec 4.5]:
In the Aggregation section, the queries are comparable, but the
conclusions are not (see the remarks about the performance metric
above).
[Sec 5]:
Because of all the aforementioned remarks, this section is quite
wrong. The author says that Virtuoso did well in certain deletion
scenarios, e.g. DELETE_LOW_SELECTIVITY_PAPER_MEDIUM in Test Series
MEDIUM, but the reason is that UPDATE_LOW_SELECTIVITY_PAPER_MEDIUM
finished with an error, so there were no triples left to delete in
this scenario.
Minor technical issues:
page 3: Do not reference pages (e.g. "see page 4"); refer to tables,
figures, etc. instead.
page 5: Rephrase the following: "Table 3 provides an overview of
characteristic properties these databases"