Review Comment:
I really appreciate the effort of the authors addressing the comments made by me and the other reviewers. The quality of the paper was hugely improved in the last round. I find now the paper more self contained and concise on the contributions made to the area. Thank you for publishing your scripts used to run the experiments. That is a great addition towards reproducibility and re-usability of RDFRules.
I still have some really minor comments, however, I think the authors can address them easily before the camera ready and there is no need for another round---at least from my side.
## Minor comments
1. Table captions should be on top of the tables and, in general, I think you could add some differentiation to the headers of all tables
2. In the abstract it would be better to enumerate the performance improvements, e.g., (1) the top-k …, (2) … , (3) …
3. Section 1: “the AMIE+ algorithm can take days, but the presented approach has a more than an order of magnitude shorter mining time” —> in which dataset did you observe this behaviour? I’d be better to name it or reference to your experiments
4. Section 2: “RDF-style KGs” —> I think that here it should just say RDF KGs, remove the “style” part. It’s true that other databases can be converted to RDF; however, you do not process the original format, but the final RDF format
5. Section 2: “Jena API” —> Please provide a link to the project web page
6. Section 2: “included on the output” —> included *in* the output
7. Section 2, typo: “Interestingy, both” —> *Interestingly*, both
8. Section 2: “suggesting subclassof relations” —> do you mean the rdfs:subClassOf relation?
9. Section 2: “which can facilitate work of ontology engineers” —> which can facilitate *the* work of ontology engineers or the work on ontology engineering
10. Section 2: “ILP systems have been reported too slow” —> ILP systems have been reported *to be* too slow
11. Section 2: First mention to YAGO is missing a reference or link to the project
12. Section 2: “ILP systems such as Aleph” —> ILP systems such as *ALEPH* (in uppercase)
13. Section 2, page 3: Third paragraph on the right has a sentence “however” followed by another sentence “however”
14. Section 2: “KGs [20] and is also used” —> KGs [20] and also used (without the *is*)
15. Section 2: First mention to Apache Spark is missing a link to the project
16. Section 2: When introducing SWARM I think it’s needed an explanation for the association rule used as example
17. Section 2: “Apriori algorithm. .” —> double full stop
18. Section 2: the following sentence seems odd “a use of a reasoner in combination with association rule mining results in a high chance of mining not finishing in ’reasonable time’ [26]” —> *the* use of a reasoner in combination with association rule mining is likely to exceed what one can consider a “reasonable time” [26]
19. Section 2: “rules.They” —> there is a missing space after the full stop
20. Section 2: “the SDType algorithm” —> missing reference to the paper
21. Section 2: “e.g. RESCAL [28], HolE [29] or TransE [30]” —> *and* TransE (you are listing examples not providing options)
22. Section 3: “E.g., the atom” —> *For example*, the atom
23. Section 3: “In this respect, AMIE-style rules” —> In this respect, *Horn rules*
24. Section 3.3: Nitpicking, example of substitution using and , the ?a doesn’t have always the same font
25. Section 3.4.2: “is also infrequent (this is called the upward closure property).” —> any reference you could provide here?
26. Section 3.4.2: When defining support you use the instantiation operator, which is used over a rule body and head implication. Should this be B \wedge H or am I missing something?
27. Section 3.4.3: Your equation defining bsize is the same as the one provided for support and the condition also involves instantiation with a rule. Following AMIE+, the instantiation right part should only involve the body of the rule
28. Section 3.4.3: “the number of unique objects in the (empirical) domain of p in KG” —> *subjects* not objects
29. Section 3.4.4: “Galárraga et al. [2] witness)” —> witness is a noun, used witnessed or observed instead
30. Section 5.1: “The recursion is stopped after all fresh atoms have been bound.” —> *found*
31. Section 5.6: “Suppose two rules U and V” —> I think previously you used $r_1$ and $r_2$ to describe two rules. I think you should keep consistency on that notation
32. Section 6: “from RDF-style data” —> from RDF data
33. Section 6.2: “Actions are further divided into the streaming and strict ones” —> I still believe you should use “batch”, where batch is a group of triples or all the set of triples, i.e., the whole KG
34. Section 6.3: “The operations implemented over the RDFDataset structure are the same as for the RDFGraph structure, aside a few extensions.” —> You mention some differences between RDFDataset structure and RDFGraph structure. Where are these extensions enumerated?
35. Section 6.5: “(cf. Algorithm 1)” —> “confer” (“cf.”) should be used to point to another source discussing the same issue or making the same argument. In this case, you should use “see” to point to your algorithm
36. Section 6, Table 2: This table is missing a clear highlight of the headers that would help improving the readability of the table. Also, if the first column is important, you could bold the text to differentiate, instead of the double line.
37. Section 7.1: “The number of triples and their unique elements for each used dataset are shown in Table 3.” —> remove the *used* word
38. Section 7, Table 4: Table 4 seems to have a formatting issue on the reported results for DBpedia 3.8. It is not clear what are setups for the different values reported.
39. Section 7, Table 4: “with approx. 7.14 h“ —> How this approximation was made? Why do you mention first > 2 days?
40. Section 7.2.2: “mining time was set for two days” —> mining time was set *to* two days
|