FactForge: A fast track to the web of data
This is the final update of a second revised resubmission, following an "accept with minor revisions" decision; the paper has now been accepted for publication. The reviews below are for the first revision, followed further below by the reviews for the original submission.
Solicited Review by Aidan Hogan:
[Note: my initial open review was unsolicited, but this re-review is solicited. The authors have not directly addressed the comments from that review; I ask that they do so. I will keep this review relatively brief.]
The main concern with this paper was the overlap between it and the OWLIM paper (I have read both updated versions). There is still some minimal overlap, but I no longer see this as a problem.
Some other comments:
- You talk about aiming for O(n*log(n)) complexity, but applying rulesets like OWL 2 RL/pD*/etc. is cubic with respect to the number of known terms. You may be able to do reasoning over one particular (large) dataset within the above bound, but big-O notation is not applicable/needed here.
- Your claim that OWL 2 RL and OWL Horst inferencing are comparable to within 1% for your data is puzzling, given that OWL Horst infers statements like "?s rdf:type rdfs:Resource ." through the rdfs4* rules while the OWL 2 RL/RDF rules do not. Without saying *precisely* which rules of each profile you support, the result is ambiguous. I know you draw the conclusion weakly, and by removing certain "syntactic" rules from each profile you are probably correct, but as it stands the paragraph needs clarification.
- OWL Horst is not a dialect (it is not a language). It is a partial-axiomatisation of OWL RDF-based semantics.
- OWL 2 RL is not a rule language... it is a profile of OWL 2 (a dialect). Strictly speaking, OWL 2 RL/RDF is the name of the ruleset.
- No DC-*Terms* vocabulary?
- What is an entity? What is a node in the RDF graph? How do they increase by means of reasoning? (In my mind, objects of triples are also nodes.)
- Fix the flow of text around Fig. 3 and Table 2.
- web of data (capitalise)
- Linked data/linked data (capitalise)
- pD-entailment -> pD* entailment
- OWL2 RL, etc. (make consistent)
- Such as setup -> Such a setup
- in [|] Table 2
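The complexity point above can be made concrete with a small sketch (an illustration only, not OWLIM code): materialising just the transitivity rule over a chain of n classes already yields n(n-1)/2 triples, so the size of the output alone grows faster than any O(n*log(n)) bound.

```python
# Illustration: the transitivity rule alone over a chain
# c0 < c1 < ... < c_{n-1} materialises n*(n-1)/2 pairs.

def transitive_closure(edges):
    """Naive fixpoint over a set of (sub, super) pairs."""
    closure = set(edges)
    changed = True
    while changed:
        changed = False
        new = {(a, d) for (a, b) in closure for (c, d) in closure if b == c}
        if not new <= closure:
            closure |= new
            changed = True
    return closure

n = 50
chain = {(i, i + 1) for i in range(n - 1)}  # c_i subClassOf c_{i+1}
closed = transitive_closure(chain)
print(len(closed))  # n*(n-1)/2 = 1225 for n = 50
```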
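To illustrate the rdfs4* point: a sketch (toy triples encoded as strings, not any particular engine's rule syntax) of the two rules that pD*/OWL Horst-style rulesets inherit from the RDFS axiomatisation and that OWL 2 RL/RDF omits, which is one reason the two closures are not directly comparable.

```python
# rdfs4a/rdfs4b: every subject and every non-literal object of any
# triple gets typed rdfs:Resource. Present in pD*-style rulesets,
# absent from OWL 2 RL/RDF.

RDF_TYPE = "rdf:type"
RDFS_RESOURCE = "rdfs:Resource"

def apply_rdfs4(triples):
    inferred = set(triples)
    for s, p, o in triples:
        inferred.add((s, RDF_TYPE, RDFS_RESOURCE))      # rdfs4a
        if not o.startswith('"'):                        # crude literal test
            inferred.add((o, RDF_TYPE, RDFS_RESOURCE))   # rdfs4b
    return inferred

data = {("ex:Sofia", "ex:capitalOf", "ex:Bulgaria")}
extra = apply_rdfs4(data) - data
# extra holds the two rdf:type rdfs:Resource triples that an
# OWL 2 RL/RDF closure of the same data would not contain
```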
In addition, please see the comments in my initial open review.
Solicited Review by Thorsten Liebig:
The submission introduces FactForge, a repository of selected linked open data sources with OWL 2 RL-like reasoning capabilities and a SPARQL query endpoint. The paper describes the included LOD datasets and provides an overview of the reasoning and consistency mechanisms, as well as some statistics accompanied by a brief data analysis.
Altogether the paper is an appreciable submission, as it describes a useful piece of practical SW infrastructure. The content is also within the focus of the call for papers.
However, even the resubmission still has some minor issues from my perspective.
- Data quality. The authors report that poor data quality even makes the overall dataset "not suitable for reasoning". There seems to be an approach (sec. 6.4) which helps to eliminate most of the problems. However, this process seems to be mostly manual and is a barrier when trying to keep FactForge up to date with its data sources. I don't see this as a particular problem of FactForge but as a general problem of LOD datasets, which mostly follow a quantity-over-quality paradigm. To be honest, I see a major problem in this situation and would like some statement from the authors with respect to this issue (e.g. whether they plan to build their own crawlers).
- Related work. The authors claim that there are no other queryable LOD repositories which perform inferences. I would appreciate it if they would make clear how they compare, for instance, to the publicly available Virtuoso endpoint, which maintains at least DBpedia (but probably much more of the LOD big picture).
- Outlook and future work. There is now a brief statement in the conclusion that FactForge would make a suitable backend for clients that consume linked data. Please provide more information about current users and intended usage. Furthermore, do you plan to allow users to operate their own instances of FactForge in order to guarantee some level of bandwidth for their applications? How could people add their own data? Is this only triggered by the authors/operators of FactForge? Will SPARQL remain the only interface to FactForge?
Minor editorial remarks:
- First sentence in sec. 5 contains a linebreak
- Table 2 and Figure 3 are badly placed on pages 6 and 9, respectively, and interrupt the flow of reading
Solicited Review by Michel Dumontier:
I am satisfied that my prior comments are mostly addressed in the revised manuscript.
The review comments below are for the original submission.
Solicited Review by Thorsten Liebig:
The paper provides an overview of FactForge, a repository of selected linked open data sources with reasoning capabilities and a SPARQL query endpoint. The paper is well written and the content is presented in a reasonable way, although it contains some sections which also appear in another submission under review at SWJ.
I appreciate the submission because it provides a useful piece of practical SW infrastructure and addresses real-world problems of LOD, namely integration and scalability.
However, I do see some minor issues and open questions with the paper.
- In sec. 2 you mention a problem of today's LOD datasets, saying that many publishers create data without properly understanding the underlying semantic framework. You should at least mention some of these problems. Detailed information would probably help those people fix their modelling issues. Furthermore, is there anything you suggest here? Any mechanism in FactForge to identify and fix these problems? Is manual analysis and editing the only way to achieve data quality? If there is no (semi-)automatic way of doing this, FactForge will always lag behind and be painful to extend with new sources.
- Somewhat related to the issue above: you mention a limitation of your approach, which is only applicable to more or less static data. However, the real world is not static. Is the true reason your caching and indexing routines, or rather the manual post-processing of the datasets resulting from the issue above? In your OWLIM submission you state that your reasoning system can efficiently handle retractions, which indicates that your indexing mechanism can handle dynamic data. Please explain.
- In sec. 4 you exclude role chains (in favour of scalability, I guess). All other language parts of OWL 2 RL are supported, right? Please comment on why you exclude this quite useful construct.
- SPARQL is supported as the query language. This language easily reaches its limits for sophisticated queries, e.g. where one wants to query for direct successors/predecessors of transitive relationships, such as the direct subclasses of a class. What is the FactForge solution to this? Will other query languages such as SPARQL-DL or the OWLlink protocol be supported in the future?
- Who is using FactForge right now? What are the most likely use cases for the system? What comes next?
I tested the system by adapting some of the given demo SPARQL queries. Even queries very similar to the originals met with mixed success (e.g., interestingly, no cities in Germany with buildings by Richard Meier). Furthermore, the "Include inferred" option doesn't change the number of results for any of the given sample queries. Is there an explanation for this?
Another issue bothered me when using the system: which classes and/or properties should I use for querying? For instance, when asking for the buildings of Richard Norman Shaw, should I ask for fb:architecture.architect.structures_designed (as done in the sample query) or rather use dbpedia:architect? Has there been some ontology alignment between the different LOD sources? I think this is a serious problem for LOD in general (and for FactForge in particular).
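On the direct-subclass point above: once subClassOf is materialised (and hence transitively closed), "direct" links must be recovered by negation, which the reviewer rightly notes plain SPARQL struggles with. A client-side sketch over hypothetical toy data shows what is being asked for:

```python
# Sketch (invented data): B is a *direct* subclass of A iff B < A and
# there is no C with B < C < A in the transitively closed relation.

def direct_subclasses(sup, closure):
    """closure: set of (subclass, superclass) pairs, transitively
    closed and assumed irreflexive for simplicity."""
    below = {b for (b, a) in closure if a == sup}
    return {b for b in below
            if not any((b, c) in closure and (c, sup) in closure
                       for c in below - {b})}

closure = {("Dog", "Mammal"), ("Mammal", "Animal"), ("Dog", "Animal"),
           ("Bird", "Animal")}
print(sorted(direct_subclasses("Animal", closure)))  # ['Bird', 'Mammal']
```

Dog is filtered out for Animal because Mammal sits between them; this intermediate-node test is exactly the negation that a query language over a materialised closure has to express.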
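On the alignment question: one mechanical answer is a mapping axiom (e.g. owl:equivalentProperty, or owl:inverseOf where the two vocabularies point in opposite directions) that the materialiser forward-chains, so a query over either predicate sees both sources. The predicate names and the axiom below are hypothetical, purely to show the mechanism:

```python
# Hypothetical alignment axiom, not an actual FactForge mapping:
EQUIV = {("fb:structures_designed", "dbp:designed")}

def chain_equivalent(triples, equiv):
    """Forward-chain owl:equivalentProperty (symmetric): copy each
    triple to the equivalent predicate."""
    pairs = equiv | {(q, p) for (p, q) in equiv}
    out = set(triples)
    for s, p, o in triples:
        for p1, p2 in pairs:
            if p == p1:
                out.add((s, p2, o))
    return out

data = {("fb:norman_shaw", "fb:structures_designed", "dbp:Piccadilly_Hotel")}
closed = chain_equivalent(data, EQUIV)
# a query on dbp:designed now also finds the Freebase-sourced triple
```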
Solicited Review by Eric Prud'hommeaux:
The paper describes a system, FactForge, which warehouses Linked
(Open) Data datasets. It uses BigOWLIM to perform large-scale RL
forward-chaining and to annotate the entities with preferred labels and
"rdf rank". The resulting graph is queryable with SPARQL and, in
response to a missing SPARQL feature, parameterizable to limit
redundant owl:sameAs answers.
A comprehensive set of statistics provides the user with a good sense
of what reasoning is achievable with the selected (and well-described)
datasets. These statistics delve into the degree to which deductive
closures expand the knowledge base, including and excluding owl:sameAs
inferences. This "inferred closure ratio" would be clearer with
exemplars of the outliers in this metric, for instance, contrasting the
maximum transitive closure length in DBpedia against that in Lingvoj.
Among the contributions to handling real-world data is a discussion
of the observably erroneous transitive properties asserted in DBpedia.
The inconsistent hierarchies led to cyclic broaderThan relationships,
many of which were programmatically isolated and repaired, but some
thousands of which required manual inspection/repair.
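The repair described above presumably hinges on cycle detection; a sketch (with invented toy data) of finding one cycle in a broader-than graph, since any cycle collapses whole levels of the hierarchy once the property is transitively closed:

```python
# Sketch: depth-first search for one cycle in a directed
# (narrower -> broader) graph, built from invented toy edges.

def find_cycle(edges):
    """edges: iterable of (narrower, broader) pairs; returns one cycle
    as a list of nodes (first == last), or None if the graph is acyclic."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, []).append(b)
    WHITE, GREY, BLACK = 0, 1, 2
    colour, stack = {}, []

    def dfs(node):
        colour[node] = GREY
        stack.append(node)
        for nxt in graph.get(node, []):
            if colour.get(nxt, WHITE) == GREY:        # back edge: cycle
                return stack[stack.index(nxt):] + [nxt]
            if colour.get(nxt, WHITE) == WHITE:
                cycle = dfs(nxt)
                if cycle:
                    return cycle
        colour[node] = BLACK
        stack.pop()
        return None

    for node in list(graph):
        if colour.get(node, WHITE) == WHITE:
            cycle = dfs(node)
            if cycle:
                return cycle
    return None

edges = [("Jazz", "Music"), ("Music", "Art"), ("Art", "Jazz"),
         ("Rock", "Music")]
print(find_cycle(edges))  # ['Jazz', 'Music', 'Art', 'Jazz']
```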
The resulting dataset is compared to the synthetic OWL benchmark
LUBM. The contrasts can guide those evaluating inference systems which
will operate over real-world data, and can guide the development
of future benchmarks. (There is an assertion that LUBM's small number
of predicates serves as an advantage in systems without a predicate
index, though it seems that any system traversing data with more than
one predicate would benefit from a predicate index.)
The work presented nicely documents and meets the stated objectives of
the research. It is worth noting that none of the achieved goals of the
work specifically require any of the listed features of linked data;
the wholesale retrieval, sanitization and inference over datasets
never requires dereferenceable identifiers for nodes or
predicates. That said, the LOD initiative has lent energy to the
SemWeb, fostering the creation of these useful datasets.
While this is interesting work towards scalable reasoning, I consider
it a stepping stone to a truly web-scalable inference system which
enables limited reasoning over the necessarily distributed datasets of
the Semantic Web (to use a popular analogy: one that reasons over the
bazaar instead of the cathedral).
The utility of page rank is neither defended nor explained.
§2 ¶7 gave me the impression that the reasoning was parameterizable,
though the conclusions imply it was exactly OWL RL.
§2 ¶8: s/is connected to at least one of the others
/is transitively connected to all others/
# at least, that's my guess.
§1 ¶4: while somewhat intuitive, "priming" should be introduced,
perhaps parenthetically. Reference  provided no definition.
§1.2 ¶4: R-entailment is described as patterns expressible by placing
variables anywhere in a consequent and at least those variables in the
antecedent. A statement that OWL uses a judicious subset would prevent
the reader from assuming that hard inferences were scalable.
Solicited Review by Michel Dumontier:
This paper presents FactForge, a web application that uses BigOWLIM to reason over LOD data. FactForge aims to ensure i) consistency, ii) generality, iii) heterogeneity and iv) reasonability with respect to OWL 2 RL. It incorporates RDF datasets from DBpedia, Freebase, Geonames, CIA World Factbook, Lingvoj and MusicBrainz, along with the terminologies and schemata UMBEL, WordNet, Dublin Core, SKOS, RSS and FOAF. The main claim is that FactForge provides the largest body of general knowledge against which full-text and SPARQL queries may access a total of 9.8B statements, of which 1.2B are asserted, 180M are annotations and 7.6B are inferred from owl:sameAs reasoning. In general, the paper is interesting, but it suffers from too much background and not enough specific detail arising from the analysis of FactForge's contents (which could be interesting).
1. The word "common sense" is used in several contexts, but no definition is provided, nor are the criteria under which it is evaluated.
p5 "the results of the inference should be consistent with *common sense* without specific assumptions about the context of interpretation. In other words, the deductive closure should not include statements which go against *common sense*, under the style and level of consensus similar to that of Wikipedia"
p7 "FactForge successfully integrates several of the central LOD datasets into a single body of knowledge. It contains *common sense* knowledge by design;"
p7 "The vast majority of the facts inferred from the knowledge in FactForge look reasonable and do not go against *common sense*; this conclusion is drawn from intensive exploration and querying of FactForge over many months."
p9 "Many months of analysis has shown that the vast majority of the inferred statements match common sense expectations. "
It is essential that we better understand the meaning of "common sense" and the process by which statements are deemed consistent with it.
The conclusion states
"Although no extensive formal validation has been performed, an analysis of the ontologies and schemata used in the selected datasets indicates that the OWL dialect used is sufficiently expressive to accommodate their intended semantics."
I think the premise of the sentence really excludes the conclusion that is being drawn. Indeed, without an analysis of each and every dataset, how can we be sure that the knowledge represented contains the intended semantics for the domain? This isn't a question of whether we can reason over the schemas, but whether the information was properly represented and the right inferences are being drawn.
3. Consistency checking
One of the claims is that FactForge is consistent, but it's unclear whether it could ever be *inconsistent*. To what extent (how many axioms and affected classes/individuals) does the knowledge base contain existential and universal quantifiers, cardinality restrictions (maximum, exactly), nominals, disjointness and negation, including assertions of individuals being different from one another? In other words, provide evidence that inconsistencies plausibly could have arisen, but did not. We really need to see how reasoning with BigOWLIM makes something possible that isn't with other triple stores / OWL KBs.
On p7, regarding reasoning about the DBpedia SKOS relations: please include quantification of the number of affected individuals and the number of changed relations (from -> to), and provide supplementary material that details these.
5. (p8) provide evidence for FactForge's "highly interconnected" combined dataset.
p5, col2: confusing use of "real-world concept" and "real-world entity". URIs are identifiers for classes, relations and instances. These are used to denote all kinds of things, including concepts, types and individuals - whether they exist in the "real world" or not.
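To make the consistency request in item 3 concrete: one inconsistency pattern an OWL 2 RL materialiser detects is rule eq-diff1, where x owl:sameAs y is derived while x owl:differentFrom y is asserted. The sketch below (invented identifiers, purely illustrative) checks for this clash after closing owl:sameAs with a tiny union-find:

```python
# Sketch (invented data): detect the eq-diff1 clash by partitioning
# terms into owl:sameAs equivalence classes and checking whether any
# owl:differentFrom pair falls inside a single class.

def same_as_find(same_pairs):
    """Tiny union-find over owl:sameAs pairs; returns a find() function
    mapping each term to its equivalence-class representative."""
    parent = {}
    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x
    for a, b in same_pairs:
        parent[find(a)] = find(b)
    return find

same = [("geo:Sofia", "dbp:Sofia"), ("dbp:Sofia", "fb:sofia")]
different = [("geo:Sofia", "fb:sofia")]  # invented clash for illustration

find = same_as_find(same)
clashes = [(a, b) for a, b in different if find(a) == find(b)]
print(clashes)  # [('geo:Sofia', 'fb:sofia')] -> knowledge base inconsistent
```

Evidence of the kind the reviewer asks for would amount to counting how often such clash-capable axiom patterns (differentFrom, disjointness, cardinality) occur in the loaded datasets at all.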