FactForge: A fast track to the web of data

Paper Title: 
FactForge: A fast track to the web of data
Authors: 
Barry Bishop, Atanas Kiryakov, Damyan Ognyanoff, Ivan Peikov, Zdravko Tashev, Ruslan Velkov
Abstract: 
The advent of Linked Open Data has seen a large number of structured datasets from various domains made available to the public. These datasets are seen as a key enabler for the Semantic Web, where applications can consume and combine this data in powerful and meaningful ways. However, the uptake of Linked Data during this ‘introductory phase’ is hampered in ways similar to the uptake of any new technology - until the technology has found widespread use, the range of opportunities for exploiting it is limited and until the opportunities are fully explored, the uptake of the technology is restricted. FactForge is a free, publicly available service that provides an easy point of entry for would-be consumers of linked data. This Web application is based on OWLIM, a high performance semantic repository that offers outstanding RDF data management and reasoning capabilities based on OWL. The data-exploration functionality provided by FactForge exploits the advanced features of OWLIM to allow users to combine SPARQL with various full-text search and ranking functions for powerful, user-guided data-mining over a number of the most popular LOD datasets. This paper gives an overview of FactForge, its many unique capabilities and its role within the emerging trend for the exploitation of Linked Open Data using OWL-based inference.
Submission type: 
Application Report
Responsible editor: 
Michel Dumontier
Decision/Status: 
Accept
Reviews: 

This is the final update of a second revised resubmission, following an "accept with minor revisions" decision; the paper has now been accepted for publication. The reviews below are for the first revision, followed further below by the reviews for the original submission.

Solicited Review by Aidan Hogan:

[Note: my initial open review was unsolicited, but this re-review is solicited. The authors have not directly addressed my comments; I ask that they do and will keep this relatively brief.]

The main concern with this paper was the overlap between it and the OWLIM paper (I've read both updated versions). There is still some minimal overlap, but I no longer see this as a problem.

Some other comments:
- You talk about aiming for O(n*log(n)) complexity, but applying rulesets like OWL 2 RL/pD*/etc. is cubic with respect to known terms. You may be able to do reasoning over one particular (large) dataset within the above bound, but big-O notation is not applicable/needed.
- Your comparison of OWL 2 RL and OWL Horst inferencing as being comparable to within 1% for your data is puzzling, given that OWL Horst infers statements like "?s rdf:type rdfs:Resource ." through the rdfs4* rules whereas the OWL 2 RL/RDF rules do not. Without saying *precisely* which rules of each profile you support, the result is ambiguous. I know you draw the conclusion weakly, and by removing certain "syntactic" rules from each profile you're probably correct, but as it stands the paragraph needs clarification.
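
(For reference, the rdfs4* rules in question, written as simple inference rules over triple patterns, are:

  rdfs4a:  ?s ?p ?o  =>  ?s rdf:type rdfs:Resource .
  rdfs4b:  ?s ?p ?o  =>  ?o rdf:type rdfs:Resource .   (in practice, where ?o is not a literal)

These rules add an "rdf:type rdfs:Resource" triple for every node in the graph, so including or excluding them alone can noticeably shift inference counts, which is why the exact rule selection matters.)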

Nit-picks:
- OWL Horst is not a dialect (it is not a language). It is a partial axiomatisation of the OWL RDF-based semantics.
- OWL 2 RL is not a rule language... it is a profile of OWL 2 (a dialect). Strictly speaking, OWL 2 RL/RDF is the name of the ruleset.
- No DC-*Terms* vocabulary?
- What is an entity? What is a node in the RDF graph? How do they increase by means of reasoning? (In my mind, objects of triples are also nodes.)
- Fix the flow of text around Fig. 3 and Table 2.

Typos:
- web of data (capitalise)
- Linked data/linked data (capitalise)
- pD-entailment -> pD* entailment
- OWL2 RL, etc. (make consistent)
- Such as setup -> Such a setup
- in [|] Table 2

In addition, please see the comments in my initial open review.

Solicited Review by Thorsten Liebig:

The submission introduces FactForge, a repository of selected linked open data sources with OWL 2 RL-like reasoning capabilities and a SPARQL query endpoint. The paper describes the included LOD datasets and provides an overview of the reasoning and consistency mechanisms, as well as some statistics accompanied by a brief data analysis.

Altogether the paper is an appreciable submission, as it describes a useful piece of practical SW infrastructure. The content is also within the focus of the call for papers.
However, even the resubmission still has some minor issues from my perspective.

- Data quality. The authors report poor data quality, which even makes the overall dataset "not suitable for reasoning". There seems to be an approach (sec. 6.4) which helps to eliminate most of the problems. However, this process seems to be mostly manual and is a barrier when trying to keep FactForge up to date with its data sources. I don't see this as a problem particular to FactForge, but as a general problem with LOD datasets, which mostly follow a quantity-over-quality paradigm. To be honest, I see a major problem in this situation and would like some statement from the authors with respect to this issue (e.g. do they plan to build their own crawlers?).

- Related work. The authors claim that there are no other queryable LOD repositories which perform inference. I would appreciate it if they made clear how they compare, for instance, to the publicly available Virtuoso endpoint hosting at least DBpedia (and probably much more of the LOD big picture).

- Outlook and future work. There is now a brief statement in the conclusion that FactForge would make a suitable backend for clients that consume linked data. Please provide more information about current users and intended usage. Furthermore, do you plan to allow users to operate their own instances of FactForge in order to guarantee some level of bandwidth for their applications? How could people add their own data? Is this only triggered by the authors/operators of FactForge? Will SPARQL remain the only interface to FactForge?

Minor editorial remarks:
- First sentence in sec. 5 contains a line break
- Table 2 and Figure 3 are badly placed on pages 6 and 9 respectively and interrupt the flow of reading

Solicited Review by Michel Dumontier:

I am satisfied that my prior comments are mostly addressed in the revised manuscript.

The review comments below are for the original submission.

Solicited Review by Thorsten Liebig:

The paper provides an overview of FactForge, a repository of selected linked open data sources with reasoning capabilities and a SPARQL query endpoint. The paper is well written and the content is presented in a reasonable way, although it includes some sections which also appear in another submission currently under review at SWJ.

I appreciate the submission because it provides a useful piece of practical SW infrastructure and addresses real-world problems of LOD, namely integration and scalability.
However, I do see some minor issues and open questions with the paper.

- In sec. 2 you mention a problem with today's LOD datasets, saying that many publishers create data without properly understanding the underlying semantic framework. You should at least mention some of these problems; detailed information would probably help those people fix their modelling issues. Furthermore, is there anything you suggest here? Any mechanism in FactForge to identify and fix these problems? Is manual analysis and editing the only way to achieve data quality? If there is no (semi-)automatic way of doing this, FactForge will always lag behind its sources and be painful to extend with new ones.

- Somewhat related to the issue above: you mention a limitation of your approach, namely that it is only applicable to more or less static data. However, the real world is not static. Is the true reason your caching and indexing routines, or rather the manual post-processing of the datasets that results from the issue above? In your OWLIM submission you state that your reasoning system can efficiently handle retractions, which indicates that your indexing mechanism can handle dynamic data. Please explain.

- In sec. 4 you exclude role chains (in favor of scalability, I guess). All other language parts of OWL 2 RL are supported, right? Please comment on why you exclude this quite useful construct.

- SPARQL is supported as the query language. This language quickly reaches its limits for sophisticated queries, e.g. where one wants to query for the direct successors/predecessors of transitive relationships, such as the direct subclasses of a class (see the sketch after this list). What is the FactForge solution to this? Will other query languages such as SPARQL-DL, or the OWLlink protocol, be supported in the future?

- Who is using FactForge right now? What the most likely use-cases for the system? What comes next?
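
To illustrate the point about SPARQL's limits above: over a repository that materialises the rdfs:subClassOf closure, the direct subclasses of some class can be approximated in SPARQL 1.1 with a filtered double negation. A rough sketch, where ex:C is a hypothetical class:

  PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
  PREFIX ex:   <http://example.org/>
  SELECT ?sub WHERE {
    ?sub rdfs:subClassOf ex:C .
    FILTER (?sub != ex:C)
    FILTER NOT EXISTS {          # no strictly intermediate class exists
      ?sub rdfs:subClassOf ?mid .
      ?mid rdfs:subClassOf ex:C .
      FILTER (?mid != ?sub && ?mid != ex:C)
    }
  }

Such queries are verbose and easy to get wrong, which is exactly why dedicated query-language or protocol support would be welcome.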

I tested the system by adapting some of the given demo SPARQL queries, keeping them very similar to the originals, with mixed success (e.g. interestingly, no cities in Germany with buildings by Richard Meier). Furthermore, the "Include inferred" option doesn't change the number of results for any of the given sample queries. Is there an explanation for this?
Another issue bothered me when using the system: which classes and/or properties should I use for querying? For instance, when asking for the buildings of Richard Norman Shaw, should I ask for fb:architecture.architect.structures_designed (as done in the sample query) or rather use dbpedia:architect? Has there been any ontology alignment for the different LOD sources? I think this is a serious problem for LOD in general (and for FactForge in particular).
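
In practice the user is forced to union over all candidate vocabularies. A sketch of what such a query looks like (prefix declarations omitted; the exact property names and label predicate are assumptions on my part):

  SELECT DISTINCT ?building WHERE {
    ?architect rdfs:label "Richard Norman Shaw"@en .
    { ?architect fb:architecture.architect.structures_designed ?building }
    UNION
    { ?building dbpedia:architect ?architect }
  }

Without documented alignments, every user has to rediscover such mappings for themselves.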

Solicited Review by Eric Prud'hommeaux:

The paper describes a system, FactForge, which warehouses Linked (Open) Data datasets. It uses BigOWLIM to perform large-scale RL forward-chaining and annotates the entities with preferred labels and "RDF rank". The resulting graph is queryable with SPARQL and, in response to a missing SPARQL feature, parameterizable to enable limits on redundant owl:sameAs answers.
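
(If I understand the mechanism correctly, this parameterization is exposed as a control pseudo-graph that a query can name to switch off the enumeration of owl:sameAs-equivalent answers; the URI below is the one documented for OWLIM's successors, so treat this sketch as an assumption about the exact syntax at the time:

  PREFIX dbp-ont: <http://dbpedia.org/ontology/>
  SELECT ?city
  FROM <http://www.ontotext.com/disable-sameAs>   # assumed control pseudo-graph
  WHERE { ?city rdf:type dbp-ont:City }

Without the pseudo-graph, each city can appear once per owl:sameAs-equivalent URI, e.g. its DBpedia, Geonames and Freebase identifiers; with it, a single representative is returned.)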

A comprehensive set of statistics provides the user with a good sense of what reasoning is achievable with the selected (and well-described) datasets. These statistics delve into the degree to which deductive closures expand the knowledge base, including and excluding owl:sameAs inferences. This "inferred closure ratio" would be clearer with exemplars of the outliers in this metric, for instance, contrasting the maximum transitive closure length in DBpedia against that in Lingvoj.

Among the contributions to handling real-world data is a discussion of the observably erroneous transitive properties asserted in DBpedia. The inconsistent hierarchies led to cyclic broaderThan relationships, many of which were programmatically isolated and repaired, but some thousands of which required manual inspection/repair.

The resulting dataset is compared to the synthetic OWL benchmark LUBM. The contrasts can guide those evaluating inference systems which will be operating over real-world data, and can guide the development of future benchmarks. (There is an assertion that LUBM's small number of predicates serves as an advantage in systems without a predicate index, though it seems that any system traversing data with more than one predicate would benefit from a predicate index.)

The work presented nicely documents and meets the stated objectives of the research. It is worth noting that none of the achieved goals of the work specifically require any of the listed features of linked data; the wholesale retrieval, sanitization and inference over datasets never requires dereferenceable identifiers for nodes or predicates. That said, the LOD initiative has lent energy to the SemWeb, fostering the creation of these useful datasets.

While this is interesting work towards scalable reasoning, I consider it a stepping stone to a truly web-scalable inference system which enables limited reasoning over the necessarily distributed datasets of the Semantic Web (to use a popular analogy: one that reasons over the bazaar instead of the cathedral).

The utility of page rank is neither defended nor explained by a reference.

Editorial comments:

§2 ¶7 gave me the impression that the reasoning was parameterizable,
though the conclusions imply it was exactly OWL RL.

§2 ¶8: s/is connected to at least one of the others
/is transitively connected to all others/
# at least, that's my guess.

§1 ¶4: while somewhat intuitive, "priming" should be introduced,
perhaps parenthetically. Reference [17] provided no definition.

§1.2 ¶4: R-entailment is described as patterns expressible by placing
variables anywhere in a consequent and at least those variables in the
antecedent. A statement that OWL uses a judicious subset would prevent
the reader from assuming that hard inferences were scalable.

Solicited Review by Michel Dumontier:

This paper presents FactForge as a web application that uses BigOWLIM to reason over LOD data. FactForge aims to ensure i) consistency, ii) generality, iii) heterogeneity and iv) reasonability with respect to OWL 2 RL. It incorporates RDF datasets from DBpedia, Freebase, Geonames, the CIA World Factbook, Lingvoj and MusicBrainz, along with the terminologies and schemata UMBEL, WordNet, Dublin Core, SKOS, RSS and FOAF. The main claim is that FactForge provides the largest body of general knowledge (not specific to a particular scientific domain) against which full-text and SPARQL queries may access a total of 9.8B statements, of which 1.2B are asserted, 180m are annotations and 7.6B are inferred from owl:sameAs reasoning. In general, the paper is interesting, but suffers from too much background and not enough specific detail arising from the analysis of FactForge's contents (which could be interesting).

Major Revisions
---------------
1. The term "common sense" is used in several contexts, but no definition is provided, nor do we understand the criteria under which it is evaluated.
p5 "the results of the inference should be consistent with *common sense* without specific assumptions about the context of interpretation. In other words, the deductive closure should not include statements which go against *common sense*, under the style and level of consensus similar to that of Wikipedia"
p7 "FactForge successfully integrates several of the central LOD datasets into a single body of knowledge. It contains *common sense* knowledge by design;"
p7 "The vast majority of the facts inferred from the knowledge in FactForge look reasonable and do not go against *common sense*; this conclusion is drawn from intensive exploration and querying of FactForge over many months."
p9 "Many months of analysis has shown that the vast majority of the inferred statements match common sense expectations. "

It is essential that we better understand the meaning of, and the process by which, statements are deemed consistent with "common sense".

2. Expressivity
The conclusion states
"Although no extensive formal validation has been performed, an analysis of the ontologies and schemata used in the selected datasets indicates that the OWL dialect used is sufficiently expressive to accommodate their intended semantics."

I think the premise of the sentence precludes the conclusion being drawn. Indeed, without an analysis of each and every dataset, how can we be sure that the knowledge represented carries the intended semantics for the domain? This isn't a question of whether we can reason over the schemas, but of whether the information was properly represented and the right inferences drawn.

3. Consistency checking
One of the claims is that FactForge is consistent, but it's unclear whether it could ever be *inconsistent*. To what extent (how many axioms and affected classes/individuals) does the knowledge base contain existential and universal quantifiers, cardinality restrictions (maximum, exactly), nominals, disjointness and negation, including assertions that individuals are different from one another? In other words, provide evidence that inconsistencies plausibly could have arisen, but did not. We really need to see how reasoning with BigOWLIM makes something possible that isn't possible with other triple stores / OWL KBs.
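
To give one illustration of my own (the class pairing and the erroneous typing are hypothetical), a single disjointness axiom can render owl:sameAs-fused instance data inconsistent under the OWL 2 RL rule cax-dw:

  foaf:Person owl:disjointWith dbo:Place .          # schema-level axiom (hypothetical)
  ex:meier rdf:type foaf:Person .
  ex:meier owl:sameAs dbpedia:Richard_Meier .
  dbpedia:Richard_Meier rdf:type dbo:Place .        # erroneous typing in a source

After the sameAs rules propagate the types, cax-dw fires on the two disjoint types and signals an inconsistency. Quantifying how often such patterns occur (or nearly occur) in FactForge would substantiate the consistency claim.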

4. "reasonability"
On p7, regarding reasoning about the DBpedia SKOS relations: please quantify the number of affected individuals and the number of changed relations (from->to), and provide supplementary material detailing these.

5. (p8) provide evidence for FactForge's "highly interconnected" combined dataset.

Minor Revisions
---------------
p5, col2: confusing use of "real-world concept" and "real-world entity". URIs are identifiers for classes, relations and instances. These are used to denote all kinds of things, including concepts, types and individuals - whether they exist in the "real world" or not.

Comments

Briefly, I'm a little concerned about some of the claims in this paper, and some re-use of content from other papers (see below). I don't mean to detract too much from what is obviously good work, but these issues should be addressed in a final version.

> Based on known results, only OWL Horst-like languages are suitable for reasoning with data in the order of billions of statements.

What do you mean by "OWL Horst-like"? RDFS is much more efficient to compute than OWL Horst.

> the worst case complexity of the algorithms for basic reasoning tasks indicates that they are intractable when applied to large scale knowledge bases and datasets.

(In)tractability in the reasoning sense is independent of data scale.

> Several of the central LOD datasets were selected, loaded in to a BigOWLIM semantic repository and modelling errors were fixed. Forward-chaining inference was performed to materialize the deductive closure that amounts to some 9.8 billion retrievable statements.

This claim, taken on its own, is misleading: it suggests that you materialise 9.8 billion statements. Although it's clear from the rest of the paper that you materialise 881 million, you should rephrase this claim to make that explicit: e.g., add "...which, along with 1.177b explicit statements, 180m annotation statements and 7.761b statements inferable through backward-chaining of owl:sameAs inferences, amounts to some 9.8 billion retrievable statements."

> To the best of our knowledge, FactForge is the largest body of general knowledge (not specific to a particular scientific domain) against which inference has been performed.

This is the main motivation for this comment, where you made a similar claim in [1]: there are a number of works which perform inferencing at the scale you do over arbitrary Linked Data (the only difference to your dataset is that the corpora in these works are not the merge of manually selected subsets of Linked Data). In [3], we performed reasoning over a 1.1b statement crawl from Linked Data; we discussed similar optimisations and trade-offs that you do. This is comparable with the scale of the work in this submission, and twice that of [1] which originally made the claim – initial versions of this work were also presented in [4] at the same venue as [1] the year previously. In [5], the authors performed RDFS reasoning over 865m Linked Data triples, with twice the scale of [1], and similar (albeit lower) scale to the current submission – this was presented in the same conference as [1]. These related works do not look at the expressive level of inferencing you do, nor do they look at query answering. They do, however, pretty much invalidate the above claim made in the current submission and the same claim made in [1].

Finally, a note to reviewers: you may want to investigate two other papers which are very similar in content and have not been cited. [1] is an earlier Semantic Web Challenge entry with much of the same content. [2] is a parallel submission to this journal, where in particular Section 4.1 of this paper corresponds to Section 3.3 in [2]. I'll leave this up to the reviewers/editors to consider – it may not be such an issue since [1] is a rather informal submission of earlier (but heavily overlapping work), and [2] has its own distinct and significant contribution despite some overlapping content. I thought I would mention it though since, e.g., [1] is not cited in the submission.

[1] A. Kiryakov, D. Ognyanoff, R. Velkov, Z. Tashev, I. Peikov. LDSR: a Reason-able View to the Web of Linked Data. In: Semantic Web Challenge (ISWC2009), 2009.

[2] Barry Bishop, Atanas Kiryakov, Damyan Ognyanoff, Ivan Peikov, Zdravko Tashev, Ruslan Velkov, OWLIM: A family of scalable semantic repositories. Under review for SWJ: http://www.semantic-web-journal.net/sites/default/files/swj97.pdf

[3] Aidan Hogan, Andreas Harth, Axel Polleres. Scalable Authoritative OWL Reasoning for the Web. In International Journal on Semantic Web and Information Systems (IJSWIS), 5(2), April-June, 2009. http://www.deri.ie/fileadmin/documents/DERI-TR-2009-04-21.pdf

[4] Aidan Hogan, Andreas Harth, Axel Polleres. Scalable Authoritative OWL Reasoning on a Billion Triples. In Proceedings of Billion Triple Semantic Web Challenge 2008, at the 7th International Semantic Web Conference (ISWC2008), Karlsruhe, Germany, 2008. http://aidanhogan.com/docs/saor_billiontc08.pdf

[5] J. Urbani, S. Kotoulas, E. Oren, F. van Harmelen. Scalable Distributed Reasoning Using MapReduce. In International Semantic Web Conference, 2009. http://www.cs.vu.nl/~frankh/postscript/ISWC09.pdf