Wikidata through the Eyes of DBpedia

Tracking #: 1462-2674

Authors: 
Ali Ismayilov
Dimitris Kontokostas
Sören Auer
Jens Lehmann
Sebastian Hellmann

Responsible editor: 
Aidan Hogan

Submission type: 
Dataset Description
Abstract: 
DBpedia is one of the earliest and most prominent nodes of the Linked Open Data cloud. DBpedia extracts and provides structured data from various crowd-maintained information sources, such as more than 100 Wikipedia language editions and Wikimedia Commons, by employing a mature ontology and a stable and thorough Linked Data publishing lifecycle. Wikidata, on the other hand, has recently emerged as a user-curated source of structured information that is included in Wikipedia. In this paper, we present how Wikidata is incorporated into the DBpedia ecosystem. Enriching DBpedia with structured information from Wikidata provides added value for a number of usage scenarios. We outline those scenarios and describe the structure and conversion process of the DBpedia Wikidata (DBw) dataset.
Tags: 
Reviewed

Decision/Status: 
Minor Revision

Solicited Reviews:
Review #1
By Heiko Paulheim submitted on 24/Oct/2016
Suggestion:
Minor Revision
Review Comment:

I truly appreciate the work the authors have undertaken for this revision of the paper. Most of my concerns have been addressed, and the paper now represents a solid dataset paper.

There are a few points which should be addressed before publication, but I am very confident that the authors are capable of doing so.

In terms of the paper's organization, it seems a bit odd that DBpedia and Wikidata are introduced in Section 2, while Section 1 already contains a comparison of the two. The order should be reversed. Moreover, the points in Section 1 (identifiers, structure, etc.) should be summarized in a tabular comparison. That table should also include the proposed dataset to illustrate the benefits of the endeavour.

In Section 3, it is stated that "there has been extensive tool development..."; referring to a survey or naming a few prominent tools/applications here would make this statement more credible. In the same paragraph, it is stated that DBw should work with any application consuming DBpedia data. Adding a small section towards the end demonstrating this (i.e., changing the DBpedia endpoint to the DBw one and showing the results of the same application) would be appropriate. I am fairly sure that it would be fine if this adds another page to the paper, but the authors may want to double-check this with the editors.

In Section 3, it is stated that the DBpedia ontology has reached a stable state. However, my understanding is that the ontology is still evolving. There also seem to be a few changes between the latest versions of the ontology (including 15 classes newly introduced in the latest release).

Section 4.2 mentions type inferencing. I wonder whether heuristic typing, which has been included in the DIEF since the latest release, is also used for DBw, and with which results.

Section 4.3 states that "the hash function guarantees the IRI uniqueness" - what happens in the case of hash collisions?

A statement about the instance overlap of DBpedia and DBw would be interesting.

In the conclusions section, I would like to see a statement on future plans for provisioning the dataset: is this a one-shot effort, do you plan yearly releases like/along with DBpedia, or a service reflecting timely data, like DBpedia Live?

Minor:
* Overall, punctuation should be checked by a native speaker.
* Footnote 6 seems misplaced.
* p.3: "RDF as a first class citizen" -> rather replace with "primary data representation mechanism"
* p.4: "The DBpedia Information Extraction Framework greatly refactored" -> "was greatly refactored"?
* p.4: OWL punning should be briefly explained or at least be augmented with a reference.
* p.5: For geo-related functions, $getGeoRss should be included in the heading
* p.5: "schema & value transformations" -> use "and" instead of "&"

Review #2
By Denny Vrandecic submitted on 24/Oct/2016
Suggestion:
Minor Revision
Review Comment:

This manuscript was submitted as 'Data Description' and should be reviewed along the following dimensions: Linked Dataset Descriptions - short papers (typically up to 10 pages) containing a concise description of a Linked Dataset. The paper shall describe in concise and clear terms key characteristics of the dataset as a guide to its usage for various (possibly unforeseen) purposes. In particular, such a paper shall typically give information, amongst others, on the following aspects of the dataset: name, URL, version date and number, licensing, availability, etc.; topic coverage, source for the data, purpose and method of creation and maintenance, reported usage etc.; metrics and statistics on external and internal connectivity, use of established vocabularies (e.g., RDF, OWL, SKOS, FOAF), language expressivity, growth; examples and critical discussion of typical knowledge modeling patterns used; known shortcomings of the dataset. Papers will be evaluated along the following dimensions: (1) Quality and stability of the dataset - evidence must be provided. (2) Usefulness of the dataset, which should be shown by corresponding third-party uses - evidence must be provided. (3) Clarity and completeness of the descriptions. Papers should usually be written by people involved in the generation or maintenance of the dataset, or with the consent of these people. We strongly encourage authors of dataset description papers to provide details about the used vocabularies; ideally using the 5 star rating provided here.

==================================================================================

Thank you for addressing most of the comments in the previous review. It would have facilitated a faster review if you had actually referred to my points by the numbers I provided in the review. That would also have allowed me to see whether you have answered them all. As it is, it is much harder for me as a reviewer to tell what happened.

There are a few minor revisions I recommend:
Section 1, Curation: "and thus, is a read-only dataset." remove "thus,"
Para "Publication": paragraph alignment is broken
Para "Data Freshness": "(and thus, transitively in DBpedia)" - change "DBpedia" to "DBpedia live".

In the previous review, I asked about the complementarity of the two datasets. I still think this has not been adequately addressed.

Last para: "for the design decision that shaped DBw" -> "decisions"

Section 2

Footnote 5 seems to be in the wrong place?

There is no item type "query" in Wikidata (it was planned, but not implemented).

Section 3

"...while maximising compatibility" - with what?

Point 29 from my first review was about the second paragraph, titled "Re-publishing minted IRIs as linked data", not about the first paragraph, "New IRI minting". I don't understand what the design decision in the 2nd paragraph is about.

Section 4

Point 34 from my previous review has not been answered as far as I can tell.

Section 4.1.2: "mediawiki" -> "MediaWiki"

Section 4.3, end: "provides an example slit IRI." -> "split"

Section 6: as discussed in point 46, I'd drop the word "Evaluation" from the title of the section

Section 8: paragraph on "Use cases for Wikidata": I am very excited about that and looking forward to it.

Para "Combination of both datasets": "wikidata" -> "Wikidata"
"a bridge that we hope that will make" drop the second "that"