What is a Knowledge Graph?

Tracking #: 1954-3167

Authors: 
Jamie McCusker
John S. Erickson
Katherine Chastain
Sabbir Rashid
Rukmal Weerawarana
Deborah L McGuinness

Responsible editor: 
Guest Editors Knowledge Graphs 2018

Submission type: 
Survey Article
Abstract: 
Knowledge graphs have enjoyed a resurgence in research interest after the development of several commercial projects, such as Google's knowledge graph. However, the use of the term has evolved and now may refer to a wide range of graphs that may not include clear and unambiguous definitions or references. To better provide clarity to knowledge graph research, we survey the literature for current efforts that may inform a knowledge graph definition, and then use that review along with our work to synthesize a definition that is relevant and informative to current knowledge graph research, while constraining the research space that may be considered a knowledge graph. We define a knowledge graph as "A graph, composed of a set of assertions (edges labeled with relations) that are expressed between entities (vertices), where the meaning of the graph is encoded in its structure, the relations and entities are unambiguously identified, a limited set of relations are used to label the edges, and the graph encodes the provenance, especially justification and attribution, of the assertions." We evaluate a wide variety of knowledge resources, graphs, and ontologies to determine if they qualify under our definition, and find that while expressing knowledge as a graph structure and unambiguous denotation of entities and relations in the graph are common, it is less common to trace provenance of encoded knowledge, and less common to constrain the relations used when expressing that knowledge. We created our Knowledge Graph Catalog to support this effort, and make it available to the public to search and contribute new knowledge graphs.
Tags: 
Reviewed

Decision/Status: 
Reject

Solicited Reviews:
Review #1
Anonymous submitted on 31/Aug/2018
Suggestion:
Major Revision
Review Comment:

The paper provides an overview of what knowledge graphs are, how they are built, and how they are useful for producing, managing, and using knowledge within real-world applications.
In general, my feeling is that this paper could be a valuable contribution to the Semantic Web and Artificial Intelligence communities because it reports information and references that can be used as a starting point by people who want to begin working with knowledge graphs.
However, before this paper can be considered for acceptance, I think that some issues should be resolved or discussed.

1) At the time of review (August 31st 2018, 10:20am) the website of the Knowledge Graph Catalog is not working. When I enter the provided URL, the browser shows an HTTP ERROR 500.
Having this website working is mandatory for accepting the paper because this catalog is definitely one of the contributions of this paper.

2) Section 5.1 mentions that the authors modeled an ontology for knowledge resources.
This section should be expanded by presenting the whole ontology: structure, meaning of the main concepts, building process and methodology.

3) Section 8 includes a brief discussion comparing knowledge graphs and ontologies.
This point is very sensitive because such a discussion is a very hot topic.
Personally, I do not find the example provided by the authors convincing: if I have an ontology defining the world landmarks and I populate it, the Eiffel Tower instance could be included.
Everything depends on how much I decide to populate the ontology.
From what is reported in the paper, it seems that the opinion of the authors is that a knowledge graph is an ontology with a complete ABox.
By "complete", I mean that all individuals belonging to the domain modeled by the ontology are defined in it.
If this is the thought of the authors, they should clearly state this in the paper.
Otherwise, a more detailed and convincing example should be reported.

A last minor thing: the last sentence of Section 6.1 is not well linked with the rest of the text.

Review #2
Anonymous submitted on 31/Oct/2018
Suggestion:
Reject
Review Comment:

This paper provides a brief summary of a large body of work related to "knowledge graphs" and aims at providing a "definition" for the term "knowledge graph" in addition to providing a brief survey of 37 projects related to knowledge graphs. I found the paper somewhat informative, but also incomplete, with inaccuracies, and lacking focus and clarity.

To elaborate, I use the suggested criteria for reviewing survey articles and point out why the paper does not meet the criteria:

(1) Suitability as introductory text, targeted at researchers, PhD students, or practitioners, to get started on the covered topic.
This paper aims at providing a definition for the term "knowledge graph" which has the consequence of future work in this space either having to avoid using the term or arguing for an alternative definition, or being criticized for not using the term properly. Given the fact that the new definition has several issues (more on this below), I am not sure if this is a good consequence. The reality is that the term "knowledge graph" in the past has either been used to describe a "graph" representing "knowledge" which can be viewed as an umbrella term covering ontologies/KBs/information networks/etc. or has been used as a commercial term related to Google's usage of the term. Providing a definition for the simple umbrella term may only result in confusion and useless debates. It is also not going to help the commercial use of the term given that you do not view Google's KG as a Knowledge Graph while it is standard practice in the industry.

Issues with the definition:
- Lack of clarity: Having read your paper, I am not sure why you consider Google KG as a non-KG with unlimited relations. It is not a public KG so it is hard to tell, but the Open KG API exposes a very limited number of relations. You also seem to imply that simply including provenance is enough. I am not sure how this can help. For example, you believe DBpedia meets the requirement, but DBpedia uses different extraction strategies for different properties, which can lead to lower quality in some relations. Recording the algorithm and the infobox field used seems to be the right way of including provenance if the aim is improving quality and publishing the "epistemology".
- Lack of proper motivation: Consider a case where a KG is curated using crowd-sourcing and an edge is included only if validated by, say, 1000 experts. Then, why would the lack of a provenance statement turn the KG into a "Bare Statement" graph? And why would this definition matter?

For comparison, the previous attempt at defining KGs (Ehrlinger & Wöß, reference [10]) has clarity and a clear motivation. The use of reasoning criteria is both easy to assess (you either can do it or cannot) and differentiates KGs from alternative terms (e.g. ontologies/KBs). I am not in favor of that definition at all, but at least that is a true definition.

Another example of this problem is your own statement on GO: "The Gene Ontology (GO) may be considered more of a knowledge graph than an ontology." A proper definition should definitely avoid language such as "more of a knowledge graph": GO should either be a KG or not a KG.

(2) How comprehensive and how balanced is the presentation and coverage.
The paper discusses a large number of projects, but the discussions are shallow, and some related work is missing.
On the definition side, one class of work missing is the work on "Heterogeneous Information Networks" e.g.
http://hanj.cs.illinois.edu/pdf/ds09_han.pdf
https://arxiv.org/pdf/1511.04854.pdf
Where do they fit with respect to KGs?
On the KGs side, one issue is the lack of a deeper discussion of how and why each resource meets the criteria for being a KG. Another issue is the inclusion in the table of closed resources such as Google KG and information extraction systems such as DeepDive. I am not sure why you believe DeepDive does not keep provenance. It does information extraction from text and you can view the text as the source of the KG, the same way a Wikipedia link seems to be enough for your definition to make DBpedia a proper KG. Also, if you want to include IE systems, then you probably need to review tens more projects and papers.

(3) Readability and clarity of the presentation.
The paper in general lacks focus, and the organization is somewhat confusing as well. For example, you have a one-paragraph section (Section 4) on "Knowledge Graph Methods" that I am not at all sure what its purpose is.

(4) Importance of the covered material to the broader Semantic Web community.
The general topic is certainly very important to this community. The importance of providing a new definition of the term "Knowledge Graph" - I am not sure.

If the authors would like to pursue this publication, I encourage them to focus on only one aspect and re-write the paper with better focus, clarity, and motivation. For example, providing a survey of existing publicly available KGs, how they are used in practice, along with a set of "quality" criteria (e.g. similar to Tim BL's 5-star rating for Linked Data) could be a useful contribution.

Review #3
By Claudio Gutierrez submitted on 14/Nov/2018
Suggestion:
Minor Revision
Review Comment:

This paper is a good contribution towards the ongoing discussion of what is going on around the notion of "knowledge graph". It is highly important for scientists and for practitioners to understand what is behind a keyword that suddenly becomes popular. As the authors write: "Since usage has evolved, it is appropriate to develop a definition that follows how the term is currently used." (p.3) Being able to discriminate between a marketing buzzword and an emerging discipline could make a huge difference in resources and in researchers' effort and time.

The paper is essentially a discussion of current definitions of "Knowledge graph" in the literature, contrasted in the light of a wealth of real-life cases that are presented and analyzed. As the authors claim: "Our purpose with this paper is to survey the evolving notion of a knowledge graph, to describe the general space, and to provide an explicit operational description of a knowledge graph." Well done. This is a well written and easily readable paper, well structured and organized. In summary, it is a highly useful article, and based on this, I recommend acceptance for publication with minor revisions.

As a proof of the value of the article (the amount of discussion it could provoke, which in my opinion should be the measure of the impact of a scientific article) I will discuss (rather critically) the notion of KG advanced in the paper. I am not suggesting that the authors must address these points, but I would like to raise the issues in case they help to refine some aspects of the paper.

My main criticism of the paper (and of the current use of the notion in the literature today) is that the ontological (in the philosophical sense) notion of knowledge graph is absent. That is, what is the object of inquiry?

I will quote several pieces of text occurring in the paper to show that this concern is relevant. Let us begin with the following expressions in the Introduction: "We provide an updated definition along with a set of knowledge graph requirements [...] We discuss how knowledge graphs as defined are crucial component for the future of the Web have great potential change in data science and domain sciences." We face here an attempt to show that an undefined object, characterized by a list of certain requirements, has certain potential for (again) a vague discipline. This is not the authors' fault, nor will I blame them for it; it is the fate of those who work on KGs today: working towards what could have potential for developing the potential that the work of previous practitioners has given... We definitely need an external point of reference and an anchor to other developments.

The key word here is "knowledge". But in what sense? The authors write: "Knowledge graphs provide an opportunity to expand our understanding of how knowledge can be managed on the Web [...]" I like this claim very much. Now, trying to refine the above claim, the authors write: "Google was one of the first to promote a semantic metadata organizational model described as a “knowledge graph,” and many other organizations have since used the term in published research on knowledge management and graph databases." True. Even though the notion was used before, it was Google's use that gave "knowledge graph" the publicity and prominence it has today.


In Section 2 (Related Work): "[...] we believe that knowledge graphs created for specific domains such as Biology can be considered knowledge graphs if they follow the other requirements." In Section 3, where the authors conceptualize "knowledge graph", one can find a thorough discussion and a clear proposed definition. Let us see: "a knowledge graph represents knowledge, and does so using a graph structure". This is a good starting point, but it still avoids telling us what animal we are talking about. It looks like: A knowledge graph is an XXX that represents knowledge and that uses a graph structure. What is XXX? How can one interpret the sentence: "Knowledge graphs use a limited set of relation types" (p.4)? What are the objects that "use a limited.."? A set of entities, clearly. Probably nodes and edges. But soon we will learn that knowledge graphs include semantics and more. The same holds for the following two sentences: "Knowledge graph meaning is expressed as structure" and "Knowledge graph statements are unambiguous". Clearly a knowledge graph includes a set of sentences and a semantics. Or with this: "All identified entities in a knowledge graph, including types and relations [...]". Hence a KG is a complex object that includes entities. We also learn that context is another property that at least the relations in a knowledge graph must have.

In this regard, the more formal definition given falls short:
"Graph: A set of assertions (edges labeled with relations) that are expressed between entities (vertices) where the meaning of the graph is encoded in its structure.

Unambiguous Graph: A graph where the relations and entities are unambiguously identified.

Knowledge Graph: An Unambiguous Graph with a limited set of relations used to label the edges that encodes the provenance, especially justification and attribution, of the assertions."
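
To make the three tiers of the quoted definition concrete, the following is a minimal Python sketch of how they could be checked mechanically. It is an illustration only, not the authors' implementation: the names `Assertion`, `Provenance`, `is_unambiguous` and `is_knowledge_graph` are hypothetical, and two simplifying assumptions are made that the paper does not state, namely that "unambiguously identified" means "named by an HTTP(S) IRI" and that a "limited set of relations" is an explicit whitelist.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Provenance:
    """Hypothetical provenance record attached to one assertion."""
    attribution: str    # who or what asserted the statement
    justification: str  # why the statement is believed


@dataclass(frozen=True)
class Assertion:
    """One labeled edge: subject --relation--> obj."""
    subject: str
    relation: str
    obj: str
    provenance: Optional[Provenance] = None


def is_identifier(term: str) -> bool:
    # Assumption: a term is "unambiguous" if it is an HTTP(S) IRI.
    return term.startswith(("http://", "https://"))


def is_unambiguous(graph: list) -> bool:
    """Tier 2: every entity and every relation is unambiguously identified."""
    return all(
        is_identifier(term)
        for a in graph
        for term in (a.subject, a.relation, a.obj)
    )


def is_knowledge_graph(graph: list, allowed_relations: set) -> bool:
    """Tier 3: unambiguous, limited relation set, provenance on every edge."""
    return (
        is_unambiguous(graph)
        and all(a.relation in allowed_relations for a in graph)
        and all(a.provenance is not None for a in graph)
    )


# Illustrative data; the example.org IRIs are placeholders.
EIFFEL = "http://example.org/EiffelTower"
PARIS = "http://example.org/Paris"
LOCATED_IN = "http://example.org/locatedIn"

bare = [Assertion(EIFFEL, LOCATED_IN, PARIS)]
sourced = [
    Assertion(
        EIFFEL,
        LOCATED_IN,
        PARIS,
        Provenance(attribution="http://example.org/curator/1",
                   justification="stated in the cited source text"),
    )
]
```

Under these assumptions `bare` passes `is_unambiguous` but fails `is_knowledge_graph`, while `sourced` passes both when `LOCATED_IN` is in the allowed set. The sketch checks only the structural part of the definition and says nothing about semantics or tooling.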

This would be no more than a special type of Semantic Network (with special type of identifiers, etc.). But, as the authors implicitly and explicitly state, knowledge graphs involve much more than a mathematical definition. In fact, one realizes that editors, visualizations, extraction, integration, learning, semantics for different epistemologies, accessing methods, concerns about usability, reusability, web interfaces, etc. etc. are important aspects of the object known as knowledge graph.

So, a question remains unanswered: what is the field of KG research? How does it differ from the tradition of KR? Or of KB? Or of IR? Or of DB? KGs, judging from the cases in the useful catalog given in the text, seem to be a virtuous combination of real-life software, formal representation techniques, knowledge bases and information retrieval, plus the increasing weight that other techniques (e.g. machine learning) are having in capturing the human semantics involved in text, images, videos, etc. It is important, in my opinion, to remark that the field goes beyond the representation of knowledge in the classical formal sense (semantic networks, concept maps, etc.), because it includes multimedia and any other media that could carry human knowledge, and that it encompasses software and machinery that automate information and knowledge (databases, knowledge bases, information retrieval), because it includes any technique and machinery to deal with it (capturing, extracting, transforming, visualizing, using, etc.). From the above, the KG field is a combination of science and technology. It is not surprising that the Google patent that used "KG" (and not the many theoretical works that used that notion as developments of Semantic Networks) is now considered the starting point of the field. KGs are interesting as long as they are real-life software (and perhaps more) capturing, integrating, transforming, enriching and providing human knowledge.

In this regard KGs are closely related to the Semantic Web project. One could say that the field of KGs --in some sense-- implements the idea of the SW through these "objects" that are KGs, which have a more limited scope (they are less universal than the whole Web with semantics) in serving particular purposes, but have proven more "practical" in the short term.

An important practical detail: The following site is a key resource for the paper
http://graphs.whyis.io
and was not working on November 10th (server error).