Structural Quality Metrics to Evaluate Knowledge Graph Quality

Tracking #: 3366-4580

Sumin Seo
Heeseon Cheon
Hyunho Kim
Dongseok Hyun

Responsible editor: Guest Editors Wikidata 2022

Submission type: Full Paper
This work presents six structural quality metrics that measure the quality of knowledge graphs and applies the metrics to six knowledge graphs: four cross-domain knowledge graphs on the web (Wikidata, DBpedia, YAGO, Freebase), Google Knowledge Graph, and Naver's integrated knowledge graph (Raftel). A 'good knowledge graph' should define specific classes and properties in its ontology so that it can richly express real-world knowledge. A knowledge graph should also actively use those classes and properties. We examine the internal quality of knowledge graphs by focusing on the structure of the ontology, which is the schema of a knowledge graph, and the degree to which it is used. As a result, we identify characteristics of a good knowledge graph that cannot be detected by scale-related indicators alone, such as the number of classes and properties.

Solicited Reviews:
Review #1
By Giorgos Stoilos submitted on 28/Apr/2023
Review Comment:

This manuscript was submitted as 'full paper' and should be reviewed along the usual dimensions for research contributions which include

The paper presents various structural-based metrics to assess the quality of a knowledge graph. The idea is to use the ontology (schema) defined for a Knowledge Graph for computing various frequency metrics for a given knowledge graph.

I find the problem of assessing the content of KGs, or exploring them to gain insight, very important, especially given their potentially large size. Using the classification found in the following paper

Matteo Lissandrini, Davide Mottin, Katja Hose, Torben Bach Pedersen: Knowledge Graph Exploration Systems: are we lost? CIDR 2022

I would say that this paper falls under the Summarization-based KG exploration approaches.

I have the following concerns, which currently prevent me from recommending acceptance or acceptance after major revisions.

1. The technical part of the paper can be significantly improved. Currently the authors define several metrics that are intended to reveal specific properties of the KG. However, these properties are not defined formally, nor is it shown in a more formal way which properties the metrics themselves satisfy. For example, do these metrics fall within [0,1]? Are they monotonically increasing? Are there other interesting properties? Can these properties be shown formally, or through some examples? In some places the authors state properties like "when classes are divided more specifically and each class has more instances, then Class Instantiation becomes higher". This needs to be formalized, and the claim needs to be proven or at least shown via a few examples. If these properties are not shown, it is hard to assess the significance of using the metrics for analyzing the internals of a KG, and also hard to understand what they mean.

2. As stated above, the presentation of the technical part of the paper could be improved with more examples. Currently there is one running example; however, more examples with differently structured ontologies that lead to different measures should be given, in order to see in practice how these metrics work and how they differ across scenarios.

3. Justification of these metrics: The authors claim that these metrics help analyze the quality of a graph. I have two issues here. First, it is not clear why these metrics are the only ones to be used, rather than others or these in combination with others. Second, I would agree that these metrics could be interesting for exploring the internals of a KG, but not necessarily that they demonstrate the quality of a KG. For example, if an ontology defines a hierarchy about Diseases which is under-populated in the KG, this doesn't necessarily mean that the quality of the KG is low; it could simply mean that the KG is incomplete and more data need to be extracted. In general, the notions of what is a 'good' KG or a 'better structure KG' are weak, and I fear it is impossible to justify what is good and what is not. Perhaps an ontology or KG that is bad according to this paper is very useful in practice. It would be good to make some claims in the paper less strong and to make the framework more objective.

The metrics ICR and IPR are simple class and property instantiation metrics computed for the whole KG. Their granularity is very low, so I doubt they give much insight into the internals and structure of the graph. The main contribution is therefore the other four metrics, which I am not sure is enough to justify a journal publication. In general, more in-depth research needs to be done on the metrics, their properties, their justification and the overall presented framework.
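
To make the granularity concern concrete, here is a minimal sketch of whole-KG instantiation ratios. It assumes, based only on the description above, that ICR is the share of ontology classes with at least one direct instance and IPR is the share of ontology properties used in at least one triple; the manuscript's exact definitions may differ. The function names, the `rdf:type` string convention, and the toy data are all hypothetical.

```python
from typing import Iterable, Set, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)
RDF_TYPE = "rdf:type"          # assumed marker for class membership


def instantiated_class_ratio(classes: Set[str], triples: Iterable[Triple]) -> float:
    """Assumed ICR: fraction of ontology classes with at least one direct instance."""
    if not classes:
        return 0.0
    used = {o for _, p, o in triples if p == RDF_TYPE and o in classes}
    return len(used) / len(classes)


def instantiated_property_ratio(properties: Set[str], triples: Iterable[Triple]) -> float:
    """Assumed IPR: fraction of ontology properties used in at least one triple."""
    if not properties:
        return 0.0
    used = {p for _, p, _ in triples if p in properties}
    return len(used) / len(properties)


# Toy ontology: the Disease branch is defined but has no instances.
classes = {"Person", "City", "Disease", "RareDisease"}
properties = {"name", "bornIn", "hasSymptom", "population"}
triples = [
    ("alice", RDF_TYPE, "Person"),
    ("seoul", RDF_TYPE, "City"),
    ("alice", "name", "Alice"),
    ("alice", "bornIn", "seoul"),
]

icr = instantiated_class_ratio(classes, triples)
ipr = instantiated_property_ratio(properties, triples)
print(icr, ipr)
```

Under these assumed definitions both values lie in [0, 1] by construction, since the numerator counts a subset of the set in the denominator. The toy graph also mirrors the Diseases example from point 3: the unpopulated Disease classes pull ICR down to 0.5 although no statement in the graph is wrong, and the single number does not reveal which part of the hierarchy is unused.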

Presentation can also be improved: some sentences are a bit hard to parse and there are also a few typos. One is in the example for calculating the CI of 'Person' where according to figure instead of (0.02 + 0.01 + 0.06) it should be (0.02 + 0.1 + 0.06) (or the figure is wrong).
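
The size of this discrepancy can be checked with a few lines of arithmetic. The sketch below only compares the two sums quoted above; it does not reproduce the full CI formula, and the variable names are hypothetical.

```python
# Values as the reviewer reads them from the figure
values_from_figure = [0.02, 0.1, 0.06]
# Values as printed in the manuscript's worked example, per the reviewer
values_from_text = [0.02, 0.01, 0.06]

# Round to two decimals to avoid floating-point noise in the comparison
sum_figure = round(sum(values_from_figure), 2)
sum_text = round(sum(values_from_text), 2)

print(sum_figure, sum_text)
```

The two readings give sums of 0.18 and 0.09, so the typo halves this term and is not a cosmetic difference.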

Certainly the types in exist in Google KG, but given that Google KG comes from Freebase which has 50k types I am not sure we can say that the ontology of GoogleKG is I am not sure it is public information what is the ontology used in GoogleKG.

page 3. 'in the work' -> you mean 'in the literature'

There is a long list of figures that do not seem to add much beyond the numbers already given in the preceding tables.

Review #2
Anonymous submitted on 29/Jun/2023
Review Comment:

Structural Quality Metrics to Evaluate Knowledge Graph Quality


This article describes quality metrics which can be applied to Knowledge Graphs. In addition, the authors applied them at scale by reviewing the quality of 6 major knowledge graphs, namely Wikidata, DBpedia, YAGO, Freebase, GoogleKG and Raftel. Based on these experiments, they provide an analysis of what KG administrators should do to improve their KGs. Typically, they found that the quality of a KG should not be judged only by "scale-related indicators such as the number of classes and properties".

Structure and writing

- Article structure is easy to follow.
- Writing quality: ok, although some effort could be made to improve the flow.

Major comments

- The Introduction lacks positioning. It would have been appreciated if the authors had presented a stronger story by better motivating their approach and the need for it. In addition, a quality-related example would have been nice.
- The Related Work section seems, to me, a bit short, in the sense that it misses some important related efforts. For instance, the Semantic Web works from Debattista aren't referenced [1], [2]. The same goes for the generic data quality effort in [3]. In the same vein, this section lacks a detailed discussion of the various quality metrics already existing in the literature, which is needed to better understand the gaps that the authors are filling.
- Section 4 lacks a discussion of the benefits of these new quality metrics compared to those already existing in the literature. Similarly, it would have been interesting to have a stronger motivation for the need for such metrics, demonstrating for example that they cover necessary aspects that have been ignored until now. Finally, an aggregated score/metric could have been a nice addition too (even though Table 4 provides some aggregation).
- Section 5 is only a description of the respective scores of the six datasets used to demonstrate the metrics. A stronger discussion, leading to suggested guidelines and actions for improving the quality of each of these datasets, would have been very useful. This would have let the reader understand the value of such a new set of quality metrics in the context of designing/building a Knowledge Graph.

[1] Luzzu—A Methodology and Framework for Linked Data Quality Assessment (Debattista et al.)
[2] Evaluating the Quality of the LOD Cloud: An Empirical Investigation (Debattista et al.)
[3] Requirements for data quality metrics (Heinrich et al.)

Minor comments

- It seems that the article is Korean-focused. I do not really understand the motivation behind this restriction.
- I do not see the need for Section 3: what are the findings, and how are they used to justify the need for 'structural quality metrics' in the next section?
- On page 5 line 37, "4.2.2 have been examined in previous studies" needs references.
- On page 7 line 44, typo. "memeber" → "member"
- Having both Fig. 2 and Table 4 seems unnecessary.
- I do not really understand the relevance of the Appendix section…

Overall [REJECT]

This article presents 6 quality metrics to be used in order to evaluate Knowledge Graphs. In addition, the authors present experiments related to 6 large Knowledge Graphs among which 5 are very popular.
However, to me, they neither motivated their approach enough nor used the experimental results to draw conclusions on how to improve the reviewed Knowledge Graphs. In addition, I think the overall positioning of the article should be revised to better highlight the community's need for such new metrics. Finally, I found the article not fully related to Wikidata (when the special issue is "Wikidata 2022"): the authors only consider Wikidata as one of the KGs in their experiments and do not put Wikidata at the center of their efforts.

For these reasons, I do not think this article fits within the scope of this Semantic Web Journal issue.