A Quality Assessment Approach for Evolving Knowledge Bases

Tracking #: 1795-3008

Authors: 
Rifat Rashid
Marco Torchiano
Giuseppe Rizzo
Nandana Mihindukulasooriya
Oscar Corcho

Responsible editor: 
Guest Editors Benchmarking Linked Data 2017

Submission type: 
Full Paper
Abstract: 
Knowledge bases are nowadays essential components for any task that requires automation with some degrees of intelligence. Assessing the quality of a Knowledge Base (KB) is a complex task as it often means measuring the quality of structured information, ontologies and vocabularies, and queryable endpoints. Popular knowledge bases such as DBpedia, YAGO2, and Wikidata have chosen the RDF data model to represent their data due to its capabilities for semantically rich knowledge representation. Despite its advantages, there are challenges in using RDF data model, for example, data quality assessment and validation. In this paper, we present a novel knowledge base quality assessment approach that relies on evolution analysis. The proposed approach uses data profiling on consecutive knowledge base releases to compute quality measures that allow detecting quality issues. In particular, we propose four quality characteristics: Persistency, Historical Persistency, Consistency, and Completeness. Persistency and historical persistency measures concern the degree of changes and lifespan of any entity type. Consistency and completeness measures identify properties with incomplete information and contradictory facts. The approach has been assessed both quantitatively and qualitatively on a series of releases from two knowledge bases, eleven releases of DBpedia and eight releases of 3cixty. In particular, a prototype tool has been implemented using the R statistical platform. The capability of Persistence and Consistency characteristics to detect quality issues varies significantly between the two case studies. Persistency measure gives observational results for evolving KBs. It is highly effective in case of KB with periodic updates such as 3cixty KB. The Completeness characteristic is extremely effective and was able to achieve 95% precision in error detection for both use cases.
The measures are based on simple statistical operations that make the solution both flexible and scalable.
Tags: 
Reviewed

Decision/Status: 
Minor Revision

Solicited Reviews:
Review #1
Anonymous submitted on 09/Jan/2018
Suggestion:
Accept
Review Comment:

Accept as it is. The authors nicely handled my comments and the comments of the other reviewers.

Review #2
By Ruben Taelman submitted on 23/Jan/2018
Suggestion:
Minor Revision
Review Comment:

I appreciate the work done by the authors to address the comments from the reviews.
On many levels, the work has improved,
but in its current state, I do not consider the paper ready for acceptance yet.

I acknowledge the effort of the authors to clarify that their work focuses on high-level changes,
and not low-level changes. The motivations are sufficiently clear to me.
However, this scope is first mentioned only on page 4,
which is quite late, as some readers might already have made their own assumptions about the work by then.
I would suggest that the authors include at least a small hint towards this focus in the abstract and/or the introduction.

The conclusions from the qualitative analysis are improved by adding another class and the manual validation.
However, the new part in 6.3 that explains manual validation is written poorly,
with many typos, and is difficult to follow.
I recommend a rewrite of this part.

As suggested in the previous reviews, the authors added a guideline for determining the consistency metric threshold.
The explanation is, however, very fuzzy. It is mentioned that a "heuristic approach" was applied to derive this value based on low relative frequencies.
These frequencies are determined by counting the properties with low instance counts for all KB releases.
It is not clear how these properties are counted, i.e., when a property has a "low instance count".
This seems like another case where some kind of threshold value is needed.
So to me, it seems like the choice of this threshold was moved to a different place.
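
To make this concern concrete, the following is a minimal sketch of what a relative-frequency check might look like. It is not the authors' implementation: the function name `low_frequency_properties`, the property names, the instance counts, and the cutoffs 0.05 and 0.20 are all assumptions chosen for illustration. The point is only that deciding which properties count as "low" already requires a cutoff as input.

```python
from typing import Dict, List


def low_frequency_properties(counts: Dict[str, int], threshold: float) -> List[str]:
    """Return the properties whose relative frequency is below `threshold`.

    `counts` maps a property name to its instance count for one class in one
    release. Both the data layout and the rule are assumptions for this sketch.
    """
    total = sum(counts.values())
    if total == 0:
        return []
    return sorted(p for p, n in counts.items() if n / total < threshold)


# Hypothetical instance counts for the properties of one class in one release.
example_counts = {"ex:name": 90, "ex:birthDate": 8, "ex:nickname": 2}

# The set of "low" properties changes with the cutoff that is supplied.
flagged_strict = low_frequency_properties(example_counts, threshold=0.05)
flagged_loose = low_frequency_properties(example_counts, threshold=0.20)
```

With these hypothetical counts, the stricter cutoff flags one property and the looser cutoff flags two, so any frequency table built from such counts inherits the choice of cutoff.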

The definitions of Persistency and Historical Persistency are clear now.

The information growth assumption has now been motivated sufficiently.

I am not sure if I understand the Schema Profiler that was introduced in this revision.
As far as I understand, this phase makes it so that sequences of KBs only contain classes and properties present in all releases.
After that, this "filtered" sequence is passed to the statistical profiler and quality profiler.
If this is correct, then I would expect measurements such as "number of distinct predicates" to remain the same over the complete sequence.
In either case, I suggest clarifying the goal of this schema profiler.
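
The reading of the Schema Profiler described above can be sketched as follows. This is only an illustration of that interpretation, not the authors' code: the helper names `shared_schema` and `filter_releases` and the example property sets are hypothetical.

```python
from typing import List, Set


def shared_schema(releases: List[Set[str]]) -> Set[str]:
    """Return the properties that occur in every release."""
    if not releases:
        return set()
    return set.intersection(*releases)


def filter_releases(releases: List[Set[str]]) -> List[Set[str]]:
    """Restrict each release to the properties shared by all releases."""
    shared = shared_schema(releases)
    return [release & shared for release in releases]


# Hypothetical property sets for three consecutive releases of one class.
releases = [
    {"ex:name", "ex:birthDate"},
    {"ex:name", "ex:birthDate", "ex:award"},
    {"ex:name", "ex:award"},
]

filtered = filter_releases(releases)
# After filtering, every release exposes exactly the same set of properties.
distinct_counts = [len(release) for release in filtered]
```

Under this interpretation the number of distinct predicates is constant across the filtered sequence by construction, which is why the goal of the phase needs to be stated explicitly.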

I assume that the authors forgot to push their changes to GitHub.
The authors mention that an example report was added, and that the DBpedia release data structure was made consistent with the text.
However, the GitHub contents seem unchanged. The last commit was 10 months ago at the time of writing.

Many of the newly added parts contain a lot of typos and are not always written very clearly.
I recommend a thorough proof-read of these parts.

Minor comments: (the following page numbers are based on the document where changes were annotated in blue)

Page 4:
"More specifically, the representation of changes at the low-level leads to a syntactic delta, which does not properly give insights to KB evolution to a human user. On the other hand, high-level changes can capture the changes that indicate an abnormal situation and..."

This is an unfair comparison between low- and high-level changes.
Low-level changes do not necessarily lead to syntactic deltas.
SemVersion [1], for instance, allows you to perform semantic deltas between two versions.
These deltas would then give proper insights for humans, possibly even more so than with high-level analysis.

Page 6:
"check chosen entity type" -> "check the chosen entity type"

Page 16:
"schema profiler uses availability of data instance" -> "schema profiler uses the availability of data instances"
"is a derived measures" -> "... measure"

Page 25:
"Persistency and historical persistency we only investigate subset of dataset to detect the causes of quality issues for a entity type"

Page 12:
Original comment: "9. Each metric function has conflicting names. Persistency_i(C) and Persistency_i for instance should be called differently, PersistencyClass_i(C) and PersistencyKb_i for example."
The function names for class and KB-based persistency are still formally the same (Persistency_i(...)).
The names of these functions should be changed, because they cannot do different things based on the input type (class or KB) that is provided to them.
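
As an illustration of the naming point, two separately named functions could look like the sketch below. The bodies are assumptions made for this example (a class is treated as persistent when its entity count does not decrease between two consecutive releases, and the KB-level value is the share of persistent classes); they are not taken from the manuscript, and only the separation of names matters here.

```python
from typing import Dict


def persistency_class(prev_count: int, curr_count: int) -> int:
    """Class-level value: 1 if the entity count did not drop, else 0 (assumed rule)."""
    return 1 if curr_count >= prev_count else 0


def persistency_kb(prev: Dict[str, int], curr: Dict[str, int]) -> float:
    """KB-level value: share of classes present in both releases that are persistent."""
    classes = sorted(set(prev) & set(curr))
    if not classes:
        return 0.0
    persistent = sum(persistency_class(prev[c], curr[c]) for c in classes)
    return persistent / len(classes)


# Hypothetical entity counts per class for two consecutive releases.
release_prev = {"ex:Place": 100, "ex:Event": 50}
release_curr = {"ex:Place": 120, "ex:Event": 40}

place_value = persistency_class(release_prev["ex:Place"], release_curr["ex:Place"])
kb_value = persistency_kb(release_prev, release_curr)
```

With distinct names, the class-level function returns a binary flag and the KB-level function returns a ratio, so neither the signature nor the result type is ambiguous.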

Page 18:
Original comment: "Table 3, why not consistenly use % or [0, 1]?"
My original comment remains.
I assume that the authors mean with "[0, 1]" that only the values 0 or 1 are allowed.
This is however confusing, as this can also be interpreted mathematically as the interval of all numbers between 0 and 1.
I suggest changing this notation.
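
For reference, the two readings can be distinguished with standard set and interval notation (the symbols below are the usual mathematical conventions, not notation taken from the manuscript):

```latex
\[
\{0, 1\} \quad \text{(the set containing only the values 0 and 1)}
\qquad \text{versus} \qquad
[0, 1] = \{\, x \in \mathbb{R} : 0 \le x \le 1 \,\} \quad \text{(the closed interval)}
\]
```

Using braces for binary-valued measures and brackets only for measures that take any value between 0 and 1 would remove the ambiguity.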

Page 22:
Original comment: "13. For DBpedia, the completeness measure was only applied to properties with Consistency = 1, why?"
My question remains. The text mentions that the completeness measure was only applied to properties with Consistency = 1,
but it does not mention _why_ this choice was made.

[1] Völkel, Max, and Tudor Groza. "SemVersion: An RDF-based ontology versioning system." Proceedings of the IADIS international conference WWW/Internet. Vol. 2006. 2006.

Review #3
Anonymous submitted on 02/Mar/2018
Suggestion:
Minor Revision
Review Comment:

The revised manuscript has improved significantly.

There are a couple of minor presentation/organization issues as a result of the newly added contents, namely:
- redundancies, where the same information/justification is repeated in different places (e.g. the reference to Papavasileiou et al. in the Schema Profiler description and the explanation of why only classes/properties that are present in all KB releases can be considered)
- information pertinent to the experimental evaluation permeates descriptions of the approach (e.g. in the Schema Profiler description, details about the classes of the two KBs that have been selected during the experimental assessment are given)
It is also not clear why for the Consistency characteristic, only the last release of 3cixty Nice was considered, while for the same metric, the last two releases of DBpedia were taken into account.

Overall, the added value and usefulness of the proposed evolution-based coarse-grained analysis as a means for flagging potential issues and serving as guidance for further
fine-grained, manual or not, inspections, is well-motivated and fairly explicit. Moreover, the authors have extended the qualitative, manual assessment of the proposed evolution-based metrics and
acknowledge pertinent limitations. The extended assessment results are quite interesting; given the overwhelming, resource- and time-wise, task of performing a full-scale manual assessment,
it would be indeed very interesting to discuss possibilities for alternative validation methods.

Last, given that the experimental assessment considered only two KBs, and only partially, generalizations such as "We observe that continuously changing KBs with high-frequency updates (daily updates) such as 3cixty Nice KB tends to remain stable in case of the consistency issue.
On the other hand, KB with low-frequency updates (monthly or yearly updates) such as DBpedia KB tends to have inconsistency.", should be drawn cautiously.

There are still several minor typos and presentation issues (not an exhaustive list):

*1. Introduction*
"it can detect which triple have been deleted" -> "it can detect which triples have been deleted"

"We can thus detect changes that indicate an issue in data extraction or integration phase of a KB analyzing the history of changes, in other words analyzing the
KB evolution.": revise to avoid redundancy (we can detect changes by analyzing the history of changes, ...)

RQ1 "We propose temporal measures" -> "We propose evolution-based measures"

"Furthermore, experimental analysis based on quantitative and qualitative approaches." incomplete sentence

"We performed an experimental analysis to validate our measures on two different KBs namely, 3cixty Nice KB [9] and DBpedia KB [2].": repeated in the third bullet that follows

"motivational examples that demonstrates" -> "motivational examples that demonstrate"

"aspects of our quality assessment approach based coarse grain analysis." -> incomplete sentence

"Section 4 contains definition of proposed temporal based" -> "Section 4 contains the definition of the proposed evolution-based"

*2. Background and Motivations*

"their schema usually evolve" -> "their schema usually evolves" OR "their schemata usually evolve"

"impact of the unwanted removal of resources" -> "impact of erroneous removal of resources"

"In general, Low-level change" -> "In general, low-level change"

"We track the Wikipedia page from which statement was extracted in DBpedia." -> "We track the Wikipedia page from which DBpedia statements were extracted." ?

"This instances are" -> "These instances are"

"Such as, considering schema of a KB remains unchanged a set of low-level changes from data corresponds to one high-level change." -> "For example, assuming that the schema of a KB remains unchanged, ...."

"Data quality issues, are the specific problem instances that we can find issues based on quality characteristics and ..." -> please revise

"initiates a quality assessment procedure, it selects" -> "initiates a quality assessment procedure, he/she needs to select"

"to check chosen entity type present" -> "to ensure that the selected entity type is present" OR "to check "

*4.2.2*
"as the degree to which unexpected removal" -> "as the degree to which erroneous removal"

*4.2.4*
"if it does not contain conflicting or contradictory fact." -> "if it does not contain conflicting or contradictory facts."

"consistency of RDF statement using SDValidate approach" -> "consistency of RDF statements using the SDValidate approach"

*5. Evolution-based Quality Assessment Approach*

"based on the qaulity asessment" -> "based on the quality assessment"

"created a data extraction module that extend Loupe" -> "created a data extraction module that extends Loupe"

*Schema Profiler*
"present in all the analyzed KB" -> "present in all of the analyzed KB"

"In particular, schema profiler" -> "In particular, the schema profiler"

"Furthermore, we checked schema consistency based on any data present for the property. " -> revise sentence please

*Quality Profiler*
"More in detail" -> "Elaborating further", "More specifically", etc.

*6.1*
"as schema in 3cixty" -> "as the schema in 3cixty"

*6.2*
"based on proposed quality characteristics" -> "based on the proposed quality characteristics"

"In particular, we analyzed selected classes" -> "In particular, we analyzed the aforementioned selected classes"

In the Discussion part (in 6.2 and following subsections), replace "In case of {KB_name}" with "In the case of {KB_name}".

Figure 10: the ".csv" file extension shown as part of the class name in the legend on the right could be removed (i.e. dbo-work or dbo:Work instead of dbo-work.csv)

*6.2.3*
"In this experiment, we used last three releases" -> "In this experiment, we used the last three releases"

"From the three different distributions value of 0.05 has lower number of instances where value of 0.20 has increasing number of instances." -> please revise

"Table 8 reports, for the DBpedia ten class," -> "Table 8 reports, for the ten DBpedia classes,"

*6.3*

Table 11: please revise English in the descriptions in the "Causes of quality issues" column

*Consistency*
"This indicates an error presents due to the wrong Wikipedia infobox extraction." -> please revise sentence
*Completeness*
"Then we performs manual inspection" -> "Then we perform manual inspection"

*7. Discussion*

"In fact, among the four proposed quality characteristics we have proposed two characteristics – completeness and consistency– from the ISO 25012
standard. " -> missing verb?

"Such as we found significant no. of resources missing in" -> ", such as a significant number of resources missing" OR "For example, we found a significant number of resources..."

"we didn’t found any real" -> "we didn’t find any real"

*8. Conclusion and Future Work*

"characteristics form the ISO 25012" -> "characteristics from the ISO 25012"

"such as 3cixty Nice KB tends to" -> "such as 3cixty Nice KB tend to"

"On the other hand, KB with low-frequency updates (monthly or yearly updates) such as DBpedia KB tends to have inconsistency." ->
"On the other hand, KBs with low-frequency updates (monthly or yearly updates), such as the DBpedia KB, tend to have inconsistencies."