Understanding the Structure of Knowledge Graphs with ABSTAT Profiles

Tracking #: 3181-4395

Blerina Spahiu
Matteo Palmonari
Renzo Arturo Alva Principe
Anisa Rula

Responsible editor: 
Guest Editors Interactive SW 2022

Submission type: 
Full Paper
While there has been a trend in the last decades for publishing large-scale and highly-interconnected Knowledge Graphs (KGs), their users often get overwhelmed by the daunting task of understanding their content as a result of their size and complexity. Data profiling approaches have been proposed to summarize large KGs into concise and meaningful representations, so that they can be better explored, processed, and managed. Profiles based on schema patterns represent each triple in a KG with its schema-level counterpart, thus covering the entire KG with profiles of considerable size. In this paper, we provide empirical evidence that profiles based on schema patterns, if explored with suitable mechanisms, can be useful to help users understand the content of big and complex KGs. We consider the ABSTAT framework, which provides concise pattern-based profiles and comes with faceted interfaces for profile exploration. Using this tool we present a user study based on query completion tasks, where we demonstrate that users who look at ABSTAT profiles formulate their queries better and faster than users browsing the ontology of the KGs, a pretty strong baseline considering that many KGs do not even come with a specific ontology that can be explored by the users. To the best of our knowledge, this is the first attempt to investigate the impact of profiling techniques on tasks related to content understanding with a user study.

Solicited Reviews:
Review #1
Anonymous submitted on 30/Sep/2022
Review Comment:

In my review of the previous version of this paper, I asked for a number of changes:

0. page load times / responsiveness

-> This has been addressed in sec 2.3

1. addressing subclass / reasoner impact

-> This has been discussed for future work in sec 6

2. a revision of the discussion on understanding and extrinsic tasks

-> This has been addressed by removing the paragraph with suitable replacement

3. a review of the notation around compression.

-> This has been addressed

Additionally, my co-reviewers asked for more detail on an alternate tool which was reported as being similar, but not directly compared.

-> I think the authors have explained well the comparison, have discussed the justification for the experimental technique, and sharpened their claims.

Review #2
Anonymous submitted on 05/Oct/2022
Review Comment:

I thank the authors for addressing the comments. I have found that the article has improved in clarity.

I believe the paper is ready to be published, provided the authors would address some minor "issues" in the paper:

1) The authors addressed a comment and produced additional figures in the letter but have not included them in the paper. As the data has now been shared, one can look them up. However, why did the authors only provide figures for the average times w.r.t. SPARQL knowledge? They stated the reason was brevity in the letter, but there must be a reason to have chosen that one for the article. Why did the authors include Fig. 6 (motivation)? What was the point the authors wanted to convey?

2) The authors stated that Loupe uses a version of DBpedia that is not specified (Section 4.1), but Appendix A compares Loupe and ABSTAT using DBpedia. Could the authors clarify how the comparison in the Appendix was achieved?

3) The data should, ideally, be published on a suitable long-term preservation platform with metadata and a license (e.g., Zenodo). The current GitHub URL contains a typo.

4) Page 18, line 38, 1st column: Do you have a reference for non-native speakers treating queries differently? Or do you have a reference for native speakers "ignoring" the syntax of a question? If not, then clarify that this is an assumption.

As for the paper, I still spotted some minor issues that are easily addressed:

General comments
You have four occurrences of "she/he," why not use singular they or "the user"?
I had not spotted this the first time, but number formatting is inconsistent: in some places commas are used as thousands separators and periods as decimal points, while in others decimal points are indicated with commas, a European practice. Consider sticking to one system, preferably the UK/US system, given that the paper is written in English.
Some occurrences of "Web Protégé" instead of "WebProtégé."
Line 24: the word "daunting" can be removed.
Line 29: consider rephrasing: one considers "something" for/to do "something."
Section 1
Line 45: The use of "and so on" is informal, and you already have stated that a KG includes much more than those entities with "such as." Consider finishing the sentence with "events, and artworks." "Artworks" is just one word.
Page 2, line 29, 1st column: this whole paragraph is only one sentence! Given that some points are several lines long consider writing each item as a sentence. The use of sentences will further improve this paragraph's readability.
Page 2, line 31, 2nd column: wouldn't "knowledge graph understanding" be more appropriate? The sentence should end with "in this paper."
Page 3, line 51, 1st column: remove ", for example," from the sentence.
"Data understanding" and "content understanding" are mentioned in this section. The paper also mentions "knowledge graph understanding" elsewhere. Are they different things? If yes, please clarify. If not, ensure that the terminology used throughout the paper is homogeneous.
Section 2
Page 4, line 17, 2nd column: the use of "very" is to be avoided as it is informal and subjective.
Page 4, lines 34 to 40, 2nd column: I appreciate that the authors have addressed this question. I would suggest turning this into a footnote.
Page 6, line 45, 1st column: "... that, for each result, a ..." (missing commas).
Page 7, line 50, 1st column: "In this section, ..." (missing comma).
Page 7, line 23, 2nd column: "... we compare ABSTAT with Loupe ..."
Page 9, line 50, 1st column: "... achieves a lower (i.e., better) reduction rate than Loupe."
Page 9, line 3, 2nd column: missing ")"
Section 3
"In this section, ..." (missing comma).
"graph understanding" is mentioned here. How is this different from data- and content understanding?
Page 10, line 6, 1st column: "In this paper, ..." (comma)
Section 4
Page 10, line 37, 2nd column: the use of "pushed us" is informal.
Page 11, line 33, 1st column: "in line" instead of "inline."
Page 11, line 13, 2nd column: "... would take ON average..."
Page 12, line 19, 1st column: the third item needs to be rephrased.
Page 12, line 39, 2nd column: "... the easiest one ..." sounds informal. Using "... the easiest query ..." would be better. There are other occurrences of "one" use in that manner.
Page 14, line 2, 1st column: "... than FOR Protégé users." and "..., users from the Protégé group generally required more time..." Most of these sentences in this paragraph need to be proofread.
Page 14, line 28, 1st column: "... to answer the es,..." What is "es"?
Page 16, line 28, 2nd column: How do you quantify "huge"?
Page 16, line 50, 2nd column: why is this obvious? Consider omitting the word "obviously."
Page 17, line 12, 1st column: "For THE Protégé group, users reported that IT was easier..."
Page 17, line 31, 1st column: "while references to specific types would be lost in the most general patterns."
Page 18, line 26, 1st column: "to summarize" instead of "to sum up"? Also, you have presented evidence that ABSTAT profiles provide such support.
Page 18, line 35, 2nd column: "at the time of writing."
Page 19, line 30, 1st column: use past tense instead of present tense.
Section 5
"In this section, ..." (missing comma)
"Differently from the above approaches" sounds quite artificial and is used multiple times. Consider rephrasing.

This list is by no means exhaustive.

Review #3
By Evan Patton submitted on 02/Dec/2022
Minor Revision
Review Comment:

Overall, I think the majority of issues I raised in the first review have been addressed. However, I do have a few items that come from the modified text that the authors may want to revisit:

Maybe instead of making an assertion such as "RDF-Schema, the simplest ontology language," on page 2, it might be sufficient to say something like "RDF-Schema, one ontology language, supports...", so that you don't need to make superlative assertions like "simplest" that need to be backed up.

The example query on lines 23-25 doesn't seem to need ontological knowledge. There may be a more appropriate query over DBpedia that could take advantage of the knowledge expressed in the ontology, but this is not one that seems to really justify the authors' investigation. It seems like the use of rdf:property and rdf:type would be all that is necessary (not even RDFS).

Regarding your comment on schema.org on line 33, Schema.org does publish the schema in Turtle and other machine readable formats: https://schema.org/docs/developers.html, unless "formal language" here is intended to mean first order logic or similar?

On page 3, lines 32-33, you may want to be clear upfront as to what "performance on the query completion" task actually means here. Are we considering the time it takes to complete the task? The number of mistakes made or distance from some gold-standard query? Something else?

For the updated text of section 2.3, there is insufficient numerical support for many of the statements, which use phrases like "fetching and filtering ... is almost immediate" and "response time is slightly higher but still almost instant." There is also a comment about network speed and "each gigabyte uploaded," but it is unclear whether these are compressed or uncompressed gigabytes, and in which format: one GB of RDF/XML will contain a different amount of information than one GB of Turtle, for example, and compression will definitely affect how many uncompressed GBs could be transferred over the wire.

Minor things:

- Page 9, second column, line 3 there is a missing closing parenthesis.
- Page 10, second column, line 25 "for e thorough analysis" => "for a thorough analysis"