Review Comment:
This paper proposes a quality framework for RDF graph summarization. It measures both the schema-level and the instance-level coverage of an RDF dataset achieved by RDF graph summarization approaches. The connectivity of a summary is also considered. The framework is used to evaluate three existing approaches based on a number of real-world RDF datasets.
The paper addresses an important research problem in the Semantic Web area. There have been many and varied approaches to dataset summarization, but the field lacks widely accepted evaluation criteria and extensive empirical evaluation. This paper has the potential to fill that gap, though its current form requires major revision.
I have two concerns about the proposed framework.
1. All the evaluation criteria are defined over knowledge patterns. The authors claim that their framework can be used to evaluate any RDF summarization algorithm. How could you prove that every such algorithm can be appropriately transformed into knowledge patterns? In particular, some approaches not mentioned in the related work section define a summary of an RDF graph as a subgraph extracted from it, rather than as an aggregate, schema-like structure, e.g., "Structural Properties as Proxy for Semantic Relevance in RDF Graph Sampling" (ISWC '14) and "Generating Illustrative Snippets for Open Data on the Web" (WSDM '17). I am not sure whether it is possible, or appropriate, to regard such summaries as knowledge patterns.
2. Although the framework is claimed to present a *comprehensive* way to measure the quality of RDF summaries, all the evaluation criteria are essentially based on the same principle, namely that a good summary *accurately* characterizes the original data. This principle has been thoroughly discussed in [11] and [28], where various accuracy metrics have been proposed and used. What is the difference between those metrics and the ones proposed in this paper? Moreover, besides accuracy, other factors also influence the quality of a summary, such as conciseness and comprehensibility, and these are not addressed in this *comprehensive* framework.
Some details of the framework and the experiments need to be clarified.
3. In Equation (5), how is Nps computed? What do you mean by *represent* a class?
4. Prior to Equation (10), it is true that the algorithms do not invent new properties, but isn't it possible that an algorithm selects a property that is not included in the ideal summary? Do you assume that the ideal summary covers all the properties? Is this assumption reasonable? Why don't you make a similar assumption for classes?
5. In Section 6.2, what is an untyped dataset? If all the entity and property types are removed, how can your evaluation criteria still work? They are defined precisely in terms of classes and properties.
6. In Section 6.2.1, how are the ideal summaries generated? Is it possible that different experts would generate different ideal summaries?
7. I applaud the use of a large number of datasets in the experiments. However, they are not as heterogeneous as claimed. In particular, DBpedia is not considered, although it uses many more classes and properties than any other dataset in the experiments. If for some reason DBpedia could not be tested, a discussion of why would be appreciated.
The writing should be significantly improved. Minor issues include, but are not limited to, the following:
- Abstract: optimizstion --> optimization
- Page 1: build and described --> built and described
- Page 2: RDG --> RDF
- Page 2: PDF --> RDF
- Page 3: RDF Schema describe --> RDF Schema describes
- Page 4: There is a question mark in a \cite environment.
- Page 6: In Equation (4), an unpaired bracket should be removed.
- Page 14: This is also explains --> This also explains
- Page 15: This is explain --> This explains