Review Comment:
Suitability as introductory text, targeted at researchers, PhD students, or practitioners, to get started on the covered topic:
This work is very well suited as an introductory text, both for someone who wants a general picture of the topic and for someone who wants to dive deeper into the field.
How comprehensive and how balanced is the presentation and coverage:
The presentation and coverage are quite good and balanced, but there is still work to be done with respect to clarity and evaluation.
Readability and clarity of the presentation:
The paper is well written, but some parts need further clarification and attention.
Importance of the covered material to the broader Semantic Web community:
This work is important for Semantic Web researchers who want a quick overview of the field.
DISCLAIMER: I am a co-author of a paper [7] cited in this work.
In a nutshell:
This paper discusses the latest works on semantic table interpretation, the process of relating (a set of) Web tables to a given target Knowledge Base.
This is a work that was missing from the literature and, once improved, it will be a very good introduction for researchers who want to dive into this field.
It is well written and covers an adequate number of recent works in the field. The topic is important for the Semantic Web community, as it concerns making the
Semantic Web richer by incorporating knowledge from raw, yet structured, text in the form of Web tables.
Therefore, I recommend accepting this work, but only after substantial work to improve it in several respects, as detailed below.
In more detail:
Major comments:
- There is a recent work that should be mentioned and compared against: Ritze, Lehmberg, Oulabi, Bizer. Profiling the Potential of Web Tables for Augmenting Cross-domain KBs. WWW '16.
- Background: There should be at least a short discussion of the objective function Q. Definition 2.5 is too generic, and the reader ends up treating this function as a black box. This is the core of the problem and it deserves further analysis.
- Figure 1 is supposed to represent a single ontology. However, it seems unrealistic for the same ontology to consider the same concept (China) to be both a Country and a City. Usually, this happens with two different concepts that may have the same name, so two vertices having the same name would be expected. Please clarify, as I find this Figure confusing and it is central in the paper.
- Intro of Section 4 (Table Annotation tasks): The first paragraph of this section is not clear at all. It took me 2-3 passes on this paragraph alone to understand how you define the difference between an annotation task and the problem of STI. After those passes, I think that the problem lies both here and in the first paragraph of the paper, where you define STI in a very vague way. Please clarify both parts. For example, you should at least state here how this set of table entities is selected in an annotation task.
- Section 6 (FactBase Lookup): FactBase Lookup is only one of the three methods that we presented in [7]. The others use embeddings and ontology matching, and the best one (called Ensemble in [7]) is a hybrid of the FactBase lookup and the embeddings method. You don't have to put this clarification in the paper; I just wanted to state it for completeness, in case it was not 100% clear. In the description of our work in your paper, I could not understand to which of those methods you were referring; I think you are describing more than one of those approaches as a single approach.
- Table 2: Relation Annotation, a feature of this table, is not defined or discussed anywhere. If my understanding is correct, FactBase lookup needs a check mark there, as it detects relations in Web tables and associates them with the corresponding ones from the target ontology. Perhaps T2K should also be ticked. However, without a definition for this column, I cannot be 100% sure what you mean and which works belong there. I think you should add a new subsection (4.5) and discuss this problem.
- Last paragraph of Section 6.3: "By observing the table, it is apparent that search based approaches more concise in overall capabilities in comparison with the alternate approaches."
This is a VERY important statement, one that people reading this work expect to find and to be able to rely on. They want your expertise in the field to help them decide which way to go.
However, it is not justified at all. Contrary to what you may think, nothing is "apparent" from observing the table to a reader who has just started reading about this field. You need to properly justify this statement with facts, even if it seems apparent to you after spending so much time reading the bibliography.
- Evaluation: This is a weak part of the paper, in my opinion. It's good that you introduce the metrics used to evaluate those works and that you properly explain them, but could you not do more to include more works, at least for some of the gold standards, if not all of them? I know that it's really hard to get results for all those tools, but I also know that many times you can get those results by asking their authors for help.
- Conclusions and Future Work: I think you should extend the very useful discussion that you have started (it's good but can be better) with more insights, key observations, and some first ideas on the future work you suggest. I believe that there is no page limit at the moment, but perhaps I am wrong.
Minor comments and typos (in order of appearance):
- Abstract: "This paper presents a survey on Semantic Table Interpretation(STI). Goal of this paper is to provide an overview of STI algorithms, data-sets used, and their evaluation strategies and critically evaluate prior approaches" --> "This paper presents a survey on STI, aiming to provide an overview of STI algorithms, data-sets used, and their evaluation strategies and critically evaluate prior approaches".
- Abstract: "and point out their strengths and weakness" --> "(...) weaknesses"
- Abstract: "Also, We present" --> "Also, we present"
- The second paragraph of the intro is not at all clear; please re-write from scratch.
- Intro, par. 3: "Without the loss of generality" --> "Without loss of generality"
- Equation (1): I would expect M to be one of Q's parameters (also that is what I expected after reading Definition 2.5). If not, please justify.
- Definitions 2.1, 2.2 are not properly discussed. In a survey paper, I would expect an extensive discussion about other types of tables (non-relational, e.g., vertical) and which ones are easier to interpret and why. Also, what if there is no header row?
- Definition 2.3: "Ontology O is" --> "An Ontology O is"
- Definition 2.4: Missing a '.' at the end.
- After Definition 2.5: How is |O| defined? Is this equal to |C|?
- After Equation (3) (and everywhere): "according to the definition 2.6" --> "according to Definition 2.6" (drop "the" and capitalize the first letter when referring to a specific definition/equation/figure).
- Last paragraph of Section 2: "In literature these sub-problems are referred to as Annotation Tasks": please add reference(s) to this literature.
- Section 3 (occurring again elsewhere): "ideology" --> "approach"
- Section 3: "Some literature[6, 8, 14] strongly suggests the existence of such relationships." What is the meaning/purpose of this sentence?
- Last sentence of Section 3: "We call out for a future work that investigates the possible trade offs between these two choices." Could you say a few more words about that?
- Section 7.2: "Efthymiou et al.[7] contributed to the process by converting both T2D and Limaye data sets to JSON format." This is not an actual contribution, but I am glad that you found it useful. Instead, what I consider an actual contribution, as stated in our paper [7], is the new gold standard that we created from Wikipedia tables and offer publicly [7]. Adding it to your experiments (we also have the results for T2K) might enrich your evaluation results, but I don't suggest that it is necessary.
- References: try to use a consistent format in the references. Also, references 40-46 are never mentioned; please remove them, or say something about them, if you want to keep them.