Review Comment:
The paper introduces an approach which, given existing language editions of DBpedia, automatically creates DBpedia chapters for new languages by computing infobox mappings and by applying statistical methods to complete the typing information in those chapters.
On the positive side, the paper presents very mature work with good empirical results. The evaluations for all of the steps are carried out with care, and the resulting datasets are published, hence making a contribution to the Linked Open Data community.
The related work is incomplete; I miss, e.g., a reference to [1], which seems rather close to the presented approach. Furthermore, the related work section is merely an enumeration of approaches; some sentences stating the difference and novelty of the work presented in this paper w.r.t. that state of the art are missing.
My main concern with the paper is a lack of self-containedness. I understand that this paper is a distilled report of quite a few previous research efforts, which in principle is fine. However, in certain places, it is impossible to understand the paper without looking into the referred papers. In those places, more details should be added, so that the paper can be consumed as a self-contained piece of information. Furthermore, the paper lacks clarity in different places. I will detail them below.
The main sections describing the approach (3-5) lack an introduction briefly describing the "big picture" of how the different pieces fit together. A suggestion for fixing this is to move section 6 before section 3 and to add a high-level figure.
Section 3 should include a more detailed discussion of the performance differences across the different language editions. The authors mention the "heterogeneous structure of infoboxes", but more details would be appreciated. In particular, the algorithms are described rather briefly and vaguely.
In the evaluation in section 3, it is not clear whether micro or macro averages of precision and recall are used. In the example given in Fig. 3, recall and precision would both be 2/3, given that owl:Thing is not included. If another infobox which relates to an entity on the same level as Agent is mapped correctly, the corresponding recall and precision would be 1. Is the final score based on the macro average (i.e., (2/3+1/1)/2=5/6) or the micro average (i.e., (2+1)/(3+1)=3/4)? I feel that the former would be more appropriate to level out effects caused by the heterogeneous depth of the DBpedia ontology in different branches, but in any case, it should be clearly stated whether the evaluation is based on micro or macro averages.
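To make the distinction concrete, the two averages for the example above can be computed as follows. This is a minimal sketch, not the authors' code; the per-mapping counts (2 of 3 ancestor classes correct for the first mapping, 1 of 1 for the second) are taken from the hypothetical example discussed in this review, and since precision and recall coincide here, a single score per mapping suffices.

```python
from fractions import Fraction

# (correct, total) class counts per infobox mapping -- assumed example values
per_mapping = [(2, 3), (1, 1)]

# Macro average: compute the score per mapping first, then average the scores.
macro = sum(Fraction(c, n) for c, n in per_mapping) / len(per_mapping)

# Micro average: pool the counts over all mappings, then compute one score.
micro = Fraction(sum(c for c, _ in per_mapping),
                 sum(n for _, n in per_mapping))

print(macro)  # 5/6
print(micro)  # 3/4
```

The macro average weights each mapping equally regardless of how deep its class sits in the ontology, which is why it levels out the heterogeneous-depth effect mentioned above.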
The evaluation section 4.2 left me a bit puzzled. It is stated that a human-created gold standard is used for the evaluation. Then, Fig. 4 also reports the quality of human annotations. If it is the human-created gold standard evaluated against itself, the F1-measure should be perfect. If not, it is not clear what "Human" refers to.
Section 5.2 contains an evaluation diagram depicting different variants (bottom-up vs. hybrid), but they are not appropriately introduced in the text. Here, a bit more detail would also be appreciated.
Finally, for the evaluation in section 7, I would have appreciated some statements about the runtime as well to discuss the scalability of the approach.
While these are quite a few points of critique, I believe that it should be fairly easy for the authors to address each of those points. Thus, in my opinion, a minor revision of the paper is sufficient.
Minor:
* Not sure whether this is an issue with my printer driver, but a few special characters did not appear correctly in my printout.
* p.1: stating that Wikipedia is the "best digital approximation of encyclopedic human knowledge" is a bold statement. Depending on the purpose and information need, others could be better. It would not harm to tone down this statement.
* p.2: To readers unfamiliar with DBpedia, it might not be clear what a "chapter" is in this context; a definition should be added.
* p.3: The term "pivot language" should be defined.
[1] Volha Bryl, Christian Bizer. Learning Conflict Resolution Strategies for Cross-Language Wikipedia Data Fusion. In: Proceedings of the 4th Joint WICOW/AIRWeb Workshop on Web Quality (WebQuality) @ WWW 2014.