Review Comment:
Many thanks to the authors for submitting in response to the special issue call for papers on 'Linked Data and Ontology Reuse'. The article proposes a topic modeling approach to reveal latent topics of ontologies in order to compare them with respect to their similarity, dissimilarity, or complementarity for a given domain.
In the introduction the authors start by giving the general motivation for their work. In particular, they suggest that this approach will fill the gap in ontology reuse that exists at the stage of finding appropriate ontologies modeling a domain of interest in ontology repositories. It is worth mentioning that the authors specifically highlight the importance of ontology reuse in Linked Data publication, as it is commonly agreed that the reuse of identifiers generally decreases the effort needed to integrate data from different sources. A brief introduction to LDA and BTM, two relevant topic modeling techniques, is given next, before the section closes with a summary of the contributions of this work. The authors state that their contributions are: (1) modeling topics from ontologies' lexical information enriched by word-sense disambiguation; (2) the evaluation of the approach by (a) a comparative study of using two different topic modeling techniques (LDA and BTM) for the topic modeling task and (b) a comparison of the results of the approach with an LDA topic model learned from the lexical information without word-sense enrichment; (3) the evaluation of the applied topic models by comparing their performance for clustering ontologies from a real-world corpus of topically annotated ontologies.
In terms of related work the authors contrast their approach with ontology reuse in general, ontology similarity, and topic models. The most obvious issue in that regard is that there is room for improving the completeness of the literature review on ontology reuse, starting from work that was fundamental to this field but not yet necessarily related to ontologies in the context of Linked Data [1,2,6]. Beyond that, there is a fairly broad body of literature on ontology development and reuse in the context of Linked Data, which seems to be left out entirely [3,4,5,7]. The second paragraph of section 2.1 mixes the Linked Data motivation given in the beginning - reuse of concept identifiers for data population - with the redundancy of concepts in different ontologies. The references listed in this paragraph do serve to motivate the authors' approach, because they provide evidence that there are multiple ontologies available for the same domain (this could complement the Linked Data argument in the introduction). However, the authors neglect to relate these references to one another in terms of the differences in their potential to be used for the same task as the one proposed in the paper. One can also raise the question of how the proposed method relates to what has been introduced as ontology summarization [10] as well as the very well-known work on ontology mapping [8,9]. Altogether, the section on related work leaves the impression that the authors mix up foundations/preliminaries - the entire section on topic models should come under that headline, since the contribution here is hardly in the area of topic modeling - with related work, and that the survey of related work is incomplete and not sufficiently contrasted with the authors' own work.
The description of the method itself is clear. The notation is introduced appropriately and it is worth highlighting that the authors made all source code and research data available via GitHub. In terms of the extraction of lexical information from an ontology, it is not entirely clear why URIs are extracted, since these seem to introduce noise to the document corpus. It is interesting that in section 3.3 the authors provide a reference as evidence for using BTM rather than LDA for short and noisy text, but then still make the comparison between these two approaches the key element of their own evaluation in section 5.
The introduction of the approach is followed by an example of its application to ontologies from the library domain. The example is fictive in the sense that no real users are involved; it is simply about applying the proposed method to a dedicated corpus of related ontologies. The findings from this small case study are summarised qualitatively. It is not entirely clear what this section is meant to contribute to the article. Is it part of the evaluation? Is it a validation that the approach works? The authors simply do not say why they present this particular example here. One can imagine that it suits well as a running example throughout the paper, but then it should allow the reader to retrace the benefit of the novel approach over the situation without it. Nothing is said about any evidence that the actual users of ontology repositories perceive any significant problems in finding suitable ontologies. One might argue that such studies do not exist. However, similar ones were conducted in the past [11], and these either serve better to back up the argumentation for a running case or they can be a blueprint for repeating such a study to gain that evidence.
The evaluation is targeted at the 'strengths and weaknesses' of the approach and comprises several experiments involving different topic modeling techniques and ontology contexts (enriched vs. raw). It is interesting to see that the authors highlight that none of the methods used to evaluate the approach requires external input (see section 5.1, 'advantage of not requiring an external evaluation corpus', and section 5.2, 'our goal is to evaluate the performance [...] without any intervention'). One can guess here that the authors wanted to get away without the need to involve any experts or users in the evaluation. This is a critical problem, which was already mentioned above with respect to the 'illustrative example'. The article talks a lot about the problem of _people_ not being able to find the appropriate ontologies but then excludes them from the consideration of the 'quality' of the derived topics. In terms of the qualitative results presented in the evaluation, the authors refer to a lower I(G) as the better result. But then Table 2 would show that LDA consistently performs better than LDAs. This raises the question of whether the performance benefit of BTMs stems from the benefit of BTM in general and not from the fact that an enriched context is used.
Altogether, this article clearly shows that the authors work on a promising topic and contribute a novel method to summarise ontologies. The presentation is clear and the authors provide all material needed to reproduce the results. The issue is that the work seems to be at an early stage and, as it stands, does not contribute significantly to the field of 'Linked Data and Ontology Reuse'. The embedding in the literature of this area needs to be strengthened, and the discussion should come back to this aspect as well. The authors themselves mention that further experiments are needed to provide evidence for what causes LDAs to perform better starting from K=20. The question is whether the enriched context is the key here. Additional uncertainty is added to the authors' conclusions by Table 2, which seems to conflict with the topic coherence results.
A final but major point is that both the example case and the evaluation do not involve any users. To make the paper an adequate fit to the special issue, it seems necessary to situate the proposed method within the data/ontology lifecycle and to assess (or refer to) the issues people really have at present and the benefits they would gain from the proposed approach. This automatically brings in alternative methods to overcome those issues and calls for a comparison of the topic modeling approach to those other methods.
I would propose a 'reject and resubmit' decision, since the necessary revisions seem to exceed what one would normally regard as a 'major revision'. The user dimension is an additional piece of research to be done, and the same holds for the experiments needed to shed more detailed light on the comparison between LDA and LDAs.
[1] Simperl, E. (2009). Reusing ontologies on the Semantic Web: A feasibility study. Data & Knowledge Engineering, 68(10), 905-925.
[2] Simperl, E., Sarasua, C., Ungrangsi, R., & Bürger, T. (2011). Ontology metadata for ontology reuse. International Journal of Metadata, Semantics and Ontologies, 6(2), 126-145.
[3] Käfer, T., Abdelrahman, A., Umbrich, J., O’Byrne, P., & Hogan, A. (2013). Observing linked data dynamics. In The Semantic Web: Semantics and Big Data (pp. 213-227). Springer Berlin Heidelberg.
[4] Hogan, A., Umbrich, J., Harth, A., Cyganiak, R., Polleres, A., & Decker, S. (2012). An empirical survey of linked data conformance. Web Semantics: Science, Services and Agents on the World Wide Web, 14, 14-44.
[5] Rula, A., Palmonari, M., Harth, A., Stadtmüller, S., & Maurino, A. (2012). On the diversity and availability of temporal information in linked open data. In The Semantic Web–ISWC 2012 (pp. 492-507). Springer Berlin Heidelberg.
[6] d’Aquin, M., & Noy, N. F. (2012). Where to publish and find ontologies? A survey of ontology libraries. Web Semantics: Science, Services and Agents on the World Wide Web, 11, 96-111.
[7] Luczak-Rösch, M., Simperl, E., Stadtmüller, S., & Käfer, T. (2014). The Role of Ontology Engineering in Linked Data Publishing and Management: An Empirical Study. International Journal on Semantic Web and Information Systems (IJSWIS), 10(3), 74-91.
[8] Noy, N. F., & Musen, M. A. (2000). Algorithm and tool for automated ontology merging and alignment. In Proceedings of the 17th National Conference on Artificial Intelligence (AAAI-00). Available as SMI technical report SMI-2000-0831.
[9] Noy, N. F., & Musen, M. A. (2003). The PROMPT suite: interactive tools for ontology merging and mapping. International Journal of Human-Computer Studies, 59(6), 983-1024.
[10] Li, N., & Motta, E. (2010). Evaluations of user-driven ontology summarization. In Knowledge Engineering and Management by the Masses (pp. 544-553). Springer Berlin Heidelberg.
[11] Simperl, E. P. B., & Tempich, C. (2006). Ontology engineering: a reality check. In On the Move to Meaningful Internet Systems 2006: CoopIS, DOA, GADA, and ODBASE (pp. 836-854). Springer Berlin Heidelberg.