Topic Modeling for Linked Open Vocabularies

Tracking #: 1185-2397

Authors: 
Daniel Vila-Suero
Jorge Gracia
Asunción Gómez-Pérez

Responsible editor: 
Aidan Hogan

Submission type: 
Full Paper
Abstract: 
One of the major open issues in ontology reuse is how to help users find the appropriate ontologies and terms for a given application or domain of interest. To complement current ontology similarity-based techniques, topic modeling has the potential to allow comparisons among ontologies not only on the basis of their lexical content, but also in terms of their latent semantic structure, by surfacing the topics that are implicit in their lexical descriptions. In this paper we propose a novel method that extracts the lexical contexts of a set of ontologies, annotates them with external senses connected to the Linked Open Data cloud, and uses these annotated contexts to train a probabilistic topic model. We evaluate the method both in terms of the coherence of the extracted topics and in terms of its performance when clustering topically related ontologies.
Full PDF Version: 
Tags: 
Reviewed

Decision/Status: 
Reject

Solicited Reviews:
Review #1
Anonymous submitted on 09/Oct/2015
Suggestion:
Major Revision
Review Comment:

This paper studies topic modeling for ontologies/vocabularies. The proposed approach first extracts words from each ontology, then replaces those words with senses, and finally uses LDA or BTM to train a topic model. The resulting model can be used for vocabulary clustering.
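To make the pipeline concrete, here is a minimal sketch of such a workflow, assuming gensim's LDA implementation; annotate_with_senses is a hypothetical stand-in for the Babelfy step and not the authors' actual code:

    # Sketch of the reviewed pipeline: extract terms per ontology,
    # map them to senses, train a topic model over the result.
    from gensim import corpora, models

    def annotate_with_senses(words):
        # Hypothetical placeholder: the paper uses Babelfy to replace
        # each word with a disambiguated sense identifier.
        return ["sense:" + w for w in words]

    # Toy lexical contexts, one per ontology.
    ontology_docs = [
        ["book", "author", "publisher", "title"],
        ["person", "organization", "member", "name"],
    ]
    sense_docs = [annotate_with_senses(doc) for doc in ontology_docs]

    dictionary = corpora.Dictionary(sense_docs)
    bow_corpus = [dictionary.doc2bow(doc) for doc in sense_docs]
    lda = models.LdaModel(bow_corpus, num_topics=2, id2word=dictionary)
    print(lda.show_topics())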

The paper is well written and easy to follow. In particular, the detailed example helps in understanding the approach. However, a major weakness of this work is that its technical contribution does not seem significant. It is basically an application of existing topic modeling approaches to ontologies. Moreover, in the evaluation, the proposed approach is not compared with any existing approaches, making it difficult to know whether this work advances the state of the art.

Detailed reviews:

In Section 2, the authors state that “there is a lack of mechanisms to (semi-)automatically gather topical information of vocabularies”, which is not true: work on ontology classification goes back many years, e.g. the following.

Chintan Patel et al. OntoKhoj: A Semantic Web Portal for Ontology Searching, Ranking and Classification. WIDM’03.

In Section 3, a document is defined as a sequence of words, but it seems that the order of words is not important since a bag-of-words model is used.
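To illustrate the point: under a bag-of-words representation, two documents containing the same words in different orders are indistinguishable (a minimal illustration, again assuming gensim; the toy documents are invented):

    from gensim import corpora

    doc_a = ["library", "catalog", "record"]
    doc_b = ["record", "library", "catalog"]  # same words, reordered

    dictionary = corpora.Dictionary([doc_a, doc_b])
    # Both documents map to the identical bag-of-words vector, so the
    # word order implied by "sequence" carries no information here.
    assert dictionary.doc2bow(doc_a) == dictionary.doc2bow(doc_b)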

In Section 3.2, Babelfy can only identify senses of fragments containing at least a noun. How, then, does the approach deal with the many properties whose names are verbs, such as foaf:knows?

In Section 5, in both experiments, LDA and LDA_S perform equally well. How could you conclude in Section 6 that “applying word-sense disambiguation … increases the quality of the topics”?

In Section 6, the authors made an overall observation that “probabilistic topic modeling can be effectively applied to represent ontologies as a mixture of latent features”. However, I didn’t see any evidence in the experiments that could support this claim. How do you define “effectively”? In the evaluation, three topic models are compared, but they are not compared with any other approaches.

So, in the next revision, the authors should compare their approach with at least one existing approach that measures ontology similarity/relatedness not based on topic modeling, e.g. those mentioned in Section 2.2. Accordingly, new evaluation metrics are needed because existing metrics can only be applied to topic models.

Minor issues:
Page 4, a latent topics —> a latent topic
Page 4, assume document are —> assume documents are
Page 4, be build —> be built
Page 7, we introduce in 5 (What is 5?)

Review #2
Anonymous submitted on 29/Oct/2015
Suggestion:
Major Revision
Review Comment:

This manuscript describes an application of word sense disambiguation and topic modeling for computing the relatedness of vocabularies. The authors build a pipeline where the literal elements of ontologies (e.g. concept labels) are annotated with word senses (using Babelfy), which are then used to model topic distributions with the Biterm Topic Model (BTM) approach, known to work better than, for instance, LDA on short literals/texts rather than extensive documents. The Jensen-Shannon divergence is then used for computing the similarity between vocabularies (and essentially for clustering) based on their topic distributions. The authors perform some form of evaluation by computing metrics for the produced topics (coherence) and clusters (intra- and inter-cluster distance ratio), compared to two baselines (LDA, LDA with annotated word senses).
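For context, the Jensen-Shannon comparison at the end of this pipeline can be sketched as follows; the topic distributions are invented for illustration and this is not the authors' implementation:

    import numpy as np
    from scipy.spatial.distance import jensenshannon

    # Hypothetical topic distributions for two vocabularies (K = 5).
    p = np.array([0.60, 0.20, 0.10, 0.05, 0.05])
    q = np.array([0.10, 0.10, 0.20, 0.30, 0.30])

    # SciPy returns the Jensen-Shannon *distance*, i.e. the square
    # root of the divergence; squaring recovers the divergence.
    js_divergence = jensenshannon(p, q, base=2) ** 2
    print(js_divergence)  # symmetric, bounded in [0, 1] for base 2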

Strong points:
- interesting approach
- the goal (ontology/vocabulary recommendation) is relevant and timely

Weak points:
- contribution: the authors seem to merely plug together existing techniques (Babelfy, BTM). In this sense, the paper presents a processing pipeline (i.e. an application) rather than a significant research contribution.
- evaluation: assessing the topic coherence and cluster distances does not provide any conclusive insights into what part of the approach works and whether the produced topics/clusters actually are more correct or not
- the data used does not seem to be accessible (computed topic distributions, clusters)
- task/claims: computing vocabulary similarity is not the same as "vocabulary recommendation". For the latter, one doesn't necessarily have a "base" vocabulary for computing similarities and, more often than not, metrics about the population and use of the ontology (frequency of use, etc.) are of higher importance. In general, the actual task of similarity computation/recommendation is not evaluated.

Regarding the task/use case: the paper does mention terms such as vocabulary recommendation, similarity computation and the like, but it never clearly defines the actual task (nor evaluates performance on that task). If the goal is to compute similarity between vocabularies, please clearly define the task and use case early on in the paper and avoid any other terminology. In the introduction, it becomes apparent that you aim to compute similarities between ontologies. Your evaluation would then be required to measure the performance of your approach on precisely this task.

At the moment, there is a considerable gap between that claim and the evaluation (Section 5). The evaluation merely computes metrics for the computed topics (topic coherence) and the computed clusters, without investigating to what extent (a) the topics and clusters actually support the task (i.e. correctly describe the semantics of the vocabularies) and (b) the proposed method itself supports the task of similarity computation between vocabularies. For that kind of evaluation, some form of ground truth or human judgement would be required, which would make it possible to investigate how well the produced clusters reflect *actual* ontology similarity. Simply computing the inter-cluster and intra-cluster distances does not provide any such insight and does not allow one to judge the usefulness of the approach. Topics could be coherent and clusters cohesive, and yet inaccurate at the same time (for instance due to some issue with the disambiguation).
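To make this suggestion concrete: given gold-standard domain labels for the vocabularies, an external clustering metric such as the adjusted Rand index would measure agreement with actual similarity, independently of how cohesive the clusters look (a sketch with scikit-learn; the labels are invented):

    from sklearn.metrics import adjusted_rand_score

    # Hypothetical gold-standard domain labels for six vocabularies
    # versus the cluster assignments produced from the topic model.
    gold_labels = [0, 0, 0, 1, 1, 2]  # e.g. bibliographic/people/geo
    predicted   = [0, 0, 1, 1, 1, 2]

    # 1.0 means perfect agreement with the ground truth; values near
    # 0 indicate chance-level clustering, however cohesive the
    # clusters may look internally.
    print(adjusted_rand_score(gold_labels, predicted))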

Section 3: the definitions are very useful but sometimes require more clarity. For instance, when introducing "elements", you do not clearly state what an individual "element" is. A single term (i.e. a "word")? A specific set of terms (e.g. from a particular label or description)?

Simply creating a BoW without any weighting seems somewhat simplistic. Shouldn't concept labels be weighted differently than domain or range definitions?
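One simple way to realise such a weighting, sketched under the assumption that label terms should count more than terms from domain/range definitions (the weights are invented for illustration):

    from collections import Counter

    def weighted_bow(label_terms, domain_range_terms,
                     label_weight=3, structural_weight=1):
        # Hypothetical scheme: terms from rdfs:label count three
        # times as much as terms from domain/range definitions.
        bow = Counter()
        for term in label_terms:
            bow[term] += label_weight
        for term in domain_range_terms:
            bow[term] += structural_weight
        return bow

    print(weighted_bow(["book", "author"], ["agent", "document"]))
    # Counter({'book': 3, 'author': 3, 'agent': 1, 'document': 1})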

Section 4 is very useful as it provides a simple example of the approach. However, several questions remain: you pick a few topics (e.g. "topic number eight"). Why did you pick this one? How did you identify K=20? Figure 1 shows some clusters of vocabularies, but the choice of vocabularies seems very arbitrary. It shows only bibliographic vocabularies (or related ones) and the DBpedia ontology. As such, the figure does not enable any conclusion or even exemplary support for the claim that the approach works. This would have been possible by showing clusters from a set of random and diverse vocabularies, where some kind of semantic coherence can be observed in each cluster. In the current Figure 1, I can observe that some bibliographic vocabularies seem closer to some other bibliographic vocabularies than to DBpedia, but at the same time, some seemingly "more bibliographic" vocabularies (RDA:l, DC) seem even more distant than DBpedia. It is hard to use this figure as a demonstration of your approach.

Occasionally the wording is imprecise. What exactly do you mean by "extracts ... latent topics and connect them to the LOD cloud" (in Section 1)? While the English is fairly good throughout most of the paper, some sections, in particular Section 4.3, contain a large number of typos and malformed sentences.

Minor:
- Section 2: "those that make use OF aggregated..."
- Section 2: "documentS are drawn"
- Section 2: " :(i)"
- abstract: "external senses" seems an odd/ambiguous phrase
- WATSON: I would not describe Watson as an "ontology search engine"
- Introduce references/URLs/footnotes on first mention of the notion/item (eg BIBO)
- Section 4: "? are there.. " (are should start with upper case letter
- Section 4: "describe a specific approach" (not "an")
- Figure 1 caption: "among vocabularies the obtained..."
- Section 4: in general lots of typos and malformed statements

Review #3
By Markus Luczak-Roesch submitted on 23/Nov/2015
Suggestion:
Reject
Review Comment:

Many thanks to the authors for submitting in response to the special issue call for papers on 'Linked Data and Ontology Reuse'. The article proposes a topic modeling approach to reveal the latent topics of ontologies in order to compare them with respect to their similarity, dissimilarity or complementarity for a given domain.

In the introduction the authors start by giving the general motivation for their work. In particular, they suggest that their approach will fill the gap in ontology reuse that exists at the stage of finding appropriate ontologies modeling a domain of interest in ontology repositories. It is worth mentioning that the authors specifically highlight the importance of ontology reuse in Linked Data publication, as it is commonly agreed that the reuse of identifiers generally decreases the effort needed to integrate data from different sources. A brief introduction to LDA and BTM, two relevant topic modeling techniques, is then given before the section closes with a summary of the contributions of this work. The authors state that their contributions are: (1) modeling topics from ontologies' lexical information enriched by word-sense disambiguation; (2) the evaluation of the approach by (a) a comparative study of using two different topic modeling techniques (LDA and BTM) for the topic modeling task and (b) a comparison of the results of the approach with an LDA topic model learned from the lexical information without word-sense enrichment; and (3) the evaluation of the applied topic models by comparing their performance for clustering ontologies from a real-world corpus of topically annotated ontologies.

In terms of related work, the authors contrast their approach with ontology reuse in general, ontology similarity, and topic models. The most obvious issue in this regard is that there is room for improving the completeness of the literature review on ontology reuse, starting from work that was fundamental to this area but not yet necessarily related to ontologies in the context of Linked Data [1,2,6]. Beyond that stands a fairly broad body of literature on investigations of ontology development and reuse in the context of Linked Data, which seems to be left out entirely [3,4,5,7]. The second paragraph of Section 2.1 mixes the Linked Data motivation given in the beginning - reuse of concept identifiers for data population - with the redundancy of concepts in different ontologies. The references listed in this paragraph are well suited to motivating the authors' approach because they provide evidence that there are multiple ontologies available for the same domain (this could complement the Linked Data argument in the introduction). However, the authors neglect to relate these references in terms of the differences in their potential to be used for the same task as the one proposed in the paper. One can also raise the question of how the proposed method relates to what has been introduced as ontology summarization [10], as well as to the very well-known work on ontology mapping [8,9]. Altogether, the section on related work leaves the impression that the authors mix up foundations/preliminaries - the entire section on topic models should come under that headline, since the contribution here is hardly in the area of topic modeling - with related work, and that the survey of related work is incomplete and not sufficiently contrasted with the authors' own work.

The description of the method itself is clear. The notation is introduced appropriately, and it is worth highlighting that the authors made all source code and research data available via GitHub. In terms of the extraction of the lexical information of an ontology, it is not entirely clear why URIs are extracted, since these seem to introduce noise into the document corpus. It is interesting that in Section 3.3 the authors provide a reference as evidence for using BTM rather than LDA for short and noisy text, but then still make the comparison between these two approaches the key element of their own evaluation in Section 5.

The introduction of the approach is followed by an example of its application to ontologies from the library domain. The example is fictitious in the sense that no real users are involved; it simply applies the proposed method to a dedicated corpus of related ontologies. The findings from this small case study are summarised qualitatively. It is not entirely clear what this section is meant to contribute to the article. Is it part of the evaluation? Is it a validation that the approach works? The authors simply do not say why they present this particular example here. One can imagine that it would suit well as a running example throughout the paper, but then it should allow the reader to retrace the benefit of the novel approach over the situation without it. Nothing is said about any evidence that the actual users of ontology repositories perceive any significant problems in finding suitable ontologies. One might argue that such studies do not exist. However, similar ones were conducted in the past [11], and these are either better suited to backing up the argumentation for a running case or can serve as a blueprint for repeating such a study to gain that evidence.

The evaluation is targeted at the 'strengths and weaknesses' of the approach and comprises several experiments involving different topic modeling techniques and ontology contexts (enriched vs. raw). It is interesting to see that the authors highlight that none of the methods used to evaluate the approach requires external input (see Section 5.1, 'advantage of not requiring an external evaluation corpus', and Section 5.2, 'our goal is to evaluate the performance [...] without any intervention'). One can guess here that the authors wanted to avoid involving any experts or users in the evaluation. This is a critical problem, which was already mentioned above with respect to the 'illustrative example'. The article talks a lot about the problem of _people_ not being able to find the appropriate ontologies, but then excludes them from the consideration of the 'quality' of the derived topics. In terms of the quantitative results presented in the evaluation, the authors refer to a lower I(G) as the better result. But then Table 2 would show that LDA consistently performs better than LDAs. This raises the question of whether the performance benefit of BTMs stems from the strengths of BTM in general rather than from the fact that an enriched context is used.
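For reference, this criterion can be read as the usual intra-/inter-cluster distance ratio; the following sketch assumes that reading (the paper's exact definition of I(G) may differ):

    import numpy as np
    from scipy.spatial.distance import jensenshannon

    def intra_inter_ratio(distributions, clusters):
        # distributions: one topic distribution per vocabulary;
        # clusters: a cluster index per vocabulary. A lower ratio
        # means tighter, better-separated clusters (the assumed
        # reading of "a lower I(G) is the better result").
        intra, inter = [], []
        n = len(distributions)
        for i in range(n):
            for j in range(i + 1, n):
                d = jensenshannon(distributions[i], distributions[j]) ** 2
                (intra if clusters[i] == clusters[j] else inter).append(d)
        return np.mean(intra) / np.mean(inter)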

Altogether, this article clearly shows that the authors are working on a promising topic and contribute a novel method to summarise ontologies. The presentation is clear, and the authors provide all material needed to allow for a reproduction of the results. The issue is that the work seems to be at an early stage and, as it stands, does not contribute significantly to the field of 'Linked Data and Ontology Reuse'. The embedding in the literature of this area needs to be strengthened, and the discussion should come back to this aspect as well. The authors themselves mention that further experiments are needed to provide evidence for what causes LDAs to perform better starting from K=20. The question is whether the enriched context is the key here. Additional uncertainty about the authors' conclusions is added by Table 2, which seems to conflict with the topic coherence results.

A final but major point to mention is that both the example case and the evaluation disregard users. To make the paper an adequate fit for the special issue, it seems necessary to situate the proposed method within the data/ontology lifecycle and to assess (or refer to) the issues people currently face and the benefits they would gain from the proposed approach. This automatically brings in alternative methods for overcoming those issues and calls for a comparison of the topic modeling approach with those other methods.

I would propose a 'reject and resubmit' decision, since the necessary revisions seem to exceed what one would normally regard as a 'major revision'. The user dimension is an additional piece of research to be done, and the same holds for the experiments needed to shed more detailed light on the comparison between LDA and LDAs.

[1] Simperl, E. (2009). Reusing ontologies on the Semantic Web: A feasibility study. Data & Knowledge Engineering, 68(10), 905-925.
[2] Simperl, E., Sarasua, C., Ungrangsi, R., & Bürger, T. (2011). Ontology metadata for ontology reuse. International Journal of Metadata, Semantics and Ontologies, 6(2), 126-145.
[3] Käfer, T., Abdelrahman, A., Umbrich, J., O’Byrne, P., & Hogan, A. (2013). Observing linked data dynamics. In The Semantic Web: Semantics and Big Data (pp. 213-227). Springer Berlin Heidelberg.
[4] Hogan, A., Umbrich, J., Harth, A., Cyganiak, R., Polleres, A., & Decker, S. (2012). An empirical survey of linked data conformance. Web Semantics: Science, Services and Agents on the World Wide Web, 14, 14-44.
[5] Rula, A., Palmonari, M., Harth, A., Stadtmüller, S., & Maurino, A. (2012). On the diversity and availability of temporal information in linked open data. In The Semantic Web–ISWC 2012 (pp. 492-507). Springer Berlin Heidelberg.
[6] d’Aquin, M., & Noy, N. F. (2012). Where to publish and find ontologies? A survey of ontology libraries. Web Semantics: Science, Services and Agents on the World Wide Web, 11, 96-111.
[7] Luczak-Rösch, M., Simperl, E., Stadtmüller, S., & Käfer, T. (2014). The Role of Ontology Engineering in Linked Data Publishing and Management: An Empirical Study. International Journal on Semantic Web and Information Systems (IJSWIS), 10(3), 74-91.
[8] Noy, N. F., & Musen, M. A. (2000, August). Algorithm and tool for automated ontology merging and alignment. In Proceedings of the 17th National Conference on Artificial Intelligence (AAAI-00). Available as SMI technical report SMI-2000-0831.
[9] Noy, N. F., & Musen, M. A. (2003). The PROMPT suite: interactive tools for ontology merging and mapping. International Journal of Human-Computer Studies, 59(6), 983-1024.
[10] Li, N., & Motta, E. (2010). Evaluations of user-driven ontology summarization. In Knowledge Engineering and Management by the Masses (pp. 544-553). Springer Berlin Heidelberg.
[11] Simperl, E. P. B., & Tempich, C. (2006). Ontology engineering: a reality check. In On the Move to Meaningful Internet Systems 2006: CoopIS, DOA, GADA, and ODBASE (pp. 836-854). Springer Berlin Heidelberg.