Combining Serendipity and Active Learning for Personalized Exploration of Knowledge Graphs

Tracking #: 2176-3389

Federico Bianchi
Matteo Palmonari
Marco Cremaschi
Elisabetta Fersini

Responsible editor: 
Jens Lehmann

Submission type: 
Full Paper

Abstract:
Knowledge Graphs (KG) are now a widely used knowledge representation method and contain a large number of Semantic Associations (SAs), i.e., chains of relations that may reveal interesting and unknown connections between entities. Information that comes from a KG can be used to help a user that is doing a familiar task like reading an online news article, by adding contextual information that can provide informative background or serendipitous new details. Because of the large number of SAs that can be extracted from the entities that are found in an article, it is difficult to provide to the user the information that she needs. Moreover, different users might want to explore different SAs and thus exploration should be personalized. In this paper, we propose a method based on the combination between a heuristic measure, namely serendipity, and an active learning to rank algorithm that is used to learn a personalized ranking function for each user; this method asks the user to iteratively score small samples of SAs to learn the ranking function while reducing the effort on the user side. We conducted user studies in which users rate SAs while reading an online news article and used this data to run an experimental evaluation. We provide evidence that users are interested in different kinds of SAs, proving that personalization in this context is needed. Moreover, results not only show that our methodology provides an effective way to learn a personalized ranking function but also that this contextual exploration setting can help users learn new things.

Solicited Reviews:
Review #1
By Besnik Fetahu submitted on 22/Jun/2019
Review Comment:

In this manuscript, the authors propose a framework for ranking the semantic associations (SAs, i.e., facts about entities) of the entities appearing in a textual snippet. The premise is that, because users have different interests, ranking functions need to take those interests into account when producing SA ranks.

The approach makes use of heuristics to capture user interests, and additionally trains a separate learning-to-rank (L2R) model for each user. The training instances are selected in an active learning scenario with an early stopping criterion, so that the amount of labeled instances is minimal.

Contextual information when reading news articles or other textual snippets is per se an important topic. Finding the right facts or triples that may help the user reach a better understanding of the topic at hand is crucial. Often the challenges are related to the coverage of facts in the existing KBs and to identifying facts that take into account the entities that co-occur in a textual snippet, etc.

To this end, I think the problem in itself is interesting, and it is worth pursuing techniques that aid the user in understanding content in which they are not proficient.

There are several issues with this manuscript, which I list and explain in detail below.

1) First of all, the novelty of this manuscript is very limited. Even though there are changes with respect to the previously published article (DaCENA), that article is a demo and its scientific contribution is very limited. The heuristics that are supposed to capture user interests include no user features, nor features that could exploit user similarities (as is done in recommender systems) for training L2R models that are sensitive to users' interests.

2) Training L2R models for individual users is highly ineffective; additionally, you lose a lot of information that could be leveraged from other users who may share similar interests. I would strongly advise having a look at the recommender systems literature. For instance, [1,2] (to name a few) propose personalized recommender systems for the use case of online news. Here the task is recommendation of online news, but it can easily be tweaked to recommend SAs.

3) The SAs proposed to a user are produced through the DaCENA approach, which ranks the SAs based on a serendipity measure. How can this be a useful measure for a user who has no idea about the topic of the news article she is reading? Furthermore, it is hard for me to understand how you can use a TF-IDF measure to compute the relevance between a fact or triple and a news article. This is probably the wrong measure: with a single reference document, the IDF will be the same for each term, so you are effectively comparing only TF word vectors. In that case, the similarities will eventually be decided by the distribution of stop words or frequent words, which may have nothing to do with the topic.
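
To make the degeneracy concrete: when the "corpus" consists of a single reference article, every term receives the same IDF, so TF-IDF cosine similarity reduces to plain TF similarity, which frequent words can dominate. A toy sketch (the texts and the constant IDF value are invented for illustration, not taken from the paper):

```python
from collections import Counter
from math import sqrt

def tf(text):
    """Term-frequency vector of a whitespace-tokenized text."""
    return Counter(text.lower().split())

def cosine(u, v):
    dot = sum(u[t] * v[t] for t in u)
    norm = lambda w: sqrt(sum(x * x for x in w.values()))
    return dot / (norm(u) * norm(v))

article = "the president of the country met the press"
triple_a = "president leads the country"          # on-topic fact
triple_b = "the the the band released an album"   # off-topic, stop-word heavy

# With a one-document "corpus", idf(t) is the same constant for every term,
# so scaling every tf weight by it cancels out in the cosine:
IDF = 2.0
scaled = lambda v: Counter({t: w * IDF for t, w in v.items()})
assert abs(cosine(tf(article), tf(triple_a)) -
           cosine(scaled(tf(article)), scaled(tf(triple_a)))) < 1e-9

# The stop-word-heavy triple can score as high as the on-topic one:
print(cosine(tf(article), tf(triple_a)), cosine(tf(article), tf(triple_b)))
```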

4) How can you justify that a user wants to read 50 to 100 SAs for an article? This seems like an arbitrary number, and I find it hard to believe that any user would be interested in reading an additional 100 facts for a news article. What happens when the knowledge base has insufficient coverage of facts for an article? What is the fallback mechanism in such scenarios?

5) Another claim I find unjustified is that the SAs are provided for some subject entity, which is supposed to be the salient entity in the news article. Based on your previously published approach, you state that this is simply the most frequent entity. Previous research has shown that predicting the salient entity of a news article is not trivial [3,4], especially in the case of the SAMU dataset, where you have only one paragraph and most likely each entity appears at most once.

6) Another unjustified evaluation setting is that the SAMU and LAFU datasets use different Likert scales. What is the rationale for a 6-point scale in the first and a 3-point scale in the second? These decisions are not justified or explained in the manuscript.

7) The last issue I have is with the expectation that, for each news article or text snippet, users will have completely different preferences, i.e., no common ranking of SAs. How can you justify this? It is clear that for a news article there can be only a predefined set of information facets; it is highly unlikely (if not impossible) that you will have endless facets. This leads to my point that, for any news article on a given topic, the SAs will probably be highly centered on the salient entity. Thus, a very low, nearly zero inter-rater agreement is not convincing for me at all: it would mean that the users produced perfectly random rankings of the SAs. Otherwise, such a score does not make sense.
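
For reference, near-zero Kendall's τ is exactly what two independent random rankings produce in expectation, which is why a near-zero agreement reads as randomness. A small simulation over invented rankings (not the authors' data) illustrates the point:

```python
import random

def kendall_tau(a, b):
    """Kendall's tau between two rankings of the same n items;
    a[i] and b[i] are the ranks the two raters give item i."""
    n = len(a)
    concordant = discordant = 0
    for i in range(n):
        for j in range(i + 1, n):
            s = (a[i] - a[j]) * (b[i] - b[j])
            if s > 0:
                concordant += 1
            elif s < 0:
                discordant += 1
    return (concordant - discordant) / (n * (n - 1) / 2)

random.seed(0)
n, trials = 20, 2000
taus = []
for _ in range(trials):
    a, b = list(range(n)), list(range(n))
    random.shuffle(a)   # rater 1: a random ranking of n SAs
    random.shuffle(b)   # rater 2: an independent random ranking
    taus.append(kendall_tau(a, b))

mean_tau = sum(taus) / trials
print(f"mean tau over random rater pairs: {mean_tau:.3f}")  # close to 0

# Sanity checks: identical rankings give tau = 1, reversed give -1.
assert kendall_tau(list(range(n)), list(range(n))) == 1.0
assert kendall_tau(list(range(n)), list(reversed(range(n)))) == -1.0
```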

8) Connected to the previous point, it would be interesting to see what the actual overlap is, in terms of SAs (without ranking), between two different users on the same news article.

9) It is also not clear how you recruited the users. Furthermore, the nearly random rankings across different users may be explained by the amount of time the users took to complete the task (up to 12 minutes).

Finally, the manuscript has several language issues (syntax errors and typos, e.g., "texts small" on p. 9), which need careful proofreading. In one case an entire sentence is missing, which makes the LAFU setting quite difficult to understand.

[1] Xiao Yu, Xiang Ren, Yizhou Sun, Quanquan Gu, Bradley Sturt, Urvashi Khandelwal, Brandon Norick, Jiawei Han: Personalized entity recommendation: a heterogeneous information network approach. WSDM 2014: 283-292
[2] F. Garcin, K. Zhou, B. Faltings, V. Schickel: Personalized News Recommendation Based on Collaborative Filtering. IEEE/WIC/ACM WI-IAT 2012: 437-441. doi: 10.1109/WI-IAT.2012.95
[3] Besnik Fetahu, Katja Markert, Avishek Anand: Automated News Suggestions for Populating Wikipedia Entity Pages. CIKM 2015: 323-332
[4] Jesse Dunietz, Daniel Gillick: A New Entity Salience Task with Millions of Training Examples. EACL 2014: 205-209

Review #2
Anonymous submitted on 06/Jul/2019
Major Revision
Review Comment:

This paper describes an approach that supports users in exploring a topic of interest through personalized semantic associations (SAs). It extends a paper presented at the ESWC 2017 conference with more details about the approach and more data analysis. I think the topic of the paper is interesting for the SWJ audience, the approach is sound in general, and the evaluation results indicate improvements over state-of-the-art methods, but a major revision needs to be carried out before publication.


The authors propose an approach that iterates through 5 steps, in which they try to minimize user effort while satisfying the users' information needs. The cold-start problem for the initial ranking is also addressed by acquiring user feedback. To my knowledge this approach is novel and sound. However, there are some things I think could be improved:
- explain the reason/motivation for choosing each method. For instance, why use active learning to reduce user effort? There should be more description of the active learning.
- Why choose rankSVM over other ranking algorithms? Has it been compared with other algorithms?
- step 2, User Ratings: when talking about the user annotation/evaluation of the SAs, different descriptions such as “usefulness”, “relevance” and “higher interest” are used; in my opinion, these are completely different evaluation criteria. It is not clear what instruction was given to the users for ranking the SAs. If no clear instruction was given, the users might have had different understandings of the task and approached the labeling in different ways.
- For the serendipity score computation: what is the motivation for selecting SAs with high rarity?
- Why use alpha = 0.5?
- describe “active learning/sampling” in a more formalized way
- It is not clear how the features are computed based on the description in Section 2.5; please add more details so that readers can reproduce the same features.
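
For what it's worth, the kind of formalization requested for the active learning/sampling step could take the shape of a generic uncertainty-sampling loop like the sketch below. This is not the authors' algorithm: the linear scorer, the median-distance uncertainty criterion, and the simulated rating oracle are all assumptions made purely for illustration.

```python
import random

random.seed(1)

def score(w, x):
    """Linear ranking score of feature vector x under weights w."""
    return sum(wi * xi for wi, xi in zip(w, x))

def most_uncertain(pool, w, k):
    """Pick the k items whose scores are closest to the pool's median
    score, i.e. where the current ranker is least decisive."""
    scores = sorted(score(w, x) for x in pool)
    median = scores[len(scores) // 2]
    return sorted(pool, key=lambda x: abs(score(w, x) - median))[:k]

def update_pairwise(w, labeled, lr=0.1):
    """Perceptron-style pass over labeled pairs: whenever the ranker
    mis-orders a pair, nudge w toward the higher-rated item."""
    for x, yx in labeled:
        for z, yz in labeled:
            if yx > yz and score(w, x) <= score(w, z):
                w = [wi + lr * (xi - zi) for wi, xi, zi in zip(w, x, z)]
    return w

# Simulated "user": rates SAs according to a hidden linear utility.
hidden = [1.0, -2.0, 0.5]
oracle = lambda x: score(hidden, x)

pool = [[random.random() for _ in range(3)] for _ in range(100)]
w, labeled = [0.0, 0.0, 0.0], []
for _ in range(5):                      # 5 rounds x 4 queries = 20 labels
    for x in most_uncertain(pool, w, k=4):
        pool.remove(x)
        labeled.append((x, oracle(x)))  # query the user's rating for x
    w = update_pairwise(w, labeled)

# Pairwise accuracy on the unlabeled remainder, judged by the hidden utility:
pairs = [(a, b) for a in pool[:30] for b in pool[30:60]]
acc = sum((score(w, a) - score(w, b)) * (oracle(a) - oracle(b)) > 0
          for a, b in pairs) / len(pairs)
print(f"held-out pairwise accuracy after 20 labels: {acc:.2f}")
```

Writing the method at this level of detail (selection criterion, query budget per round, update rule, stopping condition) would address the reproducibility concerns above.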


The authors created two datasets with different characteristics to evaluate the approach. The performance of the proposed approach was compared to several baselines, and feature effectiveness was also evaluated. The process is reasonable in general and the evaluation results support the authors' claims, but some further clarification is required:
- the paper states: “The questionnaire asked the user to give a new rank about how much the topic and the elements of the article were clear after they have been able to read each SAs”. This does not seem to match any of the terms that appeared earlier in the text (relevant, interesting for the user, usefulness); please make the ranking goal clear and consistent throughout the paper.
- The aforementioned task instruction might also reduce the personalization effect of the labels in the dataset, as a resource that allows users to ‘clarify’ a topic is less personal than one matching their interest in exploration.
- Why are different rating scales (1-6 vs. 1-3) used for the different datasets?
- For the LAFU dataset, key information is missing: for instance, how many documents each user read, the length of the articles, how many SAs were computed for each article, why 7 days were needed for the task, etc.
- Section 3.3.1: explain what λ and p are.
- The evaluation results are not sufficiently discussed, especially those in Section 3.4.2. Tables and figures are given, but there is no discussion of them.
- p16: “Our hypothesis is that a user can find interesting information that allows them to better understand the context of the article.” Again, I don’t think “interesting” and supporting “better understanding” are the same thing; please clarify the goal of this work.
- There is insufficient discussion of the evaluation results; e.g., discuss the performance difference between the two datasets and the reason behind it.
- Are the two datasets (SAMU and LAFU) mixed for the feature evaluation? If so, I think an analysis on both datasets is necessary in order to support this evaluation approach (i.e., that they can be mixed). If only one of the datasets is used, a clearer description of the approach and the motivation behind it is required.


The writing quality is in general not good. There are many issues beyond typos or small grammatical errors. For instance, a large fraction of certain parts of the paper, e.g., Sections 1.1, 2.1, 2.3, 3.1 (especially the LAFU dataset part) and 3.3, is hard to follow. There is a broken sentence at the beginning of page 10: a paragraph ends with “To this end, two users”. In the same section, a user is referred to as both “he” and “she” in the same sentence; this issue appears throughout the paper.


Some more detailed comments follow. I have only listed some of the language issues; there are many more in the paper, so please proofread and revise the whole paper.

- abstract: “combination between” -> combination of; “different kind of SAs” -> different SAs?
- p2: “a European user that is..” -> who is
- p6 “an active sampling algorithm IS proposed ...”
- p6 “which motivates a reason”?
- p8 in the Global PageRank paragraph, broken sentence: “in this, way...”
- both “rankSVM” and “rank SVM” appear in the text
- “dataset” vs. “data set”
- p10, compile error: xi ¿ xj or xj ¿ xi
- p10: “The formula to compute Kendall’s τ appears below” - “appears” is not the proper term here
- p13: “not has good as” -> as good as
- in the conclusion and throughout the paper, the authors state that “personalization of KG exploration is necessary since users are interested in different kinds of contextual information”. Again, this depends on how the study task was created. The claim is valid if the users were supposed to explore information they are interested in; but if the users were supposed to select results that help them understand the topic, a more suitable assumption might be that the users have different prior knowledge of the relevant topics.

Review #3
Anonymous submitted on 15/Jul/2019
Major Revision
Review Comment:

This article discusses a framework that facilitates the user in reading an article by providing background details; a contextualized exploration is offered to the user. For the same article, different users may require different kinds of details, leading to personalized exploration of Semantic Associations (SAs).

At the beginning of Section 1.1, the authors claim that the reader may not have background knowledge. But would it be very difficult to show a snippet of the Wikipedia article or the DBpedia abstract about the entity in the application?

The paper targets an interesting problem. However, there are many organization issues that I would like the authors to address. For example, when the authors give 3 points "(i) first of all we need to define ...", I would expect a step-by-step explanation of each of the points. Similarly repetitive information is given when the steps of Figure 3 are explained on Page 6. Why don't the authors directly start with the explanation of Figure 3, beginning from "The model proposed..."?

The text between section 2 and section 2.1 does not give any additional details.

Section 2.5 gives the details of the features used for RankSVM, which is discussed in Section 2.3. I do not understand why this is given as a separate section at the end; it could be given before Section 2.3 to make a smoother transition.

It would be interesting to know how the authors propose to extract Semantic Associations. Are these associations enough for the user to understand what is happening in the article s/he is reading?

Another interesting point to address would be the intended audience for this kind of exploration. Have the authors run any experiments from the users' point of view? By this I mean: were the users testing the application common users or experts (i.e., people who know about Knowledge Graphs)? It would be interesting to see whether the presented information was useful for common users and whether they were able to benefit from it. Was the interaction of such a user with the system an easy task?

Is the dataset prepared by the authors available somewhere? Statistics on the dataset would be interesting, i.e., how long are the short article and the long article, and how many articles have been considered? A link to a GitHub repository would be very useful here.

I would highly recommend that the authors run a grammar check and rephrase the text in many places. It would be better to ask a native English speaker to proofread the paper.

Authors may also want to look at the paper:
Petar Ristoski. Towards Linked Open Data Enabled Data Mining - Strategies for Feature Generation, Propositionalization, Selection, and Consolidation. ESWC 2015