Combining Serendipity and Active Learning for Personalized Contextual Exploration of Knowledge Graphs

Tracking #: 1641-2853

Federico Bianchi
Matteo Palmonari
Marco Cremaschi
Elisabetta Fersini

Responsible editor: 
Guest Editors IE of Semantic Data 2017

Submission type: 
Full Paper

Knowledge Graphs (KG) represent a large amount of Semantic Associations (SAs), i.e., chains of relations that may reveal interesting and unknown connections between different types of entities. Applications for the contextual exploration of KGs help users explore information extracted from a KG, including SAs, while they are reading an input text. Because of the large number of SAs that can be extracted from a text, a first challenge in these applications is to effectively determine which SAs are most interesting to the users, defining a suitable ranking function over SAs. However, since different users may have different interests, an additional challenge is to personalize this ranking function to match individual users’ preferences. In this paper we introduce a novel active learning to rank model to let a user rate small samples of SAs, which are used to iteratively learn a personalized ranking function. Experiments conducted with two data sets show that the approach is able to improve the quality of the ranking function with a limited number of user interactions.

Solicited Reviews:
Review #1
Anonymous submitted on 17/Jun/2017
Review Comment:

The paper explores techniques for the retrieval and ranking of Semantic Associations from Knowledge Graphs (i.e. loop-free paths connecting two entities in the KG). This is a very topical and important research area. The challenge is the very large number of such paths that can be found between two given entities, hence requiring effective ranking, as well as personalization, of the list of SAs returned.

The paper presents a pay-as-you-go approach for achieving these aims, using a learning to rank algorithm and an active sampling method. The authors describe in detail experiments conducted over two datasets, comparing across a variety of baselines and algorithms. The results are promising in respect of both the ranking of SAs, and support for the personalization hypothesis.
The paper is generally clearly motivated and presented, giving the necessary details for the various techniques.

Unfortunately, however, the paper bears a strong similarity to "Actively Learning to Rank Semantic Associations for Personalized Contextual Exploration of Knowledge Graphs" by the same authors, published in the ESWC 2017 conference proceedings (Springer):

- the Abstracts are identical
- the introduction, motivation and stated contributions are similar
- Figures 1 and 3 are the same in both papers
- Figure 2 is very similar between the papers
- the general approach and specific techniques described are the same in the two papers
- the datasets and experiments appear to be the same, although described in a little more detail in the paper submitted to SWJ
- the experimental results and discussion are essentially the same, apart from the addition of the Wilcoxon test.

Thus it is hard to see any additional substantive contribution by the paper submitted to SWJ compared with the authors' paper already published in the ESWC 2017 proceedings.

More detailed comments about the paper submitted to SWJ:

In the discussion on page 2, a bit more background is needed about the DaCENA application. Also, some forward-looking discussion is needed to explain concepts such as "k-most interesting", "ordered by serendipity", and whether "interest" is regarded as being the same as "serendipity".

On page 6, a short discussion is needed on how the parameter alpha is set. On page 9, some more discussion is needed on how p and lambda were determined.

Section 3 requires some more reflection on why certain choices were made. It is hard for the reader to follow the overall strategy within all the details presented. A short summary overview is needed, perhaps at the start of the section.

On page 11, the discussion at the bottom of Column 2 on the LAFU dataset is unclear and should be rephrased.

Page 1 col 2
receiving ccontent -> receives content
Page 4 col 1
Unfortunately, being Serendipity -> Unfortunately, Serendipity being
weather this -> whether this
Page 6 col 2
in a SAs -> in a SA
P7, c1
entity that are central -> entities that are central
features that considers -> features that consider
P8, c1
Two different dataset -> Two different datasets
collected these dataset -> collected these datasets
in Figure 5, we show -> in Figure 5: we show
P9, c1
for SAs, in addition often -> for SAs. In addition,
for each algorithms -> for each algorithm
algorithms that have are signed with the blue columns ->
algorithms that are marked with blue (dark grey) in the first column
compare it with the use of clustering algorithm ->
compare it with the use of clustering algorithms
thus it could be informative -> thus could be informative
selected one SAs -> selected one SA
uncertainty a Global Uncertainty -> uncertainty, a Global Uncertainty
P10, c1
a order -> an ordering
In the SAMU datasets Dirichlet -> In the SAMU datasets, Dirichlet
this two methods -> these two methods
this dataset it is bigger -> this dataset is bigger
P10, c2
One of our assumption was that the personalization was needed ->
One of our assumptions was that personalization was needed
which output -> whose output
P11, c1
has given rating to -> has given a rating to
better then methods -> better than methods
with each iterations -> with each iteration
in the sectopn above, the general -> in the section above. The general
not able to access to the -> not able to access the
P11, c2
algorithms requires longer -> algorithms require longer
can not -> cannot
to training -> to train
the plots 6 -> the plots in Figure 6
P12, c1
with the algorithms configurations -> with the algorithm configurations
Table 8, we signed with -> Table 8 - we mark with
P13, c1
serendipity heuristics is a -> serendipity heuristic is a
user are interested -> users are interested
P13, c2
that help a user -> that helps a user
to by known -> to be known
P14, c1
This approach use -> This approach uses
tailored on -> tailored to
split in a training -> split into a training
uses uncertainty measure -> uses an uncertainty measure
The sentence "(In Section 3 .....Section)." is unclear and should be rephrased.
The sentence starting "One approach that has been proposed ..." is unclear and should be rephrased.
P14, c2
requested to the user -> requested from the user [ twice ]
since user are interested -> since users are interested
An other -> Another

Review #2
By Kouji Kozaki submitted on 08/Aug/2017
Review Comment:

This paper proposes a method to extract semantic associations (SAs) from knowledge graphs by combining a heuristic approach called serendipity with active learning.

However, the difference between this paper and the ESWC2017 paper submitted by the same authors does not appear to be very large.
I checked the ESWC2017 paper and found that the organization and sentences of the submitted paper have been revised with more detailed explanations, but there is no new content such as additional experiments or new discussion.
Therefore, I think this paper should be rejected if the journal's policy requires new experiments or other new content for an extended version of a conference paper.

The following are comments that the authors should consider when they extend this paper.

The authors set up several experiments and evaluated the proposed method.
I think that their experiments are well designed, and their results show the usefulness of the proposed methods. However, there is still some room for consideration, as follows.

1. I agree that the combination of a heuristic approach and active learning for the extraction of SAs is an important topic, and the proposed method is probably the first system in this area.
However, I cannot fully understand the technical novelty of the proposed method, because it is simply an application of active learning to a recommendation problem. Does the proposed method have any distinctive features in comparison to other applications of active learning?
In other words, what difficulties did the authors overcome when they applied active learning techniques to the extraction of SAs?
Does it yield any new knowledge about the combination of semantics and learning techniques?
2. In Section 2.2, feature vectors for SAs are introduced. The authors should discuss which features are most effective for the results of the experiments discussed in Section 3.
3. I suppose that the effectiveness of recommendation depends on the kind of content. In the experiments, the authors used 5 articles for SAMU and 2 articles for LAFU, which appear to cover very similar kinds of topics. Are there any considerations or findings about this dependency on the kind of content?
4. The experiments show the importance of personalization through IRR. How, then, did the authors assess the effectiveness and/or usefulness of the proposed method for the personalized recommendation of SAs? Could the experiments show any results about that? If not, is there any plan to evaluate personalization with the proposed method?

Review #3
By Marta Sabou submitted on 22/Aug/2017
Review Comment:

This paper focuses on the topic of contextual knowledge graph (KG) exploration, that is, on settings where users read a textual source and are supported in their exploration through relevant Semantic Associations (SAs) about the text extracted from a KG. The authors' previous DaCENA system exemplifies such settings. Important aspects in supporting such contextual exploration settings are (1) determining relevant SAs to recommend and (2) providing a personalized ranking of SAs based on the interests of each individual user. To achieve such a personalized recommendation of SAs, the paper proposes an SA recommendation approach called Active Learning to Rank (ALR), which is bootstrapped with SAs based on serendipity metrics and then refines itself through active learning based on user ratings of the proposed SAs. A number of evaluations are conducted.

Unfortunately, the content of this paper heavily overlaps with that of paper [10] published earlier this year at ESWC. At a closer analysis, large parts of the paper are literally reproduced (including the abstract). It also appears that this submission reports on experiments on the same datasets and using the same evaluation setups as the conference paper did. Therefore, the presented results are also the same. The only additional analysis is the Wilcoxon test, but this addition is too minor to warrant republishing the material as a journal paper. Therefore, this paper cannot be accepted in its current form, given its marginal novelty with respect to the earlier conference paper.