Aemoo: Linked Data exploration based on Knowledge Patterns

Tracking #: 1131-2343

Authors: 
Andrea Giovanni Nuzzolese
Valentina Presutti
Aldo Gangemi
Silvio Peroni
Paolo Ciancarini

Responsible editor: 
Guest editors linked data visualization

Submission type: 
Full Paper
Abstract: 
This paper presents a novel approach to Linked Data exploration that uses Encyclopedic Knowledge Patterns (EKPs) as relevance criteria for selecting, organising, and visualising knowledge. EKPs are discovered by mining the linking structure of Wikipedia and evaluated by means of a user-based study, which shows that they are cognitively sound as models for building entity summarisations. We implemented a tool named Aemoo that supports EKP-driven knowledge exploration and integrates data coming from heterogeneous resources, namely static and dynamic knowledge as well as text and Linked Data. Aemoo is evaluated by means of controlled, task-driven user experiments in order to assess its usability and its ability to provide relevant and serendipitous information as compared to two existing tools: Google and RelFinder.
Tags: 
Reviewed

Decision/Status: 
Minor Revision

Solicited Reviews:
Review #1
By Jan Polowinski submitted on 01/Aug/2015
Suggestion:
Minor Revision
Review Comment:

The refocusing from the EKP explanation to the visualization aspects, as requested by other reviewers, has been done from my perspective; e.g., the possible interactions are explained in more detail. This is the most important change with respect to the previous version. Since previous work on the same topic by the authors has been cited and actually misses details, I think it's fine to summarize the work in this journal paper. By reducing the EKP foundations part, the ratio of new work improved.

Other issues, including the minor issues, I noted during the first review have also been addressed, except for the first one: "Intro > Contribution: method for extracting EKPs 'from Wikipedia' vs 'mining the structure of Linked data' -> be more consistent here? I understood you used DBpedia *AND* Wikipedia, but maybe rephrase this." -> Did I really get something wrong here?

# Multiple new minor issues occurred and need to be fixed:

- Fig. vs. Figure - be consistent
- p. 10: hover instead of over
- p.10: on the left/right side
- p.11: correct quotes after, e.g., "links to"
- linebreak before skos:relatedMatch
- p11: "for example" lowercase
- Missing comma: "when the focus changes, ..."
- p. 11, bottom of second column: "the presentation of the core knowledge"
- p. 11: contribute to address -> only address / cover?
- "e.g." always with commas
p.12
- "graphical visualisation" -> only "visualisation" ; similar: "graphical diagram" only "diagram"
- mixed AE/BE
- Is the path popularity really 0.18 (p. 4) and then 18.18 (p. 13)?
- check space around footnotes
- check line breaks for texttt in the end

Review #2
By Aba-Sah Dadzie submitted on 05/Aug/2015
Suggestion:
Minor Revision
Review Comment:

The paper is much easier to follow and most of my questions have been answered. There are however a few new ones that the response raised. I include some additional points that need to be addressed in the current version.

Wrt the linksTo label, the new information in the text is good, but I'd suggest showing this label only once, from the central node. Repeating the identical label for each edge contributes to clutter without providing new information, esp. as it is stating the obvious.

Having read the explanation about the relevance of culture and country, I am still not convinced that this would have had any influence on the evaluation results. At best, people who come from and/or live in the region or country where a specialised topic originates MIGHT have more knowledge about the topic, e.g., Germans for Kant (or Germans identifying bordering countries more quickly), but NOT, say, the English (as in natives of England) for Shakespeare. I acknowledge this point is raised in the conclusions, but probably reinforces my point. Especially as none of the countries listed has English (which I'm assuming was used for the task - correct me if I'm wrong) as a first or even (formal) working language - for undergrads in a third country using a third language, native language is probably even less relevant.
Maybe pedantic, but not all the nations listed would normally be described as "western" - is this the general meaning, geo-political or cultural? Without labeling any as one or the other, this is a scientific paper, so for the purposes of classification an unambiguous term/definition that relates directly to the evaluation should be used. Or its use here should be defined.
Of course, it is ultimately up to the authors to decide whether or not to stress this aspect of the participants.

There is now more information provided about participant demographics. I'd suggest ALL demographic information is given at the same time; that they were CS students isn't provided till the results analysis.
Importantly, what this does is to now raise a new question - ALL undergraduates in CS is probably the key factor for the tasks here. BUT that automatically means they are NOT lay users. Not being familiar with RelFinder does not make you a lay user, just means you've not had occasion or reason to use it - I doubt EVERY SW expert has. I would say it's correct to say they're not SW experts, maybe this is the distinction that needs to be made - that would influence their ability to formulate complex SPARQL queries, for instance, unless they have experience already in, say SQL, in which case it would be just learning a new query language. Or their familiarity with LD principles. But again with the caveat CS students still have a huge edge over non-tech experts/lay users.
Of course, this raises questions about usability by the intended target. It IS possible that this is perfectly fine, and that the term "lay user" is what needs to be unambiguously defined.

Would be useful to provide an example for the first task - I couldn't figure out till much later how they were presented with the information or what tool(s) they used to solve it. My guess is this would be "s p o" as a statement? And did not require Aemoo or the other two tools to be solved?

"The radial visualisation used by Aemoo for presenting data has been widely adopted so far, e.g. [22,2,23], and it is in general a well known visualisation metaphor in literature [15]."
Well known, yes, widely adopted, not so sure. Radial graphs can get busy quite quickly, and are not always easy to read. (Not an issue here, as the graph goes to only one level below the root and the relationships are unweighted and with identical labels. - I'd probably include this in the argument as to why they were a suitable choice.)

Wrt peculiar/curious links - would be useful to provide snapshots for the default views in Figs 2 & 3, so the reader can compare directly with these. I daresay it would provide a very simple way of answering my question about what would be considered so. And how, if it does, this would vary between subject or topic type.

*************

"In our study we fixed t = 0.18;" - how this value was derived is not explained till much later - it should be given here or at least a forward reference included. Also, later (p.13) t is given as 18.18 - which of these two values is correct? If both, this probably needs some explanation.
Additionally, last sentence of section 2 - please include a (high-level) summary; while you obviously don't want to repeat previous work, the paper must stand on its own. This value IS important for interpreting pathPopularity and the decisions made about "peculiar facts".

Please use a consistent method for self-citation - some use the third person, some are direct - reads strangely and mixing the two gives the impression of hiding this. Neither is wrong but this is an open review and all reviewers have noted that this builds on previous work - making it clear where this is being cited would be the preferred option; among other things, it makes it easier for the reader to understand what is being said.

"If for the same subjects we want to identify information that distinguish them from each other, the retrieved data that complement the core knowledge, will include useful insights. " - don't understand the second half of the sentence. Also, second comma is redundant… or is there something missing before it?

"For example, in Figure 2(b) are shown all the linguistic evidences that explain " - where? I couldn't find this. May simply need to be labelled?

The introduction in section should point to 4.1 rather than "In this section we describe …"

caption of Table 2 - what is meant by "their related figures "? - what figures?

Table 4 is not "SUS-based statements" but rather the SUS reproduced - the caption has been changed from the original. Also, it's referenced, would be fine to do just this, rather than reproducing the original - its contents are not directly discussed in the text. Otherwise the caption should also include the citation.

"In more details, the main “cons” …" why is "cons" in quotes - gives the impression something is meant other than its normal meaning. But this doesn't seem to be the case - just means negative, as in the opposite of 'pro'? Also, "more detail" (no 's')

Still a fair number of grammatical errors - I only include a few (not all) that I put down as I read through.

"Linked Data is feeding up the Semantic Web …" -> delete "up"

"a method for extracting EKPs from the Wikipedia" -> delete "the"

"involves the followings" -> "following" - no "s" (in more than one place)

"EKPs are used for automatically generate SPARQL " -> "generatING"
"This property is generate during …" -> generateD

top of p.7 - formatting gets weird. Also needs a reread and some rewording - a bit difficult to follow.
"provides a linguistic evidence " - delete "a"
"Aemoo uses reflects intuition by further …" either 'uses' or 'reflects' - which is the case? - I suspect 'reflects' as 'uses' doesn't make sense.

"KP Extraction Coordinator which takes care about the coordination" : about -> of

"It has to be noticed that the three tools rely on different " :"noticed" -> NOTED

"The SUS is a well-known metrics" - delete 's' in "metrics"

"RelFiner"

"Wikipeida"

Review #3
By Mariano Rico submitted on 28/Aug/2015
Suggestion:
Major Revision
Review Comment:

This manuscript was submitted as 'full paper' and should be reviewed along the usual dimensions for research contributions which include (1) originality, (2) significance of the results, and (3) quality of writing.

Having presented Knowledge Patterns (KP, 2010) and Encyclopedic KPs (EKP, 2011), this paper:
* States these 2 hypotheses on page 2:
1) EKPs are cognitively sound and provide a unifying view and a relevance criterion for building entity-centric summaries.
2) EKPs can be exploited effectively for helping humans in exploratory research tasks.
* Claims 5 contributions on page 2:
1) a method for extracting EKPs
2) A method to provide entity summarisation and knowledge aggregation
3) A visual exploration research system (Aemoo) that uses EKPs for enriching and filtering data
4) An evaluation of the cognitive soundness of EKPs
5) An evaluation of Aemoo concerning its usability and capacity to provide relevant and serendipitous information.

However,
- hypothesis 1 is described in Nuzzolese2011 (ref [36] in the manuscript), and the evaluation (section 4.1 of the manuscript) is (exactly) the same as the one described in Nuzzolese2011.
- contribution 1 is described in Nuzzolese2011 (ref [36] in the manuscript)
- contribution 2 is derived from contribution 1. The evaluation is in section 4.2 Task 1.
- contribution 3 is Aemoo. Described (although briefly) in Nuzzolese2013 (ref [37] in the manuscript)
- contribution 4 is hypothesis 1.
- contribution 5 is, from my point of view, the most original part of this work.

Therefore I see many overlaps with previous works by the authors of this manuscript (for me this is a mix of previously published results + Aemoo evaluation). I would remove the non-original contributions of this paper in order to focus on the most original parts.

- Fig 2 and Fig 3 are essentially the same as figures 1, 2 and 3 of Nuzzolese2013 (ref [37] in the manuscript)
- Tables 2 and 3 are the same as the ones in Nuzzolese2013.

Concerning the structure and length of the manuscript, I would reduce its size by removing the non-original parts. The summary of KP and EKP in sections 1 and 2 is very good, and this kind of summary should be provided in the overlapping sections.

1 Intro (1 page)
2 EKPs (2 pages)
3 EKPs as relevance criteria for Exploratory research (8 pages)
  3.1 Knowledge enrichment and filtering (3 pages)
    Identity resolution
    EKP selection
    Filtering and enrichment of static data
    Filtering and enrichment of dynamic data
    Aggregation of peculiar knowledge
  3.2 Knowledge visualization (3 pages)
  3.3 Implementation details
4 Evaluation (5 pages)
  4.1 Experimental setup for evaluating EKPs
  4.2 Experimental setup for evaluating Aemoo
5 Results and discussion (6 pages)
  5.1 Cognitive soundness of EKPs
  5.2 Usability of Aemoo
6 Related work (1.5 pages)
7 Conclusions and future work (0.5 pages)

Concerning UI evaluation, in section 3.2 there is a list of 5 requirements. In that section you say "Requirements R1 and R3 are fulfilled by applying the method described in section 3.1". I guess that this can only be claimed after an evaluation with users.

Additional comments:
- In Fig. 1, I cannot distinguish between blue and gray arrows.
- Page 6, "affected, accordingly". Remove semicolon?
- Page 7, "Aemoo uses reflects..". Rephrase.
- In Fig. 4, the RESTful layer cannot be independent. I assume it is not with Aemoo, so I propose adding a box containing both this layer and the KPExtractor.
- Page 20, "Finally, Figure 5 shows the ratings (based on a Likert scale) that participants". It is not Figure 5 but Figure 11, right?
- Fig. 10. The last three questions are only measured for RelFinder. Why not for Aemoo?


Comments

Special issue on: Visual Exploration and Analysis of Linked Data
Previous version: #958-2169