A Hierarchical Framework for Efficient Multilevel Visual Exploration and Analysis

Tracking #: 1227-2439

Nikos Bikakis
George Papastefanatos
Melina Skourla
Timos Sellis

Responsible editor: 
Guest editors linked data visualization

Submission type: 
Full Paper
The purpose of data visualization is to offer intuitive ways for information perception and manipulation, especially for non-expert users. Most traditional visualization tools and methods operate in an offline way, limited to accessing static (preprocessed) sets of data. They also restrict themselves to dealing with small dataset sizes, which can be easily visually analysed with conventional visualization techniques. However, the Web of Data has realized the availability of a great amount and variety of big datasets that are dynamic in nature; most of them offer query or API endpoints for online access and analysis. Modern visualization techniques must address the challenge of on-the-fly visualizations over large dynamic sets of data, offering efficient exploration techniques, as well as mechanisms for information abstraction and summarization. Moreover, they must take into account different user-defined exploration scenarios and users' preferences. In this work, we present a generic model for personalized multilevel exploration and analysis over large dynamic sets of numeric and temporal data. Our model is built on top of a lightweight tree-based structure which can be efficiently constructed on-the-fly for a given set of data. This tree structure aggregates input objects into a hierarchical multiscale model. We define two versions of this structure, which adopt different data organization approaches, well-suited to the exploration and analysis context. In the proposed structure, statistical computations can be efficiently performed on-the-fly. Considering different exploration scenarios over large datasets, the proposed model enables efficient multilevel exploration, offering incremental construction via user interaction, and dynamic adaptation of the hierarchies based on users' preferences. A thorough theoretical analysis is presented, illustrating the efficiency of the proposed model.
The proposed model is realized in a Web-based prototype tool, called rdf:SynopsViz, that offers multilevel visual exploration and analysis over Linked Data datasets. Finally, we provide a performance evaluation and an empirical user study employing real datasets.

Minor Revision

Solicited Reviews:
Review #1
By Heiko Paulheim submitted on 20/Nov/2015
Review Comment:

The authors have made a considerable effort to revise their paper and address the reviewers' comments. Especially reorganizing the paper and moving details to the appendix clearly improves the clarity and readability.

The only remark I am left with, also w.r.t. the comments of reviewer 3: the authors state (p.20) that their approaches exhibit linear time performance, whereas before, it is stated that both require a sorting of the input objects, which is O(N log N). This means that the overall performance cannot actually be linear.

With that remark fixed, I would recommend acceptance of the paper.
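
The complexity remark above can be illustrated with a minimal sketch. It is not the authors' construction algorithm; `build_leaves` and `cost_model` are hypothetical names, and the unit constants in the cost model are an assumption. The sketch only shows that a construction which sorts its input and then makes a single pass over it has an overall cost governed by the sort.

```python
import math


def build_leaves(values, leaf_size):
    """Hypothetical helper: sort the input, then cut it into equal-count leaves."""
    ordered = sorted(values)  # comparison sort: O(N log N)
    leaves = []
    # Single pass over the sorted data: O(N)
    for start in range(0, len(ordered), leaf_size):
        chunk = ordered[start:start + leaf_size]
        # Each leaf is summarized as (min, max, count)
        leaves.append((chunk[0], chunk[-1], len(chunk)))
    return leaves


def cost_model(n):
    """Rough operation counts for the two phases (unit constants assumed)."""
    sort_cost = n * math.log2(n)  # sorting phase
    pass_cost = n                 # grouping phase
    return sort_cost, pass_cost
```

Under these assumptions the grouping pass is linear, but the total is O(N log N) + O(N) = O(N log N). A linear bound would hold only if the input arrives already sorted or the sorting cost is excluded from the analysis.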

Review #2
By Aba-Sah Dadzie submitted on 13/Dec/2015
Minor Revision
Review Comment:

My last set of questions has mostly been answered but I still struggled a bit to read the paper. Even with the algorithms moved to the end it is still quite long. But it feels longer because it is to some extent tedious to read and there are still a few unclear and inconsistent or contradictory parts. While the authors make a fair argument for the contribution of the work, some of it provided only in the response, it is still necessary that the reader is able to take away the message intended.

*** response to last review

R3.3, 3.11 - these points should be explained clearly in the text, as in the response - it is not easy for the reader to guess either

R3.3 - if 2 is never chosen why include the column in the table at all?

R3.6 - 42% is nearly half. Even with only one other variable, describing it as "dominated" is simply incorrect - it is at best a bit smaller. With more than one other variable, if anything at all it rather dominates.

R3.8 - can’t say I’m convinced by the argument about linearity.

*** other points ***

A large number of grammatical errors and typos. The paper is tedious to read mainly because there are too many redundant commas - I had to go back several times to reread a sentence and manually remove commas to make sense of it. Commas should be used to join two distinct parts of sentences, or only where there is a natural pause, or to encapsulate further detail. My last sentence would read ok without commas but I deliberately included them to give an example of natural pauses and distinct parts.

Footnotes should be placed as close as possible to whatever they’re annotating. And should preferably not break up reading.

Some equations run into the text in the adjacent column.

What is the square at the end of the first para - p.11?


Figures where colour is used to distinguish areas, e.g., Fig. 6 & 7 - these colours are not distinguishable on a monochrome printout - which, incidentally I read off. This is not unusual - I cannot even access a colour printer at work without jumping through hoops!
I simply could not find whatever it was being referred to in the text. Colour is fine only if it is easy to distinguish off screen. Also, red-blue (and red-green) contrast is a specific issue even on-screen or printed in colour.

p.7 - “the resulting tree avoids overloaded and scattered visualizations.” - what exactly are “scattered visualizations”?

“our approach consider perfect m-ary trees, such that a more "uniform" structure (i.e., all the groups are divided into same number of groups) is resulted” - I don’t understand this - does this mean same count in each group? or something else?
also “is resulted” -> “results”

footnote 8 - needs an example - what scenario, for instance?

At the very end of the paper a numerical citation starts a sentence - either reword to start with a word or use Author [citNo].

p.10 - “In the RAN scenario (lower flow in Figure 5), the user specifies [20, 50] as her range of interest. ” - the figure has [30, 50], not 20…

p20 - “the HETree approaches outperform the FLAT by about one order of magnitude. ” - means 10x - that’s not what the values say.

“the High- chart requires approximately 90 msec for rendering the charts in the browser.” - and this is significant because…

Fig 11 a vs b - why the limit to 20K - not saying there isn’t a good reason but this is not my first read and I couldn’t find why. Also what happens in the gap between 20 and 50K?

p.21 - “29.6K nodes are to be initially constructed (along with their statistics), while the incremental approach constructions in the worst case, 15 nodes.” - HUGE difference between 29.6K and 15. Need to step the user through how you came down to 15 - maybe use this example while explaining the algorithm in question. Would also improve readability.

The interpretation of numbers in Tasks T1 and T2 is at best contradictory. R - 28 to C - 29 and R - 47 to C - 52 cannot be described as “outperforms” unless qualified as “by a very small amount”. Especially as FLAT - 63 to C - 57 and R - 62 is described as “very close”.

S6.1 - LDVM is not a tool but a model. If referring to an implementation based on it you should state that.

Review #3
By Tomi Kauppinen submitted on 18/Dec/2015
Review Comment:

Reviewed along the usual dimensions for research contributions:

(1) originality

This paper presents an approach - among many other ones - for visual exploration and analysis. There are no new techniques presented, but this was not the focus, as the authors argue in 6.2.1. As such, there is still originality, as the authors point out in the related work section and in its discussions. For me the discussion argues well for the novelty, especially since there is also an evaluation presented.

(2) significance of the results

Even if the tool is "an early prototype", as they put it, the authors still present a proper evaluation of it and especially discuss the results openly. Results are still early ones but they provide a good basis for the community to build new research settings to understand and improve hierarchical structures for visualizations.

(3) quality of writing

This paper is very well written and contains no spelling issues to mention. The only concern is that the authors have copied and pasted many of the sentences from their other articles (for instance the start of the introduction from their ESWC demo paper). While this may not be a big issue as such, I would still suggest that the authors consider rewriting those sentences to better serve the context of this article.

As a summary I support publishing of this paper as it is in the SWJ special issue.