A Five-Star Rating Scheme to Assess Application Seamlessness

Tracking #: 1070-2281

Authors: 
Nick Del Rio

Responsible editor: 
Guest editors linked data visualization

Submission type: 
Full Paper
Abstract: 
Analytics is a widespread phenomenon that often requires analysts to coordinate operations across a variety of incompatible tools. When incompatibilities occur, analysts are forced to configure tools and transform or munge data, distracting them from their ultimate task objective. This additional burden is a barrier to our vision of seamless analytics, i.e. the use and transition of content across tools without incurring significant costs. Our premise is that standardized semantic web technologies (e.g., RDF and OWL) can enable analysts to more easily munge data to satisfy tools' input requirements and better inform subsequent analytical steps. However, although the semantic web has shown some promise for interconnecting disparate data, more needs to be done to interlink user- and task-centric, analytic applications. We present five contributions towards this goal. First, we introduce an extension of the W3C PROV Ontology to model analytic applications regardless of the type of data, tool, or objective involved. Next, we exercise the ontology to model a series of applications performed in a hypothetical but realistic and fully-implemented scenario. We then introduce a measure of seamlessness for any ecosystem described in our Application Ontology. Next, we extend the ontology to distinguish five types of applications based on the structure of data involved and the behavior of the tools used. By combining our 5-star application rating scheme and our seamlessness measure, we propose a simple Five-Star Theory of Seamless Analytics that embodies tenets of the semantic web in a form which emits falsifiable predictions and which can be revised to better reflect and thus reduce the costs embedded within analytical environments.
Tags: 
Reviewed

Decision/Status: 
Minor Revision

Solicited Reviews:
Review #1
By Roberto García submitted on 11/May/2015
Suggestion:
Accept
Review Comment:

Though the revision does not fully address the issue of connecting the model to real-world users and user experience, plans to address this issue have been added to the future work section.

Overall, the paper already constitutes a very interesting contribution, so I would suggest its publication. I also encourage the authors to address the user experience issue, and to run experiments with real tools and users, in a future paper following this one.

Review #2
By Aba-Sah Dadzie submitted on 22/May/2015
Suggestion:
Minor Revision
Review Comment:

The authors do a fair job of responding to the review. However, while the paper is much easier to follow, there are still a few bits I found need some work.

Wrt the response to reviews, some of the changes to the formulae are a bit strange - are the corrections because there was an initial error in writing them down, or were they modified simply to conform to the questions/concerns raised? The latter comes with the worry that the theory has not been properly thought out and/or revised for validity or correctness. I can understand modifying the text to clarify presentation, but this is not always the case here.

S6 ends "We believe our work embodies the community’s assumptions, claims, and hypothesis as a simple theory that can be used to assess, predict, and refute the tenets of Linked Data that have been advertised for nearly a decade." Was "refute" really meant?
R: We did. Linked Data is not yet rigorously proven to work and so it is possible to refute it.
This is a sufficiently strong statement that it probably needs a whole paper to address just that. This is a scientific paper - either provide stronger, irrefutable (no pun intended) evidence or it really needs to be toned down. Note, I’m not saying LD is perfect, but is any other structure? Each has its merits and disadvantages, and each needs to be used within its limitations. I would argue probably along these lines, rather than simply stating it must/can be refuted. In fact, this is what is done in the abstract. And again in the paper itself, e.g., in S5.3 & 5.5.
As an example, Hellmann et al., in ISWC 2013 (http://dx.doi.org/10.1007/978-3-642-41338-4_7), though not about visualisation, tackle a similar challenge, and come to similar conclusions wrt the utility or usefulness of LD for managing it. The concluding argument doesn’t need to be the same, but illustrates my point.

“it remains difficult to easily reuse those tools in evolving environments such as the world of LD analytics – perhaps because they rely on non-semantic representations that make it difficult to establish and maintain connections across analyses.”
What is “they” referring to - the tools or LD? I would guess the former, but this is not clear. I would suggest replacing “they” with exactly what is meant.

“[pp6] : developing or modifying t’s code base” - where ’t’ is…? Later on I see this defined as “tool”; it needs to be defined here.

who is Mary? - end S3.1.1

S3.2 - the intro is unrealistic - the previous section has already illustrated in detail the challenges Amy faced and the fact that she had to restart her analysis almost from scratch because her initial results were not reusable. Why then would she give to someone else only the limited set of (interim) results - the png file, that she herself had found near useless for her own subsequent analysis?
Even ignoring this, anyone with the most basic analytical experience would immediately have asked for the source and any other interim results, rather than waste their time with an image they knew would at best give them an overview they couldn’t do very much else with.
The same scenario could have been played out with Amy giving Bart all her files, but highlighting, maybe, the overview he could start with - the png file, before continuing with the dataset with the additional (meta)data that was really useful for further, detailed analysis. And then (re)emphasising which bits - original data and/or interim results, were useful for what.
In fact, this sentence highlights exactly this: “ In practice, reviewing ALL [emphasis mine] of Amy’s materials WITHOUT ANY CONTEXT would be a daunting task; context and meaning of the results were lost.”

“Bart sends the cluster map image to Amy, with some small text describing his findings.” - a bit vague. First question that comes to mind - why was this text not encoded as metadata in the results? There are two obvious answers:
- 1 - the tool didn’t allow it - which highlights the issue being tackled in the paper to start with
- 2 - he didn’t simply because it was more convenient for him to send this information across in plain text
Ignoring the fact I shouldn’t have to guess - the answer could be something else altogether - really need some description of what Bart sent back to Amy AND why he chose to send it in this way. Also, this reinforces again the importance of the challenge being tackled here - it highlights limitations in current tools/approaches for encoding (and reusing) the analyst’s process and results (as semantic also).

In S3.3: “In the actual analysis, every application generated only a mundane dataset.” - some RDF and XML data were generated; surely these should not be considered mundane, since both formats allow data and metadata to be included. And if this wasn’t taken advantage of, that’s not because the data structure doesn’t allow it. In fact, S4.2 makes exactly this point wrt RDF.

Fig 8 - how does each analysis start and end with dα1?

“For example, compare Amy’s application α3 and Bart’s application α3, ” - I can see only α1 for Bart

“We assume that linked data, including data that can be trivially munged to yield linked data, are easier for subsequent analysts to reuse. On the other hand, mundane results such as PowerPoint slides, CSV files, and raster images are harder to reuse [9]. For example, Bart reused Amy’s less contextualized RDF file instead of the PNG image result.” - logically, this does not make sense - first the RDF should be MORE, not less contextualised. Secondly, if this IS correct, why would you use a LESS contextualised dataset? Especially when reusing someone else’s results?

S4.1 “We can now define the seamlessness score, S, that incorporates the integration and reuse ease expressions:” - what is meant by “EASE expressions”?

Fig. 9 - only some text is legible (in print and at normal resolution on-screen). Even at high resolution I struggle to say for sure what is bold. I’d suggest using higher contrast - either colour that prints well or a properly contrasting monochrome pattern. Also, in 1… 5 X [star], is X loaded? or is this simply saying one-star, etc.? If the latter the X is both redundant and confusing. If loaded it is not obvious what it stands for.

Table 1 gives a total cost 0.66 to Amy, but previously, in the text, this is given as 0.75 (top, p.11). Or are these two different things - I may simply be confused because Bart has the same value - 0.48 in both cases. And in the first instance, how the values were obtained wasn’t shown directly (I acknowledge I could recalculate based on the equation given even earlier, but that would answer only one of my two questions).

Would suggest cross-referencing Fig. 12 at the very start of S5.6, when E_Amy and _ideal are first referred to. This does not happen till the third paragraph; a few lines in, I’d already given up and started looking for something to match the text. Also, is E_Amy Fig. 4? I’d suggest explicitly cross-referencing that or whichever is the most appropriate here.

(S7) “In terms of our seamless score” - SEAMLESSNESS, no? and again at the start of S8. To this point the theory has been seamlessness - if this isn’t an error, using the opposite in the conclusions is confusing at best.

*************

a few unresolved cross-references (??), due to unescaped cite commands

Is steganography sufficiently well known that it can be referred to without an explanation of what it is?

* A number of grammatical errors - run an auto-check - a handful below

- “Additionally, d2,3 contains her imposed satellite groupings.” imposed by, or rather, ON, what?

- fig 6: Boundry -> Boundary

- “such as Amy’s α1, earns a particular start rating” - do you mean “star”?