Visualizing Ontologies with VOWL

Tracking #: 892-2103

Authors: 
Steffen Lohmann
Stefan Negru
Florian Haag
Thomas Ertl

Responsible editor: 
Guest Editors EKAW 2014 Schlobach Janowicz

Submission type: 
Full Paper
Abstract: 
Visualizations can be very useful when working with ontologies. The Visual Notation for OWL Ontologies (VOWL) is a comprehensive and well-specified visual language for the user-oriented representation of ontologies. It defines graphical depictions for most elements of the OWL Web Ontology Language that are combined to a force-directed graph layout visualizing the ontology. In contrast to related work, VOWL aims for an intuitive representation that is also understandable to users less familiar with ontologies. This article presents VOWL in detail and describes its implementation in two different tools: ProtégéVOWL and WebVOWL. The first is a plugin for the ontology editor Protégé, the second a standalone application entirely based on open web standards. Both tools demonstrate the applicability of VOWL by means of various ontologies. In addition, the results of three user studies conducted to evaluate the comprehensibility and usability of VOWL are summarized. They are complemented by latest insights gained from an expert interview and from testing the visual scope and completeness of VOWL with a benchmark ontology. The evaluations helped to improve VOWL and confirm that it creates comparatively intuitive and usable ontology visualizations.
Tags: 
Reviewed

Decision/Status: 
Minor revision

Solicited Reviews:
Review #1
Anonymous submitted on 08/Jan/2015
Suggestion:
Major Revision
Review Comment:

I did a positive review of an earlier version of this paper when it was submitted to EKAW 2014.

The work presented is original, as it proposes a novel graph visualisation of OWL ontologies. The related work section is dense and features a lot of relevant papers; it has, however, some flaws related to a lack of clarity. For instance, the authors insist on visualizing "complete ontologies", but should be more precise about that term. A table clearly organizing the multiple related works by techniques, vis tools, objectives and tasks, data visualized and completeness of ontologies is missing. Moreover, some statements are quite empty without an example, e.g. p3 « there are also some conceptual limitations and incompatibilities when using UML class diagrams for the visualization of ontologies » or « Such RDF visualizations of OWL are not only hard to read but also fail to adequately reflect the semantics of the OWL constructs ».

As far as significance of the results is concerned, it seems to me that the proposal is quite mature, is interesting and has the potential of becoming widely used. However, while I regret but accept that the studies remain qualitative and involve few users (6 in the first, 5 in the second), they are insufficiently described, at least as far as the experimental settings are concerned. The tasks, the durations, the data collection etc. should be described with precision. Also, there should be a minimal discussion of related work on ontology visualisation evaluation, and the second user study (5.2) should be better justified: why did the authors target expert users if VOWL is aimed at lay users? The expressiveness / completeness evaluation using OntoVibe is interesting.

The quality of writing is fairly good. The paper is well structured. However, aside from the aforementioned remarks on related work and user studies, the main problem to me is that it remains somewhat confusing about the differences between VOWL, VOWL1 and VOWL2. There still remain unnecessary allusions to VOWL1 or VOWL2, when the clear topic of the paper should be VOWL as it is now (= VOWL2). I understand that the difficulty could come from the fact that part of the user study was on VOWL1, but then it should not be used in this paper apart from a historical presentation of the origins of VOWL2. My advice would be to cite VOWL1 only in the related work, and to focus only on VOWL2.

All in all, though describing valuable work, my feeling is that the paper can be much improved in clarity, so I would tend to ask for a major revision: 1/ with a better presentation of related work; 2/ with VOWL1-related sentences sorted out; 3/ with a better presentation of the user studies.

Other remarks
- I do not understand the sentence: « This is different in Graffoo [23] which aims for an easy-to-understand notation for OWL diagrams similar to VOWL. However, it is intended to be used in diagram editors and therefore rather related to the idea of UML-based modeling than to the visualization approach that is followed by VOWL. »
- there is no orange in table 2, while you mention this color in the text
- you claim that you use the UML "specialisation" notation for subclassof; this is false: you actually use the UML "implement" relation (dashed line)
- fig2 is too small
- p7 "a demo of WebVOWL was presented at EKAW 2014 [49]"
- p7 "Like VOWL 2, ProtégéVOWL focuses": it seems that you compare a specification and an implementation. Strange.
- p8 : question : are there some inferences possible within the browser, or is it just a JSON graph representation ?
- p10 "previous work has compared" : cite the reference
- p10 : "None of the participants in these studies had extensive prior knowledge about ontologies or other Semantic Web technologies, SO they could rather be regarded as lay users."
- p10 : "The second comparison [51] focused ON the WebVOWL"
- p10 : 3rd paragraph of 5.1 is not easy to read.
- p11 : I am still unclear about the protocol: when did you provide the questions?

Review #2
Anonymous submitted on 10/Feb/2015
Suggestion:
Minor Revision
Review Comment:

The paper follows on from previously published work on VOWL, and provides more detail about the two implementations ProtégéVOWL and WebVOWL, in addition to a new evaluation with a set of (domain) expert reviewers.
The paper is fairly easy to follow, and does a good job of illustrating a new contribution to visualisation of ontologies. Value over existing work is clearly discussed, as well as additional avenues of use, and the paper concludes with pointers to further work.

Overall, I think the paper makes a good contribution to the field; the comments that follow are predominantly to do with points in the presentation where I found myself looking for additional information to answer a question raised by the discussion at that point. I suspect a fair bit of this is making sure that new work is presented without repeating too much of previous work, and/or not presenting so much about what appears to be fairly extended work that the paper then becomes a bit too broad. Or, also, the trap we all easily fall into - that the reader does not have all the context the authors do, so what may be obvious to the latter simply is not to the former.

The other area I would draw attention to is the discussion of the evaluations (detail below).

************

It is not till fairly late in the paper that the EKAW paper that this is extended from is first referenced. I only just noticed, on my third read, that this is actually the first footnote on the first page! But the footnote is not actually referenced from the text. I actually expected this reference at the top of page 2, when the previous version of VOWL and the related papers are first brought up.
Also, "a demo of WebVOWL will be presented at EKAW 2014" - as at the time of submission EKAW 2014 had already taken place.

I remain a bit confused - WebVOWL is described as "a standalone application entirely based on open web standards". Is it a web-based tool - name implies this, or a standalone application as described? This is actually finally answered - on p.10. Might be useful to clarify this earlier - a "web application" isn't quite the same as a "standalone application".
It is also later referred to as "WebVOWL, a responsive web application …" - what exactly does "responsive" mean, how is it measured? Is there any reliance on a network connection, what were the specs of the machines the tests were carried out on?
In a top end research lab, the equipment available far surpasses what the average, non-tech end user will have access to, especially at work. Among others, depending on what they typically do, they simply do not need too much power. FYI, I make this comment based on experience working with, among others, aerospace engineers doing a decent amount of data crunching, where we eventually had to provide alternative machines (security was not the issue here). Point is, the description may be valid, but it needs to be qualified.

A bit pedantic, but "The evaluations helped to improve VOWL and confirm that it creates comparatively intuitive and usable ontology visualizations." - "usable" is so broad that I'm not sure it contributes much after saying "intuitive". What is it usable for? Who (as in user type) is it usable to?

"Many approaches visualize ontologies as graphs, which is a natural way to depict the structure of the concepts and relationships in a domain of knowledge." - playing devil's advocate here, because I do not disagree with the point completely - BUT, what makes graphs a natural way to depict this structure? I could cite half a dozen articles that justifiably say the opposite, or that some other structure works better.
On the same point, and probably more importantly, there are very valid arguments against using force-based layouts - in fact, the authors raise one toward the end - using the word "appealing" to refer to them is debatable.
These arguments become clearer further on in the paper - would be useful to put in a sentence or two here justifying the points, with relevant references.

p.3 - "However, the ontology is converted into the NodeTrix structure for the visualization, making it difficult to get an impression of its global structure and topology." - how is NodeTrix responsible for the issue here?

"developers were given more freedom in the parametrization of VOWL " - does this mean that it can be extended?
Related to this - "VOWL does not specify a particular scaling method for the circle radius, but proper results will likely be achieved with a logarithmic or square-root scaling in most cases. " - what is this claim made based on? Again, does this mean the end user can extend to do this?
The answer appears to be yes, emphasis on "appears".
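
For illustration only, the two scalings named in the quoted sentence could look like the following. This is a minimal sketch, not taken from the VOWL specification or from either tool: the function name `scaled_radius`, the radius bounds, and the `max_count` normalisation are all assumptions made for the example.

```python
import math


def scaled_radius(instance_count, min_radius=20.0, max_radius=60.0,
                  max_count=1000, method="log"):
    """Map a class's instance count to a circle radius.

    Hypothetical helper: the bounds and max_count are illustrative
    assumptions, not values prescribed by VOWL.
    """
    if max_count <= 0:
        raise ValueError("max_count must be positive")
    # Clamp the count so the result always stays within the radius bounds.
    count = min(max(instance_count, 0), max_count)
    if method == "log":
        # log1p keeps a count of 0 well defined (log1p(0) == 0).
        t = math.log1p(count) / math.log1p(max_count)
    elif method == "sqrt":
        # Square-root scaling makes the circle area grow linearly with count.
        t = math.sqrt(count / max_count)
    else:
        raise ValueError(f"unknown scaling method: {method}")
    return min_radius + (max_radius - min_radius) * t
```

Both variants compress large counts compared with linear scaling, so a single very populated class does not dwarf the rest of the graph; whether end users can choose between them is exactly the question raised above.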

Doesn't pointing to multiple instances of owl:Thing increase clutter - one of the things VOWL is supposed to avoid?

It's nice to see consideration for use in monochrome. However, there is no evidence provided to back this up - was this explicitly evaluated with end users? Or tested using some other verifiable method? I can see the argument with the text labels - which is a fair point, but this also contributes to clutter (a point noted during the evaluation). My question is also whether the current colour scheme works sufficiently well in monochrome that without the text labels it would still be usable.

The authors refer to the use of Venn diagrams in the description of the tools, and again wrt comments by participants in S5.2. However I struggled a bit with 5.2.5 because to that point there were no examples in any of the snapshots. It may be useful to point forward to Fig. 5 when this is first introduced and especially in section 5.2.5.

EVALUATION

It would be useful to provide a brief (one sentence) description of the user types in the previous evaluations. For instance, on p.6:
"The representations of these elements were considered intuitive by many participants of the user study that compared VOWL 1 to the UML-based visualization of ontologies [55]." - it's impossible for me to interpret this properly without knowing what types of users these were. I know the answer is in the paper referenced, but there are 65 in all … I should be able to read this one without having to go to each to get extra info.
Actually, this is finally provided in section 5, so alternatively point forward to this section.

I have a bit of a problem with the report of previous evaluation(s).
5.1 does a good job of telling me about the users - what I was missing earlier. However, at the end of the section I'm not sure I really get what the results were, beyond that it was compared with a set of (named) tools, and that it came out looking good. This section needs one of two things, either the results are presented in more detail, or a much higher level summary given and the reader simply pointed to the previous paper with the detail. And the focus kept on 5.2, with this as background - see also point below on relating the two sections.

The cover letter refers to a new evaluation with expert reviewers. However, I didn't find a specific reference to this effect in the paper itself (apart from in the abstract) - I guess this is section 5.2? If so would be useful to state that this is a follow-on to the previous one (5.1) reported and give the information (in the cover letter) in the paper itself about why it was considered a good path to go down.

"While the pick-and-pin feature was generally thought of as useful, one participant even asked for such a feature on his own." (5.2.1) - don't understand this. Also, what was this additional feature? Also, I really don't follow the argument in 5.2.2 - multiplication would increase clutter - so this seems a bit contradictory. And if they reported that they wouldn't want to answer the question for which this was relevant - even more confusing. Further, what was the reason for the one exception?

Wrt my earlier comment about force-based layouts being natural or appealing, what were some of the other layouts requested by participants (apart from the mind map - strictly speaking that's not too different)?
Wrt the restrictions to the visualisation of set operations at the end of S.5, what is the expected impact on use? Was this evaluated with the participants?

The two reports in the evaluation section ARE related to each other. However, I do not see any discussion to that effect. This is important - a key aim of VOWL is to support end users who would not normally work with ontologies, or understand their structure in any great detail. The second evaluation with experts is actually very good in that it picks up additional requirements for ontology visualisation that the former may not, and therefore helps to ensure that these would be available to all users. However, at the same time, feedback from domain experts, unless they also have training in HCI/usability, will not pick up on what would be difficult for non-experts to work with.
It would be useful if 5.2 backward references key issues raised in 5.1 during the discussion and/or an additional (relatively short) discussion included as a summary at the end of the evaluation section, showing the value of the 2nd evaluation to these (casual) users.

Also, while it's perfectly acceptable to present just qualitative evaluation results, for a journal paper you really need to justify why this is so. And whether or not the results can be seen to be representative of the target user population. Off the top of my head I would say one reason might be small numbers in each study. However, that alone is not enough. Another might be the use of expert reviewers. But it is not for me to surmise, but for the authors to clarify.

MINOR POINTS

p.5, col 2, top - "The recommended color scheme has been designed in accordance with the general guidelines: For instance, and inline with …" - is there something missing here - the first sentence doesn't end. Also, should be "[in line] with"

CITATIONS & REFERENCES

OntoViz is mentioned but never referenced.

Ordering at the start is weird - appears to list URLs only, but then there are a few others scattered within the rest of the references (which are subsequently listed alphabetically).

Check that capitalisation is maintained for acronyms, e.g., [11] "OWLGrEd: a uml style graphical notation and editor for OWL 2."
And also consistently named, e.g., RDFgravity in [62] but "RDF Gravity" in text.

LANGUAGE & PRESENTATION

OWL === Web Ontology Language - would suggest "OWL (Web Ontology Language)"

"an increasing number of people in modern knowledge societies get in contact with ontologies." -> "… COME INTO contact with"

"users would not discover them as flawlessly as the permanently displayed elements" (p.12) - "flawlessly" here is a bit strange, maybe "effortlessly" or "easily"?

"5.3. Benchmark of the VOWL Visualization" -consider "BenchmarkING" or "Benchmark TESTING" - as otherwise the header implies that "VOWL Visualization" is the benchmark, rather than that it is being tested.
Ditto "… in contrast to ProtégéVOWL at the time the benchmark was performed…"

Overall, well written and easy to read. A number of minor corrections needed - should be picked up by an auto-check and proof-read.

Review #3
By Luca Gilardoni submitted on 11/Feb/2015
Suggestion:
Accept
Review Comment:

This manuscript was submitted as 'full paper' and should be reviewed along the usual dimensions for research contributions which include (1) originality, (2) significance of the results, and (3) quality of writing.

The paper describes a system to visualize ontologies, its two implementations - one as a Protégé plugin and one as a web application based on D3 - and a user evaluation study. The WebVOWL application is particularly interesting, given that the approach, based on a JSON structure independent of the specific OWL target ontology structure and parsers, and the ability to display in a web browser make it widely applicable.
Two critiques were raised in the preliminary review of the same paper, requiring discussion. The first one noted that the design approach of providing users with a complete view rather than a stepwise approach, while it has its own merits and advantages, seriously risks impeding effectiveness with real-life ontologies of even moderately large size. This seems to have been addressed by the discussion of filter mechanisms. The other open issue reported was the need to store location information from previous sessions so as to maintain a fixed visualisation structure over time - a highly relevant one to make the system more effective in real usage. This was already acknowledged but is left to future implementations.
The paper has been non-trivially revised and somewhat restructured for better readability. Results, albeit limited in number of evaluators, are presented and discussed in depth.

