Eye Tracking the User Experience - An Evaluation of Ontology Visualization Techniques

Tracking #: 770-1980

Bo Fu
Natasha Noy
Margaret-Anne Storey

Responsible editor: Guest editors linked data visualization

Submission type: Full Paper
Various ontology visualization techniques have been developed over the years, offering essential interfaces to users for browsing and interacting with ontologies, in an effort to assist with ontology understanding. Yet, few studies have focused on evaluating the usability of existing ontology visualization techniques. This paper presents an eye-tracking user study that evaluates two commonly used ontology visualization techniques, namely, indented list and graph. The eye-tracking experiment and analysis presented in this paper complement the set of existing evaluation protocols for ontology visualization. In addition, the results of this study contribute to a greater understanding of the strengths and weaknesses of the two visualization techniques, and in particular, how and why one is more effective than the other. Based on approximately 500MB of eye movement data containing around 30 million rows of gaze data, we found evidence suggesting indented lists are more efficient at supporting information searches and graphs are more efficient at supporting information processing.

Solicited Reviews:
Review #1
Anonymous submitted on 14/Aug/2014
Review Comment:

This manuscript was submitted as 'full paper' and should be reviewed along the usual dimensions for research contributions which include (1) originality, (2) significance of the results, and (3) quality of writing.

Generally, I am okay with the changes made by the authors in response to the previous reviews. However, I would have much preferred if more of the responses actually resulted in changes in the manuscript. I feel a little brushed off by some of the responses. But overall I think the updated version is acceptable.

Review #2
By Roberto García submitted on 21/Aug/2014
Review Comment:

This manuscript was submitted as 'full paper' and should be reviewed along the usual dimensions for research contributions which include (1) originality, (2) significance of the results, and (3) quality of writing.

Review #3
By Emmanuel Pietriga submitted on 24/Sep/2014
Review Comment:

This revision addresses my most important concerns. Some comments on authors' answer follow.

3.1) Reviewer Comment: I need to be convinced that these different elements (saccades, fixations,
scanpaths, pupil dilation, angles) and their mapping can indeed be treated orthogonally. And if
there is no strong evidence that they can be treated orthogonally, this does not nullify the results
of this study, in my opinion. But then this limitation has to be acknowledged and discussed, and
the claims made have to be toned down, more clearly indicating that the observations made
_suggest_ something rather than they _show_ or _demonstrate_ it.

Author Comment: We have rephrased our claims as advised in the revised submission. It should
be noted that Goldberg et al.’s research in eye tracking and how to interpret eye movement data is
the state of the art in this field. These measures and how they should be understood [33-35, 6] are
the current leading standards used by the community as well as our paper. To date, there is no
evidence to suggest these analyses/interpretations are unreliable. If the reviewer is aware of any
eye tracking research that suggests otherwise, please point us to the relevant papers. In this study,
we aim to provide a basis for future research in this direction and we hope to inspire
methods/analyses that challenge or build on the results/approach discussed in this paper.

Reviewer answer to author comment: I am much happier with the new wording. Still, I would have liked this aspect of the interpretation of results to be discussed, even if briefly, in the discussion about the validity of the experiment in Section 6. I am not sure I made my point clear in the first place. I am not questioning earlier findings by Goldberg et al. or anyone else. What I am saying is that since those earlier results about different measures were obtained _in isolation_ on _different tasks than the task considered here_, it is scientifically wrong to generalize them as a whole and draw general conclusions about what is observed here on this basis alone. The problem is not whether these earlier analyses are reliable; the problem is making them say something that they do not say without validation. Again, I am not claiming that the conclusions you were drawing are necessarily wrong. I am only claiming that you cannot make statements as strong as those in the earlier version.

3.2) Reviewer Comment: A side comment w.r.t pupil dilation is that as stated in [49], pupil size
as seen by the eye tracker depends on the person's gaze angle. Did you apply the calibration
method of [49] to compensate for this? As the two visualization techniques lay out data on screen
in a very different manner, thus having a significant impact on where users are looking, there is a
strong potential for confounding factors here if the distortion is not eliminated prior to analysis.

Author Comment: Calibration is a must in any experiment involving an eye tracker. The
calibration process specific to the Tobii 2150 we used is discussed in section 3.3. A recording
session only begins after the participant has calibrated her/his eyes.

Reviewer answer to author comment: I know that eye tracking requires calibration. I was asking whether you performed the specific calibration that compensates for gaze angle.

3.3) Reviewer Comment: I would like to better understand the rationale behind the choice of this
particular network visualization technique. Why is this one representative? Is it more relevant to
choose a representative technique, or one that tries to optimize some set of criteria (readability,
edge crossing minimization, screen real-estate consumption, ...)?

Author Comment: As discussed in section 3.2, the graph visualization uses a force directed
layout, which minimizes edge crossing, supports drag-and-drop to improve readability of labels
and tailored use of screen space. Please also see comment 1.3 above.

Reviewer answer to author comment: A force-directed layout minimizes edge crossings only when considering node-link diagrams with straight edges. Other algorithms can do better when allowing non-straight (curved) edges. Supporting drag and drop does not improve label readability. It is just a bad way to address the problem by allowing manual local relayout. Still, I find the rationale for using a force-directed layout strategy good enough at this point.

3.4) Reviewer Comment: Please provide more information about the statistical tests involved
(ANOVA, post-hoc pairwise comparisons, others?) and report effect size.

Author Comment: We have included effect size in the revised submission. Please also see
comment 2.4 above.

Reviewer answer to author comment: OK.

3.5) Reviewer Comment: Please also put error bars in Figures 8, 10, 12, 14, 16 and 18. It would
also be nice to have charts illustrating success rate.

Author Comment: These figures have been updated in the revised submission.

Reviewer answer to author comment: OK.

3.6) Reviewer Comment: "500MB of eye-tracking data" and even "30 million rows of data" does
not tell us much. This certainly does not belong to the abstract, and if the authors want to keep
this information in the paper (first paragraph of Section 5), I would suggest providing a measure
of dataset size that "speaks" more to the average reader, like the sampling frequency and
duration of eye tracking sessions. Whether this represents 500MB or 1TB of data is totally
irrelevant to me, to be honest.

Author Comment: Additional frame rate information is added in section 3.1 in the revised
submission. As discussed in section 5, recordings vary from 10 minutes to well over an hour.
We would like to argue that the data size speaks to the average reader as well, since it provides an
understanding of how much data is generated per participant/recording, and the volume of the
data processed and analyzed in this study.
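
The relationship between the quantities in this exchange (dataset size, row count, sampling frequency, session duration) can be sketched with simple arithmetic. The snippet below is only an illustration: the 500MB and 30 million row figures are the approximate values quoted from the abstract, the 10-minute duration is the shortest recording mentioned above, and the 50 Hz rate is a hypothetical placeholder, since the actual frame rate is stated in section 3.1 of the paper and not in this exchange.

```python
def bytes_per_row(total_bytes: float, rows: float) -> float:
    """Average storage per row, given total dataset size and row count."""
    return total_bytes / rows


def expected_samples(rate_hz: float, duration_s: float) -> float:
    """Number of gaze samples one recording yields at a fixed sampling rate."""
    return rate_hz * duration_s


# Approximate figures quoted from the abstract (decimal megabytes assumed).
avg = bytes_per_row(500e6, 30e6)

# Hypothetical 50 Hz rate (an assumption) over a 10-minute recording.
samples = expected_samples(50, 10 * 60)

print(f"{avg:.1f} bytes per row on average")
print(f"{samples:.0f} samples in a 10-minute recording at the assumed rate")
```

Under these assumptions, the size and row count together give only an average row width; the sampling frequency and session durations have to be reported separately.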

Reviewer answer to author comment: Nice to see this additional information added. But I am sorry, saying that you collected 500MB of data does not speak to the average reader. It is totally meaningless if you don't say how many bytes one row takes on average. It does not say anything about the amount of information per sample, and it does not say anything about the sampling frequency. I really fail to see how the average reader could extract meaningful information from the fact that your dataset size is 500MB.

3.7) Reviewer Comment: how do you segment the continuous stream of saccades and fixations
into scanpath sequences?

Author Comment: As discussed at the beginning of section 4, ClearView generates basic data
such as the timestamp and duration associated with each fixation. As saccades are the quick eye
movements between successive fixations, the difference between two fixations’ timestamps is the
saccade duration. Since a scanpath is the complete saccade-fixate-saccade sequence, its duration
is determined as the total duration of fixations and saccades, as discussed in section 4.1. Please
also see footnote 15 for additional engineering information.
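
The computation described in this answer can be sketched as follows. This is a minimal illustration, not the script from footnote 15: the `Fixation` record and its field names are hypothetical, and it assumes each timestamp marks the onset of a fixation, so a saccade is taken to be the gap between the end of one fixation and the onset of the next.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Fixation:
    onset_ms: int      # assumed: timestamp marks the start of the fixation
    duration_ms: int   # fixation duration as exported by the eye tracker


def saccade_durations(fixations: List[Fixation]) -> List[int]:
    """Gap between the end of each fixation and the onset of the next one."""
    return [
        nxt.onset_ms - (cur.onset_ms + cur.duration_ms)
        for cur, nxt in zip(fixations, fixations[1:])
    ]


def scanpath_duration(fixations: List[Fixation]) -> int:
    """Total duration of all fixations plus all saccades between them."""
    total_fixation = sum(f.duration_ms for f in fixations)
    return total_fixation + sum(saccade_durations(fixations))


# Made-up example data: three fixations separated by two 40 ms saccades.
fixations = [Fixation(0, 200), Fixation(240, 300), Fixation(580, 150)]
print(saccade_durations(fixations))
print(scanpath_duration(fixations))
```

With this reading, the scanpath duration equals the time from the first fixation onset to the end of the last fixation. How the stream is split into separate scanpaths (for example per task) is not specified in the answer and is left out of the sketch.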

Reviewer answer to author comment: OK.

3.8) Reviewer Comment: "To automatically process this large volume of raw data, we generated
and ran a script on them.": This is not telling the reader much... Most data analysis requires
running scripts to process the data. What is the purpose of this particular script?

Author Comment: The measures discussed in section 4.1-4.3 are not automatically generated by
the eye tracker/ClearView (only basic information is provided as discussed at the beginning of
section 4), hence the code included in footnote 15 solves the engineering problem this study has
to overcome first.

Reviewer answer to author comment: OK.

3.9) Reviewer Comment: it makes little sense to use pixels as the unit to discuss areas and
distances in the visualization. Clearly, what matters here is the physical distance between, and
size of, elements on screen. It should be expressed in centimeters or inches. Expressing it in pixels
makes it dependent on screen resolution, which can vary dramatically from one screen to another
(depending, e.g., whether you have a standard monitor or a HiDPI one).

Author Comment: Likewise, physical distances expressed in centimeters or inches can also vary
dramatically from one screen to the next, e.g. the same visualization on a 21” monitor vs. on a
30” monitor. In the eye tracking community, saccade lengths are typically reported in pixels
accompanied by the monitor size & resolution - in our study, we used a 21.3” TFT with
1600*1200 resolution as discussed in section 3.1.
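
Given the monitor size and resolution, pixel distances can be converted to physical distances, which is why the two ways of reporting carry the same information. The sketch below assumes square pixels and that the nominal 21.3” diagonal is the visible diagonal; the function names are illustrative only.

```python
import math

MM_PER_INCH = 25.4


def pixel_pitch_mm(diagonal_in: float, width_px: int, height_px: int) -> float:
    """Physical size of one pixel in millimetres, assuming square pixels."""
    diagonal_px = math.hypot(width_px, height_px)
    return diagonal_in * MM_PER_INCH / diagonal_px


def pixels_to_cm(distance_px: float, pitch_mm: float) -> float:
    """Convert an on-screen distance from pixels to centimetres."""
    return distance_px * pitch_mm / 10.0


# Monitor reported in the author comment: 21.3" TFT at 1600*1200.
pitch = pixel_pitch_mm(21.3, 1600, 1200)
print(f"pixel pitch: {pitch:.4f} mm")
print(f"100 px saccade: {pixels_to_cm(100, pitch):.2f} cm")
```

Converting further to visual angle would also require the viewing distance, which is not given in this exchange.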

Reviewer answer to author comment: If that is the convention in the eye-tracking community, then OK. I agree that the information is equivalent.


Continued from previously numbered comments:

If by generalization and unscientific, the reviewer is referring to measures such as

  • more fixations indicate less efficient search;
  • longer fixation durations indicate more difficulty in processing information; etc.,

then we would have to disagree. It must be noted that these measures of search and processing are indeed validated and tested methods; please see details of validation in [35]. These measures have become a standard in the eye tracking community; please also see [6, 33, 34] for additional information.

    Different eye tracking manufacturers use different technologies for calibration, so how an eye tracker tolerates gaze angles varies depending on the specific physical hardware. What applies to the hardware/configuration/machine used in [49] is entirely irrelevant to the Tobii 2150. We used the calibration procedure prescribed by the hardware manufacturer in our study, as required by Tobii.

    We do not claim sample frequency can be inferred from data size. Frame rate (discussed in section 3.1) is the direct measure for sample frequency. In the same way the total number of participants is useful when describing experimental setup, we feel that the total data size is relevant and useful - particularly for those who have experience with Tobii eye trackers.

    3.1) When justifying measures based on fixation count and duration in the first drafts, the authors were citing papers that studied these measures _in isolation_ on _different tasks_ than the ones considered here. From this, they were drawing hypotheses about the experiment considered here that generalized those findings to tasks different from those involved in the cited papers. Such generalizations have to be justified in one way or another, either through empirical studies or through some sort of reasoning (harder). In the paper, I am missing evidence that such generalizations can be made: i.e., that "more fixations indicate less efficient search" no matter the task, data and visualization considered; that "longer fixation durations indicate more difficulty in processing information", again no matter the task, data and visualization considered; etc. Is this evidence contained in those references? If so, why isn't it stated more explicitly?

    3.2) Acknowledged.

    3.3) Yes, knowing the total number of participants in an experiment is useful. That figure speaks to all readers. In what way is "the total data size is relevant and useful"?