Using ontologies to model human navigation behavior in information networks: A study based on Wikipedia

Tracking #: 477-1672

Authors: 
Daniel Lamprecht
Markus Strohmaier

Responsible editor: 
Werner Kuhn

Submission type: 
Full Paper
Abstract: 
The need to examine the behavior of different user groups is a fundamental requirement when building information systems. In this paper, we present Ontology-based Decentralized Search (OBDS), a novel method to model the navigation behavior of users equipped with different types of background knowledge. Ontology-based Decentralized Search combines decentralized search, an established method for navigation in social networks, and ontologies to model navigation behavior in information networks. The method uses ontologies as an explicit representation of background knowledge to inform the navigation process and guide it towards navigation targets. By using different ontologies, users equipped with different types of background knowledge can be represented. We demonstrate our method using four biomedical ontologies and their associated Wikipedia articles. We compare our simulation results with baseline approaches and with results obtained from a user study. We find that our method produces click paths that have properties similar to those originating from human navigators. The results suggest that our method can be used to model human navigation behavior in systems that are based on information networks, such as Wikipedia. This paper makes the following contributions: (i) To the best of our knowledge, this is the first work to demonstrate the utility of ontologies in modeling human navigation and (ii) it yields new insights and understanding about the mechanisms of human navigation in information networks.
Tags: 
Reviewed

Decision/Status: 
Major Revision

Solicited Reviews:
Review #1
Anonymous submitted on 07/May/2013
Suggestion:
Major Revision
Review Comment:

This paper is based on previous research that models human navigation behavior in information space through decentralized search. It extends this approach by introducing external ontologies (in this case biomedical ontologies) as background knowledge. A small case study based on human subjects testing is conducted to compare simulated navigation in a Wikipedia network with observed user navigation. The statistical analysis is thoroughly conducted along several evaluation metrics, and the ontology-based navigation simulation outperforms random navigation, i.e. it behaves more similarly to human navigation in information systems.

Nevertheless, I feel that there are some serious conceptual issues in the design of the study, and that there is some limitation to the usability and relevance of the results.

My major concern relates to the user task in the study. The scenario assumes that a user cannot exactly remember the target term (e.g. a disease), and would therefore start the navigation at a more general Wikipedia article. However, as the first attempt of the study showed, when a search term is not understood (i.e. its semantics and position in an ontology are unknown), the local information provided through the link text on a Web site is not helpful for navigating towards the target site. If, however, a user remembers or understands a target term, then I would assume that the user simply searches for this term (either in Wikipedia or a search engine) to get pointed towards that Web site directly, and would not go through the hassle of hopping from page to page. Thus, the task of navigating towards a single target page hardly arises in practice, at least when the exact search term is known, as in the conducted study (Pneumonia, etc.). So this task poses a Catch-22 problem: A user would have to navigate first to an unknown target to learn about it (i.e. its name). This is, however, only possible if he/she already knows where in the ontology this term is located (or at least how it is spelled), which would make the navigation unnecessary. Yes, one could probably simulate different user groups in medical information systems, as the authors state, but would anyone apply such a search in real-world situations? This needs to be clarified.

The navigation algorithm is (too) simple.
a) For example, humans learn to some degree when they navigate through information networks (i.e., they update their ontology/mental map), but this component is apparently not present in the simulation model; at least I could not see it. Yes, there is backtracking, but links between objects in the ontology are not added or removed when corresponding information is discovered during navigation. If there is a good reason for not including a learning component, this should be explained.
b) Also, the decision rule is currently deterministic. Human search and decision behavior has a stochastic component. This should be addressed.
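
Point b) can be illustrated with a minimal sketch. It assumes, purely for illustration, that each outgoing link of the current page has a known ontology distance to the target; the function names, the temperature parameter, and the example distances are hypothetical and are not taken from the paper under review.

```python
import math
import random


def greedy_choice(distances):
    """Deterministic rule: follow the link with the smallest ontology distance."""
    return min(distances, key=distances.get)


def softmax_choice(distances, temperature=1.0, rng=random):
    """Stochastic rule: sample a link with probability proportional to
    exp(-distance / temperature). Low temperatures approach the greedy rule."""
    links = list(distances)
    d_min = min(distances.values())  # shift by the minimum for numerical stability
    weights = [math.exp(-(distances[link] - d_min) / temperature) for link in links]
    return rng.choices(links, weights=weights, k=1)[0]


# Hypothetical distances from three candidate links to the target "Pneumonia".
candidates = {"Lung disease": 1, "Infection": 2, "Cardiology": 3}
print(greedy_choice(candidates))
print(softmax_choice(candidates, temperature=1.0, rng=random.Random(0)))
```

With a high temperature the simulated navigator explores more widely; as the temperature approaches zero it reproduces the deterministic choice, so a single parameter would let the authors compare both regimes.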

Also, the model is currently limited to one application ontology. This may work when navigating towards a target with a simple concept, like a disease, but not for information spaces (e.g. the Web) in general, e.g. when the target node (i.e. a Web site) needs to satisfy several constraints (e.g. through attribute values). An example could be a search for a ski resort within 200 km of Graz which has at least 20 km of ski slopes and can be reached by public transportation. Navigation towards such a Web site, if possible, would involve several ontologies. If this cannot be handled with the provided approach, this is a limitation. It has not been explained which kinds of relations can be found in the sample ontologies (ICD-10, etc.), and whether all kinds can be used in the navigation algorithm. It looks like ICD-10 is strictly hierarchical with IS-A relationships, but oftentimes properties of objects may also be involved, which could be used as background knowledge. Please clarify. In fig. 1 I see only nouns, but there may be more involved than this.

Some minor comments:
- fig. 5 right column: Where do the curves for random walks come from (dashed lines)? I would assume that they only occur in a simulation. A user would not purposely perform a random walk. Please clarify.
- You bring up a lot of advantages of the ontology-based approach on page 8, but do not implement, illustrate, or test any of them, e.g. how to extract different types of relations from an ontology for background knowledge, and how that would be used to simulate navigation behavior. More detail or an example would strengthen the paper.
- http://wikipediamaze.com/ did not work when I tested it (server error)
- Please check the grammar, e.g. “This paper extends this application by a using ontologies…”

All in all this paper is a logical extension of previous work, but I cannot yet see its relevance towards a better understanding of human navigation in information spaces.

Review #2
By Carsten Keßler submitted on 11/Jun/2013
Suggestion:
Minor Revision
Review Comment:

The paper analyzes how useful ontologies are for simulating human background information when navigating through information networks. Based on a scenario where a user is looking for a particular disease in Wikipedia, the authors demonstrate how their OBDS approach can model her navigation behavior, using three different medical ontologies. OBDS is evaluated in great detail against random walks and a decentralized search using randomly generated ontologies as background knowledge. This evaluation shows that ontologies can provide valuable background information and indeed improve the modeling of user navigation.

The paper is generally well written and structured, and the whole research is well motivated. My main concern with this paper is that this approach should work better the more closely the ontology used matches the user's (or group of users') actual background knowledge. If this is true, OBDS should perform less well for heterogeneous user groups with very different levels of background knowledge (a mix of doctors, nurses, and lay persons, for example). This is also somewhat implied at the bottom of page 9, where commonly known diseases are selected manually in order not to present the participants with diseases they had never heard of. This issue is not discussed anywhere in the paper and it seems especially relevant with respect to the small number of participants in the user study.

Minor issues:

- How did you choose the maximum number of steps given on p.10? Is there any justification for 20/40?
- C. elegans may not be familiar in this community (it wasn't for me), so a short explanation would help
- Figure 1a has a green arrow in the legend for the shortest path, but there is no shortest path in the graph
- In Figure 1, ICD-10 is used, but only introduced later in the text
- In Section 4.3, can you explain why certain combinations perform better than others?
- AFAIK, *participants* in a test should be called just that (not subjects)

There are a number of typos and small grammatical issues across the paper. Please use a spell/grammar checker on the text.

Review #3
By Tobias Weigel submitted on 03/Jul/2013
Suggestion:
Minor Revision
Review Comment:

This article presents the results of a comparative, quantitative study including a user study and a synthetic simulation of navigation behaviour in information networks. It suggests the use of domain ontologies to model exemplary users' behaviour. Overall, the article is well written and original, the research questions are interesting and relevant, and the conclusions drawn from the study are sound. Minor weaknesses exist in the quantitative analysis section, which however do not harm the overall conclusions.

Detailed comments:

- the authors show good knowledge of related work and relevant literature
- paper is well-written and actually a nice read. Complex passages are enriched with more details or examples.
- it must be noted that the results apply only to well-organized knowledge areas such as biomedicine, where there is a strong taxonomy in the background that is actually part of common users' knowledge. Navigation patterns might look different for search in less structured knowledge domains. This should be mentioned in section 3.
- typo on p. 8 ("server")
- last sentence of 3.4 on p. 8 should be "representative of expert knowledge"; also it then reads to me as if the MeSH terminology will be representative for journal article reviewers and I am not sure if this is the intended point here
- 4.3: "2.45 resp. 2.49" - this might be statistically insignificant, lacking any information on confidence intervals
- the percentage numbers given at the end of section 4 lack a statement on population size (total number of clicks by human users) and confidence intervals and might be statistically insignificant, particularly where percentage numbers were given in the 1-3% range. Any conclusions about home button usage given these quantitative numbers seem doubtful.
- p. 14 end of left column: "can be applied on of Wikipedia" - superfluous word 'of'
- p. 14, on RQ2: "certain properties" - this should be more specific here. Do the paths actually match or do they only match regarding average length etc.?
- future work: future user studies should also increase the number of test subjects, which was extremely small here
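
The concern about percentages in the 1-3% range can be made concrete with a small sketch that computes a Wilson score interval for a proportion. The click counts below are hypothetical placeholders, since the paper does not report the total number of clicks; the function name is likewise an assumption made for illustration.

```python
import math


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 gives roughly 95%)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return center - half_width, center + half_width


# Hypothetical counts: the same 2% share observed in a small and a large sample.
for clicks, total in [(2, 100), (200, 10000)]:
    low, high = wilson_interval(clicks, total)
    print(f"{clicks}/{total}: {low:.4f} to {high:.4f}")
```

The same observed share yields a much wider interval for the smaller sample, which is why the total number of clicks needs to be stated alongside the percentages.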