The effect of generalization on user interpretation of topical overviews

Tracking #: 1333-2545

Alex Olieman
Gleb Satyukov
Emil de Valk

Responsible editor: 
Guest Editors Social Semantics 2016

Submission type: 
Full Paper
The demand for tools that enable interactive exploration of social media streams and other user-generated content has inspired much research in recent years. A common approach in this area starts by extracting information from user contributions, which is subsequently linked to a semantic knowledge base. In this way, entities and concepts that are mentioned in the content are given canonical representations, which serve as the basis to aggregate and compare social media activity over users and over time. While this leads to representations of social media content that can be effectively used behind the scenes of an application, the suitability of these overviews for user interaction has yet to be investigated. We have conducted an experiment to investigate whether the presentation method that is used to show a topical overview of documents to users has an effect on users' ability to interpret such an overview. More specifically, we test for an effect of generalizing topics to a higher level of abstraction on the ease with which users make sense of topical overviews. We found significant effects of this treatment on user accuracy, interpretation diversity, and task duration. Overall, the results indicate that generalization negatively affects users, but we were also able to identify several cases in which generalized overviews were more user-friendly.

Solicited Reviews:
Review #1
By Claudia Mueller-Birn submitted on 12/May/2016
Review Comment:

The article “The effect of generalization on user interpretation of topical overviews” deals with the presentation of topical overviews of documents to users by evaluating the accuracy, the diversity, and the time users need to process the provided information. The article is organized into five parts: the first reviews related research on providing topical overviews of documents on the one hand, and on user interest profiles on the other. The authors then explain how they generated the topical overviews, and the central part of the article is a user study whose results are presented and discussed.

Overall, the topic of information design is very relevant today. However, the final results of this research are quite trivial: (1) users benefit from being able to see specific topics; (2) users do seem to find the generalized profiles easier to interpret when the clustering algorithm performed well. Why is this research useful? How can these results inform the design of tools “that enable interactive exploration of social media streams and other user-generated content”? These questions arise because the article is poorly written and lacks rigorous argumentation. Additionally, the article was submitted to the special issue “Mining Social Semantics on the Social Web”. The “semantic” part of the article consists of a SPARQL query.

The article would improve with another round of proofreading; the areas of improvement are detailed below. This article is not ready for journal submission. I would recommend submitting it to a workshop instead.

* What is your contribution and where is it needed?

* “Interactional aspects do matter for real-world applications,…” – This is one of the main issues of this paper. What is the exact use case? When does a user need to interpret the output of clustering, recommendation, or search functionality? What is the application context? Concrete examples would be very useful.
* The question: “Creating overviews necessarily involves abstraction or generalization, but which degree of abstractness is suitable for a given task?” does not really fit the goal: “We therefore conducted an experiment to test whether the incorporation of generalized topics into topical overviews has an effect on users who were given a task in which they needed to interpret the overviews.” – If the degree is in question, then different degrees (not only two) should be provided. However, it is still quite difficult to understand the concept of “overview” in the context of the paper. It is a very general term, and it needs a clear definition (as do many other terms).

Related work
The related work section reads more like a list. It is not clear why the authors review the introduced research and how it informed their own work. Additionally, more information is needed to understand the described concepts, for example, “spreading activation with two pulses led to the best results” and “category-based method”. I would recommend elaborating on the concepts that are used later.

Generation of topical overviews
The linkage to existing research that is missing elsewhere appears in the “flat presentation” section. Why here?
However, in the “generalized presentation” section, the elaboration of the presentation methods is entirely missing. The description is very artificial and contains some mistakes, such as a union instead of an intersection, a wrong index (e in e), and the URL “snorql”. The authors provide neither an evaluation of the defined metrics nor an explanation of the visualization used (a nested list). Thus, the whole user study is based on the hypothesis that content is not related to presentation; cf. Salganik, Matthew J., Peter Sheridan Dodds, and Duncan J. Watts. “Experimental study of inequality and unpredictability in an artificial cultural market.” Science 311.5762 (2006): 854–856.

User Study
* The description of the user study lacks much-needed information. For example, what is the concrete task a user wants to carry out? The existing description: “Does the addition of generalized topics to overviews help users to perform a task in which they need to make sense of the underlying user-generated content?”. The three hypotheses use concepts that are not described: accuracy and diversity. A definition of the context is needed. These definitions are given in the results section, but they should be introduced earlier.
* The argument that a task-directed experiment is needed because it has not been done before is not sufficient. What insights do task-directed experiments provide as opposed to other kinds?
* The expert-finding scenario in the area of journalism should be explained and described much earlier in the paper. However, the whole setup does not seem very realistic, since the search system is very simplistic.

Please change the caption of Figure 2: the colors represent the categories, not the classes (there are no defined classes). Compared to all other sections, the results section is well described. However, it is not explained, for example, why Shannon entropy is a good measure for the diversity index. Please move the whole line of argumentation into a separate section before defining the hypotheses.
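For reference, Shannon entropy over the distribution of participants' interpretations is a standard diversity index: it is maximal when interpretations are spread evenly over categories and zero when everyone agrees. A minimal sketch (illustrative only; the function name and data are not from the paper):

```python
import math
from collections import Counter

def shannon_entropy(labels):
    """Shannon entropy (in bits) of a list of category labels.

    Higher values mean more diverse (more evenly spread) interpretations.
    """
    counts = Counter(labels)
    n = len(labels)
    h = 0.0
    for c in counts.values():
        p = c / n
        h -= p * math.log2(p)
    return h

# Uniform spread over four categories: maximal diversity (2 bits).
print(shannon_entropy(["a", "b", "c", "d"]))  # → 2.0

# All participants chose the same category: zero diversity.
print(shannon_entropy(["a", "a", "a", "a"]))  # → 0.0
```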

The discussion reveals another major issue of this study – the influence of the user interface design (the nested list) on the study results. This should have been taken into account from the beginning; for example, different designs should have been considered.

Review #2
Anonymous submitted on 23/May/2016
Major Revision
Review Comment:

The paper describes an approach for enabling interactive exploration of social media streams and other user-generated content by extracting information from user contributions and linking it to a semantic knowledge base (Wikipedia). In this way, it is possible to highlight entities and concepts in the content, which can be used to aggregate and compare social media content.

In the paper, the authors conducted an experiment to investigate whether the presentation method of document topics has an effect on users’ ability to interpret them. More specifically, they tested the effect of generalizing topics to a higher level of abstraction on understanding.

The authors found that generalization negatively affects user accuracy, interpretation diversity, and task duration.

The need for user-friendly topical overviews of user-generated content in social media is very relevant to the community. The topic is interesting and in line with the journal's aims.

The paper is well-written and easy to follow. The motivation of the work is clear, and the methodology followed is clearly explained.

The related work section is complete and covers the most relevant related work in the literature.

The experiment is performed as a task-based evaluation. This choice should be justified better (that it is rarely used in the literature is not a real justification).

The features of the sample could be detailed more: age, gender, profession, and other features that may affect the results. It would be interesting to provide some insights on this topic in the conclusion. I think that the ability to interpret abstract concepts also depends on personal cognitive features, educational level, and domain knowledge.

A comparison of the authors' findings with similar findings from related work should be added.

Limitations of the approach should be stated.

Review #3
By Trevor Collins submitted on 28/May/2016
Major Revision
Review Comment:

- Summary

This paper introduces work done on the generation of topical overviews of documents and user interest profiles: a method for generating topic overviews is presented, and a user study comparing the use of two-level nested topic lists with flat topic lists for completing a manual profile classification task is discussed.

- Originality

The paper draws on previous research to propose an approach to topic generalization, which is an incremental step on a path of established research in the area. The user study provides an original contribution in the form of an experimental study that provides empirical evidence on the suitability of topic generalisations as a means of presenting overviews of profiles compared to flat lists.

- Significance of results

Issues of fair comparison...

The study compares the use of flat lists of topics (i.e. control condition) and two-level hierarchical lists (i.e. experimental condition). The hierarchical lists were initially presented as an interactive closed (single-level) list of generalized topics using an accordion. As noted in Table 1, the control condition lists ranged from 5 to 94 topics, whereas the experimental condition for unexpanded lists ranged from 2 to 32 generalized topics. When expanded the generalized topics would include the list of topics, therefore, the largest expanded list in the experimental condition would be 126 (i.e. 94 topics + 32 generalised topics). Arguably the length as well as the content of the topic lists could have an impact on how they were used.

It is currently unclear what use was made of the interactive accordion to explore the profiles (e.g. the number of topics expanded) by the participants. The optional addition of using an expandable list of topics may affect the participants’ perception of the task and the consistency of experience within the experimental condition. This data (i.e. number of topics expanded and the resulting topic list lengths) should be reported and considered in the discussion of the findings. It would also help to explicitly state the number of topics that could be expanded at a time (e.g. one or more than one) and if there were any options available to expand or collapse all of the topics (e.g. expand/collapse all button).

The comparison between the responses given in the control and experimental condition to the three statements (introduced in section 4.4 and presented in Figure 3), could also be discussed with regard to the participants actual actions in each condition. Where the generalized topic lists were expanded, did the use of expanded lists have an effect on the task or not? The stated hypotheses do not mention the use of expandable lists. Arguably, to compare the suitability of a list of topics with a generalised list of topics, the two lists should be static. The study design varied the level of abstraction, form of interaction (i.e. expandable or not), and the number of topics. What impact do the two confounding variables (i.e. expandable lists and number of topics) have on the target phenomena (i.e. manual profile classification)?

The work draws on theories explaining the concreteness effect from sentence comprehension, however, the study task does not involve sentence comprehension. It would strengthen the justification of the approach if there could be a clearer explanation of how sentence comprehension tasks map to profile classification tasks from topic lists.

- Quality of writing

The paper is well structured and well written. The following suggestions are to help clarify and improve the presentation…

Include the source citations for dual coding theory (e.g. Paivio 1986, 1991) and context-availability theory (e.g. Bransford & McCarrell, 1974; Kieras, 1978).

In section 4 there are two hypotheses for the third variable of interest in the user study (C) rather than three (as in A and B). Clarify why there are only two, or (as with the other variables) include the hypothesis that the generalization topics could decrease task duration (H2).

Figure 1 includes an illustration of the experimental condition, could an example of the control condition also be included for comparison (e.g. at 50% of the current figure size, two examples could be displayed alongside each other)?

The opening two sentences of Section 7 - Future Work seem to be contradictory. It would help to clarify what is meant by “the clustering algorithm doing a good job” – this implies that there are variations in the quality of the topic generalizations that confound the findings.

Typos – Section 7, third paragraph: ‘evaluation on an social platform’ → ‘evaluation on a social platform’; and ‘investigate possible correlation between specific type of’ → ‘investigate a possible correlation between a specific type of’.