A Framework for Real-Time Semantic Social Media Analysis

Tracking #: 1271-2483

Diana Maynard
Ian Roberts
Mark A. Greenwood
Kalina Bontcheva

Responsible editor: 
Andreas Hotho

Submission type: 
Tool/System Report
This paper presents a framework for collecting and analysing large-volume social media content. The real-time analytics framework comprises semantic annotation, Linked Open Data, semantic search, and dynamic result aggregation components. In addition, exploratory search and sense-making are supported through information visualisation interfaces, such as co-occurrence matrices, term clouds, treemaps, and choropleths. There is also an interactive semantic search interface (Prospector), where users can save, refine, and analyse the results of semantic search queries over time. Practical use of the framework is exemplified through two case studies: a general scenario analysing tweets from UK politicians and the public's response to them in the run-up to the 2015 UK general election, and an investigation of attitudes towards climate change expressed by these politicians and the public, via their engagement with environmental topics. The paper also presents a brief evaluation and discussion of some of the key text analysis components, which are specifically adapted to the domain and task, and demonstrates the scalability and efficiency of our toolkit in the case studies.

Major Revision

Solicited Reviews:
Review #1
By Paolo Tomeo submitted on 22/Feb/2016
Major Revision
Review Comment:

The paper describes a framework for real-time Semantic Social Media Analysis with an interactive semantic search interface (Prospector), based around GATE, an open source framework for Natural Language Processing. Overall, this framework combines a series of already existing tools. In this paper the authors describe how they used this framework in two case studies: one on monitoring of political tweets leading up to the UK 2015 elections and another on sociological analysis of the representation of climate change in politics and of the reaction of the public to and engagement with this topic.

The work described is quite interesting, and the paper is well written.
However, it describes a framework already presented in another paper published in this journal (V. Tablan, K. Bontcheva, I. Roberts, and H. Cunningham. Mímir: an open-source semantic search framework for interactive information seeking and discovery. Journal of Web Semantics, 2014.) and in other papers, as referenced in the paper and in the cover letter. Considering the guidelines for reviewers, I find that this paper does not fit the definition of 'Tools and System Report', since (a) the changes to the framework since the previous paper do not seem substantial enough to deserve a new paper, or they are not clearly described; and (b) I am not sure the framework has had significant additional uptake since acceptance of the previous paper.

Therefore, I recommend a Major Revision, since I would like to give the authors the opportunity to describe the increments more clearly and to show why they are relevant enough to deserve a new publication. I suggest describing the components better and making Sections 4 and 5 more concise, since they only describe how to use the framework in two case studies.

Minor suggestion
- Figure 8 is not readable at all.

Review #2
By Tomi Kauppinen submitted on 09/Mar/2016
Major Revision
Review Comment:

The review is structured along the dimensions of

(1) Quality, importance, and impact of the described tool or system (convincing evidence must be provided).

This framework for analyzing political tweets looks very promising for understanding how different political themes co-occur, and for visualizing the themes, e.g. on maps. The principles and tools behind this framework appear to be generalizable to other (English-speaking) countries, yet the framework itself is quite tailored to the UK (am I right?).

The evaluation part of the paper reports that the framework outperforms named competitors in a few tasks. However, I wonder what the precision and recall figures here really tell us about performance. Usually, full P&R graphs need to be provided for the reader to check before claims about performance can be made.

For instance, what is the precision at a recall level of 10%? Does the framework beat all competitors in all tasks, whatever the recall level? This is my main concern and the basis for the "76" as the overall impression, and for suggesting a major revision (it can be minor as well if you have done the P&R analysis properly and can easily produce the P&R graphs).
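
To make the request concrete, the following is a minimal sketch of how precision at a given recall level could be read off a ranked list of system outputs. It is not taken from the paper: the function names pr_curve and precision_at_recall and the toy scores and labels are hypothetical, and it assumes the evaluated component outputs a confidence score per prediction along with a gold label (1 = correct, 0 = incorrect). It uses the common interpolated definition, i.e. the best precision achieved at any recall greater than or equal to the requested level.

```python
from typing import List, Tuple


def pr_curve(scores: List[float], labels: List[int]) -> List[Tuple[float, float]]:
    """Return one (recall, precision) point per rank cutoff.

    Predictions are ranked by descending confidence score; labels are
    1 for a correct prediction and 0 for an incorrect one.
    """
    total_pos = sum(labels)
    if total_pos == 0:
        raise ValueError("need at least one positive label")
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    points = []
    true_pos = 0
    for k, (_, label) in enumerate(ranked, start=1):
        true_pos += label
        points.append((true_pos / total_pos, true_pos / k))
    return points


def precision_at_recall(points: List[Tuple[float, float]], level: float) -> float:
    """Interpolated precision: best precision at any recall >= level."""
    candidates = [precision for recall, precision in points if recall >= level]
    return max(candidates) if candidates else 0.0


# Hypothetical toy data, for illustration only (not results from the paper).
scores = [0.9, 0.8, 0.7, 0.6, 0.5]
labels = [1, 0, 1, 1, 0]
points = pr_curve(scores, labels)

for level in (0.1, 0.5, 1.0):
    print(f"precision at recall >= {level:.0%}: {precision_at_recall(points, level):.2f}")
```

Plotting the points for each system on the same axes would give the P&R graphs asked for above, and would show whether one system dominates across all recall levels or only at a single operating point.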

(2) Clarity, illustration, and readability of the describing paper, which shall convey to the reader both the capabilities and the limitations of the tool.

The paper is well written and easy to follow. No comments or suggestions in this regard.

Review #3
By Achim Rettinger submitted on 27/Mar/2016
Minor Revision
Review Comment:

Summary of content:

This paper presents an open source framework for real-time semantic social media analysis, which is highly scalable and can be used both for off-line processing and live processing of social media data. The framework consists of several components including semantic annotation, semantic search, dynamic result aggregation and visualization. The complete framework was used for two different use cases and provides information visualization interfaces to show co-occurrence matrices, term clouds, tree-maps and choropleths.


Review dimensions:

- Quality: This paper is of good quality. The authors start with a description of the framework and its components, and then explain the practical use of the tool through two case studies.

- Importance: The tool presented in this paper has its importance since social media is a huge information source that many real-life applications could rely on.

- Impact of the tool: The tool is based on GATE, a widely used, open source framework for Natural Language Processing (NLP), and it can perform all the steps in the analytics process including collection, semantic annotation, indexing, search and visualization. Social media content has a unique nature: it is fast-growing, highly dynamic and high volume, reflecting the ever-changing language used in today’s society and the current societal views and sentimental fluctuations of the authors. Existing NLP tools are therefore limited in dealing with social media data. This tool provides numerous components, which are either specifically designed for social media analysis or adapted from previously developed tools for general usage, and integrates these components into a framework.

- Clarity, Illustration & Readability: This paper is well written and easy to follow.


Overall assessment:

Overall, the important contribution of the paper is that it showcases various components of such a big framework with a clear description and citations. Most of the presented work can be reproduced with minimal effort and applied to real-world problems, thus demonstrating its usefulness. The article is well written and showcases almost all the important components used in the framework with enough citations and examples. It can be accepted with minor revisions if the suggestions listed below are considered.


Brief comments and suggestions by section:

- Section 1: Although the authors title this 'social media analysis' in general, this section highlights only one social media platform (i.e. Twitter). It would be interesting to the reader if the section also covered problems from other existing social media platforms (e.g. Facebook, Blogs, Reddit etc.) and motivated how such a framework can be leveraged to solve these issues.

- Section 2: (1) A link to Twitter hosebird should be added; (2) Isn't 50 tweets/second trivial for real-time analysis? (3) No batch processing of tweets? (4) It would be helpful to the reader if an example of Mimir columns were presented.

- Section 3.2: Showing a query beyond simple textual queries would be interesting.

- Section 6: It would be nice to include an evaluation or user study regarding the discussed scenarios of the analysis of the 2015 UK general election and the investigation of attitudes towards climate change as a whole. It would be easier to read if the results are presented in a table instead of directly embedded in the text.

- Section 7: As this is a paper on tools and systems, a comparison of the described framework with other tools dealing with social media content is expected. As there are plenty of tools for social media monitoring available, the differences between the tool presented in this paper and others should be clarified.


General issues:
- Most of the links to toolkits are embedded into the text. It’s easier to read if they are presented as footnotes.
- A running example of a tweet should be used to understand how each component in the framework can be applied to it.