Localised Annotation of Event Video with User Tweets
Solicited review by Victoria Uren:
The paper presents a novel use of Twitter feeds as real-time input that can be used for annotating sports videos. The authors argue that, since standard NLP techniques work poorly on Twitter, an NER method that employs rules based on background knowledge is suitable. Entities detected are players and events in the ICC World Cup. The majority of the paper concerns performance testing of the NER methods, with a short application scenario to show how the data would be used for video annotation. Some interesting hypotheses are explored – in particular that exciting real-time events (such as a six) are associated with lower retweet levels, because individuals tend to tweet spontaneously.
For NLP methods the work is compared to standard methods (Stanford NER and OpenCalais). The authors should also consider experimenting with Ritter's Twitter-specific NER library [1]. If they do not have time to run experiments before the deadline for the special issue, they should as a minimum reference the work in the literature review.
[1] http://github.com/aritter/twitter_nlp
A. Ritter, S. Clark, Mausam & O. Etzioni, Named Entity Recognition in Tweets: An Experimental Study, Conference on Empirical Methods in Natural Language Processing, 2011.
For a journal publication I would expect to see a more extensive literature review. This should cover both the state of the art in NLP for Twitter and video annotation.
A very brief section on the rules of cricket might also be helpful for an international audience – for example, it may not be obvious why only some of the players are active at any time, or why a 4 and a 6 are significant events. A high-level explanation will do; don't worry about the intricacies of the LBW rule.
Minor changes to English.
Sec8.2
presence,. -> presence.
Extracting from these short messages are not trivial. -> Extracting from these short messages is not trivial.
Data preparation
slient -> salient
Avarage -> Average (twice)
Tweet volume and Information diffusion
comnputed -> computed
Result (Results is more standard)
Performance test of proposed classificatios… - sentence has no verb
Player detection
Opencalais -> OpenCalais
Player detection with our approach
sorrounding -> surrounding
below Fig 9
roubustness -> robustness
detectthe player's name -> detect the players' names
tweet to timeline alignment
video segments -> video segments based on (join split line)
Detecting clues about the end of an event
mraking -> marking
Related studies
Please supply TwitterStand reference
Conclusion
wether -> whether
who follow and adopts -> who follow and adopt
Solicited review by Bernhard Schandl:
This paper discusses a combination of various approaches that aim to identify entities within Twitter streams, in order to derive annotations for segments of a video stream. The idea itself is very nice, and the application use case seems quite reasonable.
The structure of the paper and the overall argumentation line is fine. However, the paper has numerous spelling and grammar errors, so I recommend careful proofreading prior to publication. Further, the presentation style of tables and figures can be improved. Unfortunately, section numbers are missing.
The introductory section gives a very well elaborated overview of the special characteristics of microblogging services, and I consider this the strongest part of the paper. What I am missing in this part is a more detailed discussion of cultural and social aspects -- I assume that the style and characteristics of Twitter streams differ greatly between different communities, and it would be interesting to analyze the implications of these differences for content analysis.
The description of the used background knowledge is rather vague -- how has that data been modelled and used? Was there some semantic network used or was it "just" a list of additional terms?
As for the annotation of tweets with entities, the description does (in my opinion) not clearly state what the test users were supposed to annotate. The section describes that test users only used "yes" and "no" to label tweets that mention players -- but how are players then identified, and which tweets are annotated with which players? Please give examples here so that the actual annotation is easier to understand.
The actual performance of the algorithm is difficult to judge since the presented numbers are not compared to other works. The numbers themselves are solid but not breathtaking; what I have to criticize is the graphical presentation of the numbers, which is definitely misleading in many cases -- e.g., in Figure 8, the values 0.86 and 0.89 are actually quite close, while the heights of the corresponding bars differ significantly. Please use a linear, non-truncated axis for precision, recall, and f-measure values.
As stated before, the application scenario is nice, but it would again be much more helpful if the authors presented a working example of such an annotation (e.g., an annotated video that can be watched online, or some search facility that allows segments of the video to be retrieved by annotation search). Through such a test, the subjective quality of the discovered annotations could also be analyzed.
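To illustrate the concern about cut axes, here is a minimal sketch of how much a truncated baseline exaggerates the difference between the two values mentioned above. The baseline of 0.85 and the helper name `bar_height_ratio` are hypothetical; the actual axis range used in Figure 8 is not stated in this review.

```python
def bar_height_ratio(a: float, b: float, baseline: float = 0.0) -> float:
    """Ratio of the drawn heights of bars b and a when the axis starts at `baseline`."""
    if baseline >= min(a, b):
        raise ValueError("baseline must lie below both values")
    return (b - baseline) / (a - baseline)


# Values quoted in the review for Figure 8.
full_axis = bar_height_ratio(0.86, 0.89)        # axis starts at 0
# Hypothetical cut axis starting at 0.85 (an assumption, not taken from the paper).
cut_axis = bar_height_ratio(0.86, 0.89, 0.85)

print(f"full axis: {full_axis:.2f}x, cut axis: {cut_axis:.2f}x")
```

With an axis starting at zero the second bar is only about 3% taller than the first, whereas with the assumed cut at 0.85 it is drawn roughly four times as tall, although the underlying values are unchanged.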
Solicited review by Anna Lisa Gentile:
The paper presents techniques to detect Named Entities and significant micro-events from users' tweets during a live sports event. The tasks are performed exploiting (i) linguistic features of tweets (ii) Twitter-specific features and (iii) background knowledge.
The paper introduces small novelties over the authors' previously published work http://ceur-ws.org/Vol-718/paper_17.pdf
The main added work concerns: (i) presentation of results from two standard Named Entity Recognition (NER) tools on the collected Twitter data; (ii) evaluation of different combinations of features for the NER task; (iii) statistical data about re-tweets as a useful feature for event detection; (iv) application scenarios, in particular semantic enrichment of sport videos.
While (i)-(iii) do not constitute sufficient material for an additional publication over their previous work (http://ceur-ws.org/Vol-718/paper_17.pdf), (iv) could motivate a novel publication if better explored and presented. Unfortunately the task is only superficially analysed; e.g. the authors recognise a challenge in finding the correct latency time between the video and the user tweets, but do not explore any technique to automatically and dynamically find the latency value; instead, they propose an ad-hoc data-driven value.
At page 20 the authors say about the video tagging task: "Results are discussed in the evaluation section", but I cannot find any reference to it in the Evaluation section.
In addition, the abstracts of the two manuscripts are identical, which is not ideal. The abstract of a paper should highlight straight away the contribution of the work; having an abstract which is the same as that of an already published paper gives the feeling that nothing has been added to previous work.
The State of the Art (SoA) section does not present a gap analysis of current work in the field and does not specify how the presented work addresses such gaps. In particular the video tagging task, which could be the added value of the paper, is not even explored in the SoA, making it difficult for a non-expert to judge the novelty and the challenges addressed by the proposed work. One of the references in the SoA is empty.
Clarity of presentation is poor: in particular, no section hierarchy is present, there is no numbering (apart from a weird 8.2 numbering for a subsection at page 3), and strange formatting and line breaks are repeated in different places (e.g. pg 18, pg 20).
A few typos:
vaery --> very (pg 6)
for e.g. --> e.g. (pg 6)
.92ompared --> .92 compared (pg 13)
sufficent --> sufficient (pg 14)
detectthe --> detect the (pg 17)
Mechanical Turk and CrowdFlower should have footnotes

