Aligning Tweets with Events: Automation via Semantics
This is a revised manuscript that has been accepted for publication, following an earlier decision of accept pending minor revisions. Below are the reviews for the revision, followed by the reviews for the original submission.
Solicited review by Eraldo Fernandes:
The authors satisfactorily addressed all aspects that I pointed out in my previous review. Nevertheless, the paper still needs careful proofreading to remove some typos, such as the following.
"Fashions shows" -> "Fashion shows"
"which can be then be exposed" -> "which can then be exposed"
Solicited review by David Laniado:
I think the manuscript is ready for publishing.
Solicited review by anonymous reviewer:
This is a much improved version of the paper. Most of the comments have been addressed satisfactorily - in a couple of cases, the authors have instead provided a rebuttal, but it appears there is some misunderstanding about the reviewers' comments. However, these are fairly minor details so I'm prepared to let them go. For example, the suggestion to evaluate the performance of Zemanta was made in order to determine not how good Zemanta is per se (which I agree is somewhat beside the point), but to determine what the effect of errors or missing concepts might be on the overall performance. The point is that unless you have a clear idea of how good the performance of a third party system is, you cannot determine whether it's the best tool for the job, and how much impact on the final result this system might have. I apologise for not making this clear in my original review. Perhaps a sentence or two could just be added to the document that this would be something worth looking at in future.
Just a few minor comments remain:
One typo: in the second sentence of the Introduction, "fashions shows" should be "fashion shows".
On page 2, I suggest adding a comma between "to" and "in" in the sentence beginning "Motivated by the need to align tweets...." (otherwise it sounds as if the events are in the paper).
Figures 8 and 7 appear out of order in the paper - it would be better to arrange them so that Figure 7 appears before Figure 8.
In the bibliography, reference 8 has the wrong author names: the authors should be D. Maynard, W. Peters, Y. Li.
Reviews for the original submission:
Solicited review by Eraldo Fernandes:
The authors propose a method to extract sub-event mentions within tweets about a given major event. The major event is specified as a dereferenceable URI and a Twitter hashtag. The URI is used to automatically obtain the list of sub-events through a simple heuristic. The hashtag is used to search for the tweets concerning the major event.
The main contribution of this work lies in the methods for identifying mentions of sub-events within the set of tweets related to the given major event. The paper presents an evaluation of two methods on a dataset concerning one major event (Extended Semantic Web Conference 2010).
The paper is well written, organized and presented. It also deals with a relevant task and uses a good methodology.
The evaluation is restricted to one event. Since the classifiers are event specific, it would be important to evaluate the method on other events.
It should be clearer along the text (mainly in the abstract and introduction) that the aim of this work is to detect *sub*-events within a given and delimited major event. During the first sections this is not clear and the reader tends to believe that the method identifies general and arbitrary events.
Why do you say in the abstract that the achieved performance is "optimum"? I have no clue which evaluation metric this optimum result relates to.
The F-beta values in Table 2 are inconsistent with the corresponding precision and recall values.
In sec. 5, there are many references to Tab. 6 that should be to Tab. 2.
The number of classes (number of sub-events) can be huge in some cases, which makes the proposed classification task very hard. Furthermore, the number of examples per class is very limited. This is likely the cause of the poor performance of discriminative methods (SVM, for instance). I think both presented approaches are closely related to TF-IDF weighting for information retrieval. This could be discussed in the text.
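To make the TF-IDF connection concrete, here is a minimal, purely illustrative sketch (not code from the paper under review): each sub-event description and each tweet is treated as a bag of words, vectors are weighted by TF-IDF, and the tweet is assigned to the sub-event with the highest cosine similarity, which reduces the alignment to a retrieval-style ranking. All names and example data are hypothetical.

```python
# Illustrative TF-IDF / cosine-similarity sketch of tweet-to-sub-event
# alignment; the event descriptions and tweet below are made up.
import math
from collections import Counter

def tfidf_vectors(docs):
    """Build TF-IDF vectors (dicts term -> weight) for tokenized docs."""
    n = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(doc))  # document frequency counts each doc once
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vec = {t: (tf[t] / len(doc)) * math.log(n / df[t]) for t in tf}
        vectors.append(vec)
    return vectors

def cosine(u, v):
    """Cosine similarity between two sparse vectors (term -> weight)."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

# Hypothetical sub-event descriptions and a tweet, as bags of words.
events = [
    ["linked", "data", "keynote", "talk"],
    ["machine", "learning", "tutorial", "session"],
]
tweet = ["great", "keynote", "on", "linked", "data"]

vecs = tfidf_vectors(events + [tweet])
scores = [cosine(vecs[-1], v) for v in vecs[:-1]]
best = max(range(len(events)), key=lambda i: scores[i])  # index 0 here
```

Under this view, the paper's proximity-based clustering is essentially ranking event "centroids" by similarity to the tweet vector, which is why a discussion of TF-IDF-style weighting would fit naturally.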
Solicited review by David Laniado:
The paper addresses the problem of aligning tweets with the events they refer to. More precisely, given a collection of tweets related to an event, the authors propose an approach to correctly assign each tweet to the corresponding subevent; the scenario analyzed is that of a conference with different talks going on.
First, the preprocessing of tweets is described: tweets are represented in a structured format by means of standard ontologies for social Web data and enriched through the Zemanta key extraction API.
The central contribution of the paper is the proposal and assessment of different machine learning algorithms to perform the alignment. Three features are extracted to describe subevents as bags of words, derived from their official URIs, the Web of Linked Data, and the Zemanta API. Tweets are also represented as bags of words.
The first algorithm is based on a modified version of k-means, with centroids corresponding to events to be matched, and two distance metrics. The second technique proposed is based on a Naive Bayes classifier, built from the frequency distributions of terms observed in the event descriptions.
The techniques are evaluated on a dataset of tweets about the ESWC conference; in the comparison with a sample of manually labeled data the Naive Bayes classifier outperforms the proximity clustering algorithm, achieving over 70% performance both in terms of precision and recall.
The manuscript provides an interesting experiment that can be relevant for the Semantic Web community, as different machine learning algorithms are applied, tuned and evaluated in the context of Linked Data, to classify microblogging posts.
The paper is well written and structured, and highly readable. The topic is appropriately introduced and motivated; some relevant use cases are also discussed. The algorithms are described in a rigorous way, and the information provided is sufficient to allow for replicability. The publication of a gold standard, based on a sample of manually labeled tweets, offers possibilities for further experiments and comparison of different techniques.
The task of enriching tweets with concepts from DBpedia is delegated to the Zemanta API. More details and discussion about this choice and this step of the process should be provided, to answer questions such as:
* How does Zemanta deal with the specific context of Twitter, where texts are usually much shorter than in blogs, and more prone to typos, misspellings and abbreviations?
* Which additional information is or could be exploited while processing the tweets, in order to improve the performance?
* How are hashtags leveraged?
Given the brevity of tweets, the ability to capture the topics from those few characters can be a key point for achieving the correct alignment; for this reason I think the possibility of other, more ad hoc solutions should be considered and discussed in the paper.
In the fields of social tagging and microblogging, other works have been proposed that make use of vector space models to process and disambiguate tags or short texts from social Web data. Literature on the study of emergent semantics from folksonomies could be taken into account and compared with the proposed approach, as that setting is closely related to the processing of tweets.
Another point which could be considered and mentioned in the paper, given the importance of time in Twitter, is the possible exploitation of temporal information, i.e. the tweets' publication timestamps and the dates associated with events.
Finally, one potential weakness of this paper lies in the specificity of the explored scenario. The assumption of having a corpus of tweets related to an event seems reasonable, also thanks to the wide usage of hashtags. On the other hand, one could argue that the work risks being self-referential, as Semantic Web conferences are a very specific context in which the availability of data in semantic format is straightforward. I suggest discussing this issue explicitly. Given the generality of the approach, as future work I would encourage the authors to test it in a different context. In this way the results could be more easily generalized to deal with "non-Semantic Web experts".
- page 4: each of which are defined
- page 5: tthe features
- page 5: equation (1) could be made more readable, adding a symbol between p', o' and , and explaining the meaning of G
- page 7: "the class where distance, or proximity, is minimized" (the use of term "proximity" is misleading: if distance is minimized, proximity is maximized)
Solicited review by anonymous reviewer:
This paper describes interesting and relatively novel work in identifying relevant events from tweets and aligning them using an ML-based approach. It does strike me a little bit as a solution looking for a problem rather than vice versa, but the work is interesting nevertheless. In the case of conferences, the usefulness of this alignment is a bit clearer, but I'm not entirely convinced how widespread the problem really is. In many cases, for example, just identifying key concepts in the tweets would be sufficient, without the alignment to LOD. In the introduction, the authors mention that user profiling could be enhanced through such techniques: however, I think simple concept identification in the tweets could deal with this problem sufficiently. Further evidence to show the extent of the problem (outside the SW conference field) would be useful, though not essential.
In general, the methodology seems reasonable, but it would have been interesting to see how the ML approach compares with a standard NLP approach where ontology-based information extraction is performed on the tweets in order to link key concepts to the information in the ontologies. There are plenty of techniques for this kind of work, which should at least be mentioned in the related work. It would also be useful to evaluate separately the different stages of the methodology: for example, how well does the Zemanta-based concept enrichment work? A few words about how this approach resolves the ambiguity issues mentioned would be useful here (at the end of Section 3).
In section 4.1, I am not quite clear what you mean by "the abstract form of the task". This section is also a little confusing: while you do refer to a running example, more precise details of the example would be useful here, e.g. showing exactly the set of triples extracted, what you mean by "the surrounding contextual information", examples of the DBpedia concepts extracted, and so on. A little bit more detail in this section, which forms the meat of the work, would not go amiss.
In the evaluation section, I was a little surprised to see that the combination of F1+F2 leads to such an improvement over F1 alone, given that the F2 results are so low. Do you have any explanation for this?
The evaluation results look promising, but it would be nice to have seen a slightly more comprehensive evaluation given the lack of existing similar systems to compare with.