Review Comment:
Originality:
This paper describes how to use Linked Data features to generalize the analysis of tweets in one city using trained data from that city or others. It expands on work presented in the 5th Workshop on Semantics for Smart Cities. The approach presented abstracts features using NER and location/temporal mentions, enriching these with LOD for more generalisable features (e.g. from road names like I80 to a road class like highway). The authors present the two-class and four-class variations based on different incident classifications.
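To check my understanding of the abstraction step described above (replacing a city-specific token such as a road name with a broader class drawn from LOD), the following minimal sketch shows how such a lookup could be done against the public DBpedia SPARQL endpoint. This is purely my own illustration to confirm my reading; the library, endpoint, and resource URI are my assumptions, not the authors' pipeline.

    # Illustrative sketch only: generalise a concrete road mention (e.g. "I80", already linked
    # to a DBpedia resource such as dbpedia:Interstate_80) to a broader class like dbo:Road.
    # Assumes the SPARQLWrapper package and the public DBpedia endpoint; not the authors' code.
    from SPARQLWrapper import SPARQLWrapper, JSON

    def dbpedia_classes(resource_uri: str) -> list[str]:
        """Return the DBpedia ontology classes (rdf:type values) listed for a resource."""
        sparql = SPARQLWrapper("https://dbpedia.org/sparql")
        sparql.setQuery(f"""
            PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
            SELECT ?cls WHERE {{ <{resource_uri}> rdf:type ?cls . }}
        """)
        sparql.setReturnFormat(JSON)
        bindings = sparql.query().convert()["results"]["bindings"]
        types = [b["cls"]["value"] for b in bindings]
        # Keep only classes from the DBpedia ontology namespace, which are the generalisable
        # ones (e.g. .../ontology/Road rather than the specific highway resource itself).
        return [t for t in types if t.startswith("http://dbpedia.org/ontology/")]

    # Hypothetical usage: the mention "I80" would first have to be linked to a DBpedia resource
    # (entity linking is a separate step); the returned class then replaces the raw token as a feature.
    print(dbpedia_classes("http://dbpedia.org/resource/Interstate_80"))

If this matches what the authors did, a short worked example along these lines in Section 2 would make the feature-abstraction step much easier to follow.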
The scenario is entirely focused on detecting and classifying incidents from tweets, therefore I believe that the title is too general and should be changed to include incidents and cities in it somehow. If the authors truly want general tweet classification, another large section would have to be written which would describe more about the requirements or aspects of features that would make them more generalisable to other types of tweets. But this would probably involve applying the methods to a new domain, and I think a better fix is to adjust the title to be more specific.
They should describe why the three incident types listed were chosen (volume, more commonly-known, more disruptive (shooting vs. traffic jam), etc.). Also, the authors do not describe what other LOD features could have been used (emergency services, local resources, etc.) for these incidents.
Significance of the Results:
The results show improvement over the baseline using selected feature sets, which should prove useful to other researchers and readers. It would also aid reproducibility if the authors could share their datasets and workflows.
One hypothesis, that LOD features will in general improve results, was not shown to be entirely true, and more refinement of these features is required in future work. It would therefore be interesting to hear the authors' opinions on how the use of other LOD datasets, e.g. GeoNames, could have helped, whether some more formal incident-specific datasets (government or company data) could/should be used, or what kind of generalisable features are best suited for cross-dataset classification.
Quality of the writing:
The paper is fairly well written. It suffers from a number of poor definitions that should be addressed in a revision. Suggested fixes and other things that could do with clarification now follow, by section.
Abstract
--------
Improve generalization -> improve the generalization
of tweet classification -> of tweet classifications
Introduction
------------
time-consuming -> time consuming
Give a quick example of what you mean in terms of city-specific info in the first sentence of paragraph 2.
Explain the term tokens (it may be obvious but worth spelling out what you mean in relation to words, phrases, etc.).
I-90 should probably be I90 in the tweet to match the paragraph text and Figure
in form of Twitter -> in the form of Twitter
present out -> present our
These first involve -> The first involves
our conclusion -> our conclusions
Section 2
---------
requires to -> requires us to
The definitions of entities and mentions are confusing. Can something else be done here (e.g. a table of term, definition, example)? I think one issue is that some examples are given and others aren’t, so giving a named entity as the example of an entity makes it easy to confuse entities with NEs. What about conceptual abstraction instead of entity?
defined as named entity -> defined as a named entity
helps coping -> helps us cope
road blocked -> Road blocked (to match earlier text, or else change both to road)
type of named entities -> type of named entity
“URIs for these entities are often missing” … not clear why this is so?
“In our approach, both common and proper…” … state if using NER or DBpedia
tweet shown in Figure 1 -> you mean Listing 1 I think
“unable to detect some of the rather informal temporal expressions in the unstructured texts” … this seems to imply that the large docs it was designed for were (semi-)structured - is this so?
Section 3
---------
The authors have a nice selection of cities. It would be interesting to see if the crime rates / accident rates line up with the data observed!
In Table 1, explain the discrepancy between 1404 No Classes for 2-Class and 390 No Classes for 4.
“tweets were manually labelled” … how many exactly?
“resulted in twenty datasets … split in ten” … you may need to explain these numbers to make it easier for the reader; more on this later.
as frequent as other -> as frequently as other
tend report -> tend to report
Maybe give some examples of No Class tweets for the reader - these are ones that mention an incident keyword but don’t actually refer to an incident, right?
For Table 1 and Figure 1/3, is it necessary to have the same data displayed on both or is either sufficient?
Could Figure 2 show dotted groupings around the baseline and semantic abstraction elements?
DBPedia -> DBpedia
Suggest adding a heading to separate out the third approach. Change “Semantic Abstraction using Location Mentions and Temporal Expressions” to “Semantic Abstraction using Location Mentions” and add “Semantic Abstraction using Temporal Expressions” just before ‘Third’. That will make it easier for the reader when they see “our three SA approaches”.
Looking at Table 2, it seems London has the lowest number of overlapping tokens. Any ideas why? It also seems that there is no higher overlap between US cities…
The definition of features needs to be clearer in 3.3. I am seeing a feature as a Type or Class, yet the text says “for both Types and Categories a large number of features is representative” - it sounds like features are aspects of Types or Classes rather than +TYPE and +CLASS being types of feature. Maybe check the way this is written.
The last paragraph or two in Section 3 are quite vague. Please re-read and consider making clearer.
For example, is the result regarding the discriminative categories really surprising, since these are the main categories you’d associate with the three incident types chosen anyway?
When you say “some of the representative features are shared”, what exactly is ‘some’? How many?
“This could be an indicator that these” … what are these? Classes? Types?
Section 4
---------
Table 3: why was N=18 chosen, especially with the text referring to items not in the Table? Explain the high number of web-related classes.
interest is how the learned -> interest is what the learned
As for SVMs -> Since for SVMs
RandomForest <-> Random Forest
I wasn’t really convinced by the selection of the five algorithms; more may be needed to justify this choice. How do these compare to the other algorithms available in the software?
for each classifiers -> for each classifier
that semantic abstraction -> that Semantic Abstraction
allows to -> allows one to
as non-parametric -> as a non-parametric
Friedman, Nemenyi -> Friedman and Nemenyi
End of Table 4 caption (two classes) is missing
Explain where the 500 raw samples come from, again with simple calculations for the casual reader.
Reference Table 4 in 4.2 paragraph 3?
Figure 4 in the appendix and Table 4 -> Figure 5 in the appendix and Table 7 … please check all cross references as obviously there are issues here.
Again, do you really need both tables and all the Box-Whisker figures? Think about it at least…
The paragraph “Although we dominantly…” is unclear, consider rewording.
neglectable -> negligible
+ALL feature group … +ALL italics
“Surprisingly” … why is this surprising? Please explain.
The text says that SA doesn’t increase performance. LOD does and is a type of SA, so this is confusing - distinguish between the aggregate and the individual feature groups. You may want to explain this and also keep it distinct from the aggregation across the five algorithm types referred to earlier.
“45000 raw- and …” … explain where these numbers come from and how they are calculated.
“differ significantly from the baseline” -> this could be good or bad, but I think you mean good because of lower ranking
“focus on the semantic features +LOC, +TIME, and the +LOD” … drop “the” before +LOD, and explain why these particular features were chosen.
You will need to define “Majority Class” as it is kind of thrown in here. Also explain how the values are calculated from the class counts in Figure 1.
“ranked at the bottom” -> “ranked at or near the bottom”
“skewed class distribution” … what is connection to Majority Class?
“in general trucks are not involved more often in incidents” … I assume this is true, but if you make a statement like this you may need to cite evidence from an objective source (insurance data, police data?)
equation 5. -> equation 5).
(Equation 6 -> (Equation 6)
cin-ditionally -> conditionally
“of the word “unit”” -> explain, e.g. is it related to distances, speeds, etc. (km)
Wildland Fire doesn’t sound too exotic but Former Member States does - explain the difference
“It could be”, “could show” -> these sound vague; suggest “can”.
Section 5
---------
No corrections needed here. The authors outline related approaches used to help classify social media using semantic abstraction. These are nicely described and related to the work in the paper, with the main differences made clear.
Section 6
---------
“social media text classification” -> “social media text classification for city incidents” (to match the narrower scope suggested above)