Capturing Place Semantics From Users’ Interaction on the GeoSocial Web
Revised manuscript after a reject and resubmit. Reviews for the previous version are below.
Solicited review by Vera Hollink:
The paper describes a method to extract from geo-folksonomies activities and sentiments related to places. The paper is clearly written and the problem is clearly described. The main weak point of the paper is that it lacks evaluation. The method is applied to tags from the site tagzania.com, but the resulting classifications and ontology are not evaluated. Moreover, the choices made for the method are not well motivated, and the formulas used to describe the method contain several mistakes.
Main points:
- The first two Sections are convincing. The only thing I miss is an indication of applications in which the collected place activities and sentiments could be used.
- Section 4: The method is not described at an appropriate level of detail. In my opinion the SPARQL queries used to collect the lists of place types and activities from existing ontologies (Figures 4 and 5) are not needed. On the other hand, the method to identify place names in tags needs more explanation. How are tags classified as place names? Which clustering method and distance/similarity measures are used?
- Section 5: the method is applied to the tagzania.com data, but the resulting classifications and ontology are not evaluated. The validity of the relations in the ontology should either be judged or it should be shown that the use of the ontology improves some task.
Detailed comments:
- Section 4.3: "Tags in the folksonomy are matched against the keyword lists of place types and place activities". Is there a minimum number of times a tag needs to be used with a location to be included in the ontology? If not, is there not a lot of noise stemming from users who use unrelated tags (e.g. the tag "walk" for a museum, when this user happens to see the museum during his walk through the city)? Also, I would like to see a discussion of how the authors deal with homonyms and synonyms or how these could be dealt with in the future.
- Section 4.4: As it is, Formula 1 does not make sense. Should it maybe be: "P(x|y) < 1 and P(y|x) = 1"?
- The first line of Formula 2 appears to be incorrect as well. I guess it should be: "P(x|y) >= t and P(y|x) < t"
- Section 4.4 last paragraph: provide a formula or pseudocode to formalize this part of the method. This is important, as it is this piece that contains something new, while the rest of 4.4 is taken from literature.
- Section 4.5 Formula 4 and 5: Fi and C are not defined. Probably Fi should be Ti and C should be S.
- Section 4.5: the lengthy pseudocode to calculate the sentiment score can be replaced by a simple formula as it just takes the average over the calculated Bayes scores.
- Section 4.5: explain why three values are calculated for each place: P(positive), P(negative), and P(neutral). The obvious thing would be to train a classifier that predicts the score (a real value between -5 and +5) directly.
- Section 5, Figure 7: 1) provide labels for the axes. 2) explain why the lines do not follow a Zipf curve as is usually observed for word frequencies.
- Section 5: the example application does not include the sentiments. Please explain what the role of the sentiments could be in an end-application.
- Figures 1, 3, 6, 8, 9, 10, and 13 are not visible when the document is printed.
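To illustrate the thresholded condition suggested above for Formula 2, "P(x|y) >= t and P(y|x) < t", here is a minimal Python sketch. It assumes that P(x|y) is estimated as the share of resources tagged y that also carry tag x. The function names, the toy data and the default threshold t = 0.8 are assumptions made for illustration only; this is not the authors' implementation.

```python
def cooccurrence_probability(tag_resources, x, y):
    """Estimate P(x|y): share of resources tagged y that are also tagged x."""
    with_y = tag_resources.get(y, set())
    if not with_y:
        return 0.0
    return len(tag_resources.get(x, set()) & with_y) / len(with_y)


def subsumes(tag_resources, x, y, t=0.8):
    """Return True if x subsumes y, i.e. P(x|y) >= t and P(y|x) < t."""
    return (cooccurrence_probability(tag_resources, x, y) >= t
            and cooccurrence_probability(tag_resources, y, x) < t)


# Hypothetical toy data: tag -> set of resource ids carrying that tag.
tag_resources = {
    "sport": {1, 2, 3, 4, 5},
    "tennis": {1, 2},
}

print(subsumes(tag_resources, "sport", "tennis"))   # broader tag vs. narrower tag
print(subsumes(tag_resources, "tennis", "sport"))   # reverse direction
```

With this reading, the rule is asymmetric by construction: the broader tag co-occurs with almost every use of the narrower tag, but not the other way round.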
Solicited review by Fabian Abel:
A. Summary of content of the paper:
The paper reports on learning place semantics from folksonomies (in particular activities and sentiments associated with places). The authors present an ontology for describing places, detail their framework for learning place semantics (including stop-word removal, stemming, extraction of place types, extraction of activities, and mining of sentiments) and present a lightweight Web application (including a SPARQL endpoint) that demonstrates the semantically enriched (geo-)folksonomy.
B. Summary of review:
The paper is written and structured very nicely. It fits the scope of the special issue very well. Related work is cited extensively and described well. Demo/showcase (including SPARQL endpoint) is available online. Main concerns are the following:
i. the place ontology is not integrated with existing standards such as WGS84 (http://www.w3.org/2003/01/geo/wgs84_pos#). The added value of the new schema is unclear, e.g. what can be modeled with the proposed ontology that cannot be modeled with existing schemata?
ii. the approach (accuracy of the tag clusters, accuracy of sentiment classification, accuracy of identification of activities associated with a place, ...) is not evaluated. Using datasets like DBpedia, which relate activities to places, or a user study (e.g. via Mechanical Turk), it would be possible to evaluate the accuracy (btw: what would be the baseline strategy in such an evaluation?).
Some parts of the paper need further descriptions, for example, the ontology, the clustering, sentiment analysis algorithm and some of the figures (see detailed comments below). Overall, it thus seems that the paper needs further improvement so that it can be integrated into the special issue. As the paper neither features an evaluation nor reports on significant scientific results, it might rather be considered as "Reports on tools and systems" (see http://www.semantic-web-journal.net/authors#types )
C. Detailed comments:
Related work: related work is cited extensively and described very well. A more detailed description of how the work presented in this paper differs from the cited works is still missing and should be added to the paper.
Some of the descriptions need to be extended. For example:
- Section 3 needs to be extended. The place ontology in Figure 1 is not sufficiently described. For example, why are properties such as hasName or longitude necessary? Why are existing vocabularies not re-used? What are positive/neutral/negative scores and why are three different properties necessary? Moreover, it would be good to publish the ontology (http://cs.cardiff.ac.uk/2010/place-ontology# is not resolvable).
- Section 4.1: the two-step clustering is not 100% clear. Would it be good to add further description about it?
- Figure 8-10 are not explained in the text. Although those diagrams are quite self-explanatory, it would be good to add some descriptions, e.g. why is Buckingham Palace just broadly classified as Place?
The pipeline for capturing places includes stemming. Is stemming really needed? (e.g. different concepts may be reduced to the same stem) Is there some evidence that stemming allows for better clustering in the tagging domain? Also some evidence for the usefulness of string similarity would be great (e.g. New York, NYC and The Big Apple have low string similarity even though they refer to the same place).
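The concern about string similarity can be checked quickly. The sketch below uses Python's standard `difflib.SequenceMatcher` as a stand-in similarity measure; the manuscript does not state which measure is used, so this choice, the helper name `similarity` and the lowercasing step are assumptions.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """Case-insensitive similarity in [0, 1] based on matching character blocks."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


# Names taken from the comment above; all three refer to the same place.
names = ["New York", "NYC", "The Big Apple"]
for other in names[1:]:
    print(f"{names[0]!r} vs {other!r}: {similarity(names[0], other):.2f}")
```

Under this measure both alternative names score well below 0.5 against "New York", so a purely string-based clustering step would be unlikely to merge them without an additional synonym resource.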
Section 5: the title of this section is misleading. Unfortunately, there is no evaluation done. However, having a demo application that visualizes the enriched dataset is nice. Would it be feasible to also visualize the sentiment information? Identifying the sentiment of tags related to places seems to be one of the core contributions of the paper, i.e. it would be good to demonstrate how this information can support the user while browsing the semantically enriched folksonomy.
D. Minor comments:
- Page 1 Introduction: pointers to GeoNames, OpenStreetMap, Tagzania and Wikimapia (and Flickr and Twitter) could be added
- Page 2 Introduction: resources.Results > resources. Results
- Page 2: section X > Section X
- Page 2, Related work: reference for "Flickr has more than 91 million photos"
- Page 2, Related work: reference for "GeoNames, currently containing around 10 million geographic names" would be nice
- Page 3, Related work: coined by [4] > coined by Camara et al. [4]
- Page 3, Related work: last sentence of related work should be rephrased. What is meant by "classifying semantics"?
- Page 3, Pointers to WordNet and OpenCyc could be added
- Page 3, Section 4: in details below. > in detail below. (?)
- Page 4, Section 4.1: materialise > materialises
- Page 5, Section 4.2: from ontology > from the ontology
- Figure 5 could be shortened
- Page 6, Section 4.4: geo-golksonomies > geo-folksonomies
- Page 6, Section 4.4: the description of Eq. 1 does not match the actual formula. Should it be P(x|y) < 1 and P(y|x) = 1?
- Page 6, Section 4.5: Maybe it would be better to use "posts" instead of "folksonomies" to describe micro-blogging data.
- Page 7, Equations 4/5: F_i >T_i (?) Otherwise, what is F?
- Page 7, Section 4.5: I would consider removing Equation 3. In the end, Eq. 5 is the important one as it describes how the authors classify the sentiment, given a set of tags. What is C in Eq. 4/5? It would be good to mention already in the description of the equations that a sentiment is attached to each place, i.e. the features used to classify the sentiment of a place are the tags of a place cluster assigned by any user to any resource, right?
- Page 7, Table 1: is it always true that WOW is classified as +4? What if WOW refers to "World Of Warcraft"?
- Page 7, Section 4.5: the algorithm is not described. Some sentences of explanations would be good, e.g. what is tagSet (which tags)? Why is it useful to compute the average over users instead of tag assignments (I assume that it is done to lower the influence individual users can have)?
- Page 8, Section 5: in figure 7 > in Figure 7
- Page 8, Figure 7: labels of x-axis and y-axis need to be added
- Page 8, Section 5: in section 4 > in Section 4
- Page 8, Section 5: the last paragraph lists some statements such as "the quality depends on". Is there some quantitative evidence for these statements?
- Page 10, Figure 13: the type Park_Type is assigned twice to Tourism_Activity
- Page 11, Reference 5: names are somehow broken
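On the question above of averaging over users instead of tag assignments (Section 4.5), the following minimal Python sketch shows how the two choices can diverge. It assumes a lexicon that maps tags to scores between -5 and +5; the entry "wow": 4 follows Table 1 as quoted above, while "boring": -3, the user ids and both function names are hypothetical. It illustrates only the averaging step, not the authors' Bayes classifier.

```python
from statistics import mean


def mean_over_assignments(assignments, lexicon):
    """Average lexicon score over every (user, tag) assignment."""
    scores = [lexicon[tag] for _, tag in assignments if tag in lexicon]
    return mean(scores) if scores else 0.0


def mean_over_users(assignments, lexicon):
    """Average each user's scores first, then average across users."""
    per_user = {}
    for user, tag in assignments:
        if tag in lexicon:
            per_user.setdefault(user, []).append(lexicon[tag])
    user_means = [mean(scores) for scores in per_user.values()]
    return mean(user_means) if user_means else 0.0


# Hypothetical data: one prolific user and one occasional user.
lexicon = {"wow": 4, "boring": -3}
assignments = [("u1", "wow"), ("u1", "wow"), ("u1", "wow"), ("u2", "boring")]

print(mean_over_assignments(assignments, lexicon))  # 2.25
print(mean_over_users(assignments, lexicon))        # 0.5
```

In this toy case the per-assignment average is dominated by the user who tagged three times, whereas the per-user average gives each user equal weight, which is consistent with the assumption stated in the comment.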
Solicited review by Bettina Berendt:
The paper describes a scheme for generating a different type of place gazetteer. The idea is to combine geographic information with activities at that place.
The general idea is very nice and useful; but the work appears to not be mature enough yet for a journal publication. The evaluation is not satisfactory. In addition, the language of the manuscript is not always clear. I suggest the authors keep on working on this interesting topic and prepare a journal submission when the work has become more mature.
Detailed comments:
Content:
The semantic model itself is a mixture of straightforward and confusing. What is a score / what does it mean? How can a score be independent of the activity? (Wouldn't one say: Wembley is a good place for tennis, but a bad place for basketball?)
Spatial clustering: wouldn't this idea require a good sense of boundaries as well?
Equation (1): What is the reason for setting this probability to 0.8?
p.7, below equation (5): What is being learned here, if the association word - sentiment is already fixed by the lexicon?
More questions about the learning algorithm: Does this really work, without any context? Do users give only positive or neutral or negative opinions? Can you provide any evaluation results that show the accuracy (etc.) of this sentiment-mining approach?
p.8: You report descriptive statistics about how your algorithm classifies the tags. But how accurate is this classifier?
p.9: The "ontologies" shown in Figs. 8-10 are small - and probably selected from the whole set of results - but don't appear to make much semantic sense. For example, how can "Natural" be a parent type of "Museum"? In what sense are "Group activities" and "Picnic activities" entities of the same type (parents of "Travel activities")? In what sense is "Signs" an activity?
Why is the association with "bank" an inaccuracy? Pls explain.
The application may indeed be useful "for the purpose of evaluating the framework", but unfortunately you do not provide such an evaluation (or even an evaluation concept).
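One possible evaluation concept, sketched minimally in Python: compare the classifier's tag labels against a hand-labelled gold standard and report precision and recall per class. The gold labels, the class names and the function name below are entirely hypothetical and are meant only to show the shape such an evaluation could take.

```python
def precision_recall(predicted, gold, label):
    """Precision and recall of `label`, given tag -> class dictionaries."""
    true_pos = sum(
        1 for tag, p in predicted.items() if p == label and gold.get(tag) == label
    )
    predicted_pos = sum(1 for p in predicted.values() if p == label)
    gold_pos = sum(1 for g in gold.values() if g == label)
    precision = true_pos / predicted_pos if predicted_pos else 0.0
    recall = true_pos / gold_pos if gold_pos else 0.0
    return precision, recall


# Hypothetical classifier output and hypothetical human judgements.
predicted = {"museum": "place_type", "walk": "activity",
             "bank": "place_type", "signs": "activity"}
gold = {"museum": "place_type", "walk": "activity",
        "bank": "place_type", "signs": "other"}

print(precision_recall(predicted, gold, "activity"))  # (0.5, 1.0)
```

Reporting such figures per class, alongside a simple baseline, would turn the descriptive statistics into an assessment of accuracy.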
Unclear formulations:
p.2, last para: Do you mean to say that these ontologies define the info types in a hand-crafted way, as opposed to your approach that learns them? Pls clarify to make your contribution clearer.
p.3, start: introduce Relph before you use the word.
p.3, description of DBpedia/Wikipedia: unclear relationship to what you wrote before ("detailed geographic information" is not per se place semantics). Examples of WordNet and OpenCyc "descriptions" would be helpful.
p.5: Why should stemming avoid misclassification? Is stemming needed to avoid misclassification? Please clarify the formulation and explain.
Language:
p.2: "sentimental reflection" - I don't think this is what you mean. (see http://www.merriam-webster.com/dictionary/sentimental) On p.11, "sentimental" should be replaced by "sentiment".
p.3: strive -> strife
p.4: this inaccuracy materialise -> rephrase and correct grammar
p.5: extract ... from ontology -> extract ... from the ontology
p.5: "Follows are" -> rephrase
Pls check for spelling.

