LODifying personal content sharing

Paper Title: 
LODifying personal content sharing
Authors: 
Oscar Rodríguez Rocha, Fabio Mondin, Carmen Criminisi, and Laurent-Walter Goix
Abstract: 
The advent of contemporary mobile devices and their increasing computing power and location capabilities combined with the most innovative web technologies has provided mobile users with new possibilities to share experiences on-the-go. The growing quantity of multimedia content present on the web makes difficult for mobile users to retrieve suitable content. Typically, users looking for interesting content related to their current position or POI, access web engines relying on keywords to describe their ideas. Unfortunately such descriptions are often subjective and retrieval can be ineffective. To address these issues, our platform provides users with an application targeted for modern mobile devices that allows content acquisition and publication. Published content is automatically analyzed and stored on our server with semantic annotations based on users’ context and content, for further semantic search. We describe how and why we migrated from a triple-tags technology to semantics, hoping for related Linked Data.
Submission type: 
Full Paper
Decision/Status: 
Reject
Reviews: 

Solicited Review by Fabian Abel:

The authors present a system that allows people to publish and search for social data (tagged pictures). Pictures (tags + title) are processed by means of a semantic enrichment pipeline that exploits services such as GeoNames, DBpedia or Zemanta to map tags to ontological concepts. Given the semantic descriptions, it is possible to perform advanced SPARQL queries on the social data (e.g. list pictures that show a monument near a given location).
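As an illustration of the kind of query mentioned above (pictures that show a monument near a given location), here is a minimal sketch. It is not taken from the paper: the choice of `foaf:Image`, `foaf:depicts`, `dbo:Monument` and the WGS84 `geo:lat`/`geo:long` properties is an assumption about how the annotations might be modelled, and the helper name `monument_pictures_query` and the bounding-box approach are hypothetical.

```python
# Hypothetical sketch: the vocabulary below is an assumption, not the paper's schema.
PREFIXES = """\
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX geo: <http://www.w3.org/2003/01/geo/wgs84_pos#>
PREFIX dbo: <http://dbpedia.org/ontology/>
"""


def monument_pictures_query(lat, lon, delta=0.01):
    """Build a SPARQL query for pictures depicting a monument located in a
    small bounding box around (lat, lon). `delta` is in decimal degrees."""
    south, north = lat - delta, lat + delta
    west, east = lon - delta, lon + delta
    return PREFIXES + f"""\
SELECT DISTINCT ?picture ?monument WHERE {{
  ?picture a foaf:Image ;
           foaf:depicts ?monument .
  ?monument a dbo:Monument ;
            geo:lat ?lat ;
            geo:long ?long .
  FILTER (?lat >= {south:.5f} && ?lat <= {north:.5f} &&
          ?long >= {west:.5f} && ?long <= {east:.5f})
}}
"""


# Example coordinates (roughly central Turin), chosen only for illustration.
query = monument_pictures_query(45.0703, 7.6869)
print(query)
```

A bounding box is used here only because it needs no store-specific geospatial extension; a real deployment might rely on whatever spatial functions its triple store provides.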

The application seems to make good use of existing Semantic Web technologies/services such as DBpedia, Geonames or Zemanta and Social Web services such as Facebook or Flickr (pictures can be cross-posted on different platforms). The novelty of the application is fair and the authors should more clearly explain how their application goes beyond the capabilities of SMOB (I assume that the semantic enrichment pipeline is more advanced and does not rely on manual semantic annotations done by the user). Unfortunately, the authors do not evaluate their methods and their system. Hence, the article can probably not be accepted as a research article.

Moreover, I have the following concerns:

- link to application: as far as I could see there is no link to the application, i.e. it is not possible to test the application. It would be good to provide such a link that could, for example, point to a supporting Web site

- related work: the discussion of related works has to be extended. For example, regarding the "eTourism field" the authors could have a look at papers authored by Cena et al. http://www.di.unito.it/~cena/publication.html (e.g. related to iCity)

- design rationale: the design rationale is not clearly explained (e.g. what are the benefits of the system, what does it add on top of SMOB?) -> particularly at the end of the paper where the authors suddenly state that they plan to go towards a federated architecture, the reader gets the impression that the current system design is not well justified.

Minor issues:
- some of the footnotes appear more than once (e.g. GeoNames)

- page 2: "…and their related linked data." -> "…and their related linked data (see Section 3 and Section 4)."

- page 2: "in section 5" -> "in Section 5"

- page 2: "The people is getting.." -> "People are getting.."

- Section 2.1.1: What keywords are indexed? (is stopword removal done?, are only those proper nouns mentioned in Sec. 2.2.2 indexed?)

- page 4: "Whenever a content is" -> "Whenever content is"

- page 4: "The title language is initially identified using[3].." -> I would not recommend using "[x]" as a noun, i.e. mention "Lang Detect" instead

- Section 2.3: one could shorten the SPARQL examples and remove the already known namespace declarations, e.g. on page 7 one could just introduce the new namespace: "… PREFIX oaf:…". Would it be good to use a verbatim environment or text for the formatting of the examples?

- Section 3 does not add much value - either one should remove it or extend it

- Figure 2 and 3 should also be referenced from the text

- page 8: "example of an SPARQL" -> "example of a SPARQL"

Solicited Review by Laura Hollink:

General comments:

This paper describes a mobile application for content sharing (mainly images), including semantic tagging and semantic search features. While this application sounds very interesting and useful, the scientific contribution of the paper is not clear. No evaluation is performed, and there are no clear lessons learned.

The authors mention several very interesting problems encountered while designing and implementing the system. E.g.:
-how to distinguish user generated content from automatically inferred content (section 1.1)
-linking unstructured tags to resources in a structured vocabulary (section 2.2)
-disambiguation (end of section 2.2.2)
-linking 'friends' (as in friends in the present application) to their (other) social web accounts. (section 2.2.1)
-finding Points of Interest near the current location of the user, and suggesting them as annotations for user generated content. (section 2.2.1)

However, the paper does not provide any additional insights into these issues. Some of the issues are considered out of scope. For example, in section 2.2, the authors say "this paper does not intend to provide a systematic solution to this problem, rather a reasonibly fair approach to the main issue of semantically cataloging user-generated content" about linking tags to LOD. The long description of this process that follows in section 2.2.2 is indeed not new or systematic. In short, the authors detect Noun Phrases, which are matched to a list of candidate semantic resources, which is finally disambiguated by always preferring GeoNames resources over resources of other vocabularies.
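The disambiguation rule summarised above (always prefer GeoNames candidates) can be sketched in a few lines. This is only an illustration of the rule as the review describes it, not the authors' implementation: the `Candidate` type, the `SOURCE_PRIORITY` table, the ordering of non-GeoNames sources, and the placeholder `example.org` URIs are all assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    uri: str     # resource URI returned by a lookup service
    source: str  # vocabulary the candidate comes from, e.g. "geonames"


# Lower number means higher preference. Only "GeoNames first" comes from the
# review; the rank given to other sources is an assumption for illustration.
SOURCE_PRIORITY = {"geonames": 0, "dbpedia": 1}


def disambiguate(candidates):
    """Return the preferred candidate for a noun phrase, or None if the list
    is empty. Unknown sources rank last; ties keep the input order."""
    if not candidates:
        return None
    fallback = len(SOURCE_PRIORITY)
    return min(candidates, key=lambda c: SOURCE_PRIORITY.get(c.source, fallback))


# Placeholder URIs, not real identifiers.
candidates = [
    Candidate("http://example.org/dbpedia/Turin", "dbpedia"),
    Candidate("http://example.org/geonames/turin", "geonames"),
]
print(disambiguate(candidates).uri)
```

A fixed source ranking like this ignores the context of the tag entirely, which is consistent with the observation that the approach is not a systematic solution.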

Other issues are mentioned too briefly to give the reader any new information (e.g. the short section 2.2.1 on the link to social networks and points of interest).

In my opinion, the paper would be a lot better if it were more focussed. I would suggest focusing on one of the issues that play a role in this system, and showing the novelty of the approach in that respect, including an evaluation, experiment, or at least a clear lessons learned section.

In its current form, the paper is very much a description of all the work done, including its initial implementation as a relational database and transformation to an RDF format. The future work is also a series of improvements to the implementation, rather than tackling any open issues. While I admire the implementation effort, this is only valuable in a paper if it is combined with new insights that others can take away.

Specific comments on the text:

-"Enabling multimedia content retrieval based on semantics, increases the effectiveness and quality of the results" (p1). Without references, this is a bold statement. A lot of people have worked on this, e.g. in the Semantic Computing Research Group of the University of Helsinki http://www.seco.tkk.fi/, and the Web and Media group of the VU University Amsterdam http://wiki.cs.vu.nl/web-media/Main_Page .
-"..which retuns local POIs out of a list of information providers" (p4). Which providers exactly?
-"Below is an example of SPARQL query to achieve this result:" (p4). The query is missing. Unless you mean the query 2 pages further?
-"to allows" (p1). Remove s.
-"The people is getting" (p2). The people are getting
-"The people is getting used to personal content sharing with the inevitable consequence that it gets more and more difficult for them to retrieve it as easy as the would like, so that social networks are getting smaller, thematic and personal." (p2). This is not clear to me. Wouldn't retrieval become easier as they get more used to it? Or do you mean to say that expectations go up?
-"If we think to Flickr" (p2). If we think about Flickr.
-"tags with wild-free vocabulary" (p2). I don't think wild-free is a word, although I get what you mean of course.
-"However, the main problem of such approach is the ambiguity in which one can incur: the thoughts of a tag creator in a specific situation can be very different of a tag consumer in the same situation." (p3) Is that true? Actually, as far as I know, ambiguity is not a big problem in user generated content, especially not if people are "in the same situation". Have you seen this problem happening with your users? Ambiguity is a big problem if you try to use knowledge sources that contain a lot of senses of the same word. WordNet is notorious for it.
-"it is also possible to create a graphical annotation" (p2). What do you mean by graphical annotation?
-"a unique semantic LOD (Linked Open Data) concept". This is somewhat redundant.

Solicited Review by Knud Möller:

This submission presents the authors' work of including semantic features into an existing platform (please add some kind of reference to your previous work!) for mobile content sharing and finding, in order to improve the process of both. The authors claim that users find existing keyword-based methods for content sharing and finding insufficient, and that semantic annotation with facts (such as links to geo locations or other resources in datasets such as geonames or DBpedia) will improve the situation.

With respect to sharing, a detailed account is given of how location information provided by a mobile device, as well as user-defined textual annotation (title and tags) are analysed and resolved to link to LOD datasets. These annotations are then stored in a centralised service, a Virtuoso-based platform (the authors briefly discuss the possibility for a distributed version of this service as future work). For content finding, the same annotations are then used to create "virtual albums", based on pre-defined SPARQL queries, involving parameters such as the user's current location, a search term, nearby friends, etc. The regular browser and mobile interfaces for content finding are presented. Some related work (very little) is discussed. No evaluation of the platform or the performance of the linking algorithms is given, and there is no way to try out the application.

The paper is very relevant to the topic of this special issue, and the quality of the described platform seems to be decent enough. However, the paper makes a very hurried impression on me, and I don't see it fit for publication yet. The paper is riddled with spelling, grammar and punctuation errors (it might be a personal flaw of mine, but things like that distract me enormously from the actual content of a paper), some of which I find downright sloppy and an indication that the authors did not take enough time to ensure the paper is in a proper state for submission. The same footnotes are repeated several times (e.g., for Flickr, SMOB, DBpedia and SPARQL). A SPARQL query is referenced, but missing from the paper (p. 4). The same query is given three times, with only small additions from one version to the next, needlessly blowing up the size of the paper (p. 6-7). The overlong query on p8-9 contains bits of PHP code, which is not explained or even mentioned in the surrounding text. The related work section is very sparse - I'm sure there is a lot more work on the use of semantics for mobile applications than listed by the authors. 11 references for a journal paper is a bit weak, especially when one is a reference to a PHP source code file.
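One way to avoid printing nearly identical queries several times is to state a base query once and then list only the patterns each variant adds. The sketch below illustrates that idea under assumptions: the helper `build_query`, the base pattern, and the use of `foaf:maker` are hypothetical and do not reproduce the queries in the paper.

```python
# Hypothetical sketch: patterns and vocabulary are illustrative assumptions.
PREFIXES = "PREFIX foaf: <http://xmlns.com/foaf/0.1/>\n"

# Graph patterns shared by every variant of the query.
BASE_PATTERNS = ["?picture a foaf:Image ."]


def build_query(extra_patterns=()):
    """Return the base query extended with additional graph patterns.
    The namespace declarations are emitted once, at the top."""
    patterns = BASE_PATTERNS + list(extra_patterns)
    body = "\n".join("  " + p for p in patterns)
    return f"{PREFIXES}SELECT ?picture WHERE {{\n{body}\n}}"


base = build_query()
# A variant differs from the base only by the pattern passed in here.
with_maker = build_query(["?picture foaf:maker ?friend ."])
print(with_maker)
```

Presented this way, each variant in the text would need to show only its added pattern rather than the whole query.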

Another major problem I see is the generic claim that users are not happy with current (keyword-based) systems - this is claimed in the introduction and the "problem statement" sections, but no evidence is given. Have you made a study? Do you know of a study that supports this claim? Are there many others arguing the same? What is more, since no evaluation of the now semantically enabled system is given, how do we know that this "problem" is now solved, or at least the situation improved? You basically just claim that "semantics are better". I sympathise with this point of view, but our community needs to make a bigger effort of showing that this is actually true. Some statements are quite fuzzy, such as "The creation of a domain ontology is fundamental [..] and can be greatly helpful." (p3)

In conclusion, I find the paper promising, but not ready for publication yet.

Details:

- in general, you are very arbitrary about capitalisation: some words that should be capitalised (proper nouns such as "the Web") are written in lower case, others are needlessly capitalised (such as "user-generated content" or "named entity recognition")
- "content" is not a countable noun, as far as I know. You cannot have "a content", "two contents" or "each content". If you want to count, use a construction like "piece of content", or "content item"
- I suggest formatting code examples, URIs, CURIs, etc. in a monotype font such as Courier
- always put a space before reference brackets such as [1]. Don't do this[1].

p1: s/makes difficult/makes it difficult
p1 (abstract): POI - please resolve acronyms when they are first mentioned
p1: "web engines" - what's that? Do you mean Web services?
p1: s/based on users' context/based on the user's context - several times in the paper
p1: "triple-tags" - what's that? something like Flickr's machine tags? Please explain or give a pointer to an explanation.
p1: s/offers web users/offer Web users
p1: s/multimedia content creation process/the multimedia content creation process
p1: s/problematical/problematic
p1: for a very specific claim, your reference [1] is a whole book - please be more specific
p1: s/based on semantics, increases/based on semantics increases
p1: s/a content/content (see above)
p1: s/to allows content acquisition/to allow content acquisition
p1: "For accessing content users rely on keywords to describe it." - this sentence is confusing, since you are writing about a solution to the problem that keywords are not enough (I know this is about the previous version of your platform, but it's still quite confusing in the way presented here)
p1: "interfaces for mobile and web" - I have a problem with this distinction: mobile interfaces are Web interfaces, are they not? The difference seems to be between traditional browsers (for devices with large screens) and small, mobile browsers?
p2: s/their related linked data/its related linked data
p2: in the overview, you jump from Sect. 2 to Sect. 5 - what about 3 and 4?
p2: s/The people is getting used/People are getting used
p2: s/as the would like/as they would like
p2: s/social applications, is to/social applications is to
p2: s/brand new/new
p2: s/performing/performant
p2: s/think to Flickr/think about Flickr
p2: "wild-free" - what does that mean?
p3: "ambiguity in which one can incur" - I don't understand that sentence
p3: s/creation of domain ontology/creation of a domain ontology
p3: "Even though the latter is challenging." - That sentence is incomplete.
p3: s/semantic offer/semantics offer
p3: s/query mean/query means
p3: s/choice, resides/choice resides
p4: "well-more" - another word I don't understand
p4: reference [3] - you are quite arbitrary about what you cite as a backreference, and what as a footnote. E.g., this reference is a PHP source code file?!
p5: s/lemmas detection/lemma detection
p5: reference to eTourism - where does this come from? It doesn't seem to have any relevance to the paper.
p5: "joyness" - another strange word
p5: s/analysis , each word/analysis, each word
p5: s/previously-computed/previously computed
p5: s/Evri entity resolver/the Evri entity resolver
p5: s/serice/service
p5: s/DBpedia query/the DBpedia query
p5: s/of its lookup service/on its lookup service
p5: s/Geonames graph/the Geonames graph
p5: "the single ontology" - what are you referring to here?
p5: SPARQL^21 - this footnote comes a bit late...
p6: s/DBpedia resolver/the DBpedia resolver
p6: s/semantic-based/semantics-based
p6: s/Give users/Giving users
p6: s/context, means/context means
p6: s/and that is what/. That is what
p6: s/Organizing dynamically/Dynamically organizing
p6: "can be significantly improved" - how?
p6: "relying on inference capabilities" - how? SPARQL is agnostic to inferencing.
p6: s/user generated/user-generated
p8: "further enhanced with linked data coming from the semantic web" - I thought your "semantic query mechanism" already uses data from the SW?
p8: s/an SPARQL query/a SPARQL query
p9: "et al" - why do you italicise this?
p9: s/Garcia et al[7], has/Garcia et al [7] have
p10: s/there's/there is
p10: Sects. 6.1 - 6.3 - I find this a bit over-detailed for a future work section. Skip the technical details, say what you want to achieve in general terms. What's the main message here?
