Reviewed

This category lists all reviewed submissions; for papers under review please visit the <a href='http://www.semantic-web-journal.net/category/tags/underreview'>under review papers section</a>.

Sensitivity Analysis of Non Instance Based Learning Approach for Ontology Alignment Using SSFPOA

Paper Title: 
Sensitivity Analysis of Non Instance Based Learning Approach for Ontology Alignment Using SSFPOA
Authors: 
S. Jayaprada, S. Vasavi, P. Bala Krishna Prasad
Abstract: 
The invent of internet and Web have paved way to information sources belonging to same domain to be distributed that are structurally (to some extent) and semantically heterogeneous. In order to achieve semantic interoperability within these information sources heterogeneity has to be solved which exists at various levels such as at data, operating system or due to hardware heterogeneity. Many methods were proposed to solve data heterogeneity problem using ontologies. In this paper we considered ontology alignment as data mining problem and solved using machine learning based classification approaches using our compound semantic measure SSFPOA. Six different tests were made and performance measures such as precision, recall, accuracy, f-measure and overall are calculated Sensitivity Analysis of each of the approach is calculated by varying the number of metrics and performance of each individual metric is analyzed in order to verify on, does propagation of similarity value after each matcher improving or not. Test results (Simple mappings) proved to be better when compared with existing approaches.
Submission type: 

0

Responsible editor: 
Decision/Status: 
Reject
Reviews: 

Solicited review by Agnieszka Lawrynowicz:

I recommend rejection of the paper for the following reasons:

1) The paper is written in a very careless way. The language is very poor. There are plenty of typos and grammatically incorrect or unfinished sentences that make it very unclear in many places. This is unacceptable for a journal paper.
Some examples:
"APFEL [20] it is based on the general observation that alignment methods like QOM [18] or PROMPT [23] and extracts additional features by examining the ontologies for overlapping features, including domain-specific features."
"A new compound measure SSFPOA[29] has been proposed by us which uses 12 different matchers to find semantics."

2) It is unclear from the paper what the authors did and why, and what the contributions of the paper are.
The "SSFPOA (Semantically Similar Frequent Patterns extraction using Ontology Algorithm) for extracting and clustering semantically similar frequent patterns", called a "measure" in the paper, is never properly introduced. Instead of describing it very vaguely at the end of the "literature survey" section, it would be better to describe it properly in a preliminaries section.
What exactly does step 5 of the method mean: "5. Cluster the mapping results in the range of 0-1"?
Other things are also left without an introduction, e.g. the test cases.
It is unclear what the goals of the experiments are, and it is equally unclear what the results mean (e.g. Figure 5).

3) Instead of "using ontology" and "Ontology is a logical system that…", it would be better to write "an ontology", which denotes an engineering artefact rather than a subdiscipline of philosophy.

In summary, the technical soundness and presentation make the paper unsuitable for publication in a journal.

Solicited review by Jerome David:

This paper presents an analysis of the performance of several supervised learning methods used with a similarity measure (SSFPOA). Compared learning methods are SVM, Bayesian networks, and multilayered perceptron.

In general, the paper lacks some important details needed to make it understandable. The results of other matchers (Figure 5) do not agree with those provided by OAEI.

The SSFPOA measure could be presented more precisely in this paper. It is also not clear which measures are evaluated in the charts (1 to 11, 1 to 3, etc.). In Figure 5, what are the differences between SSFPOA and the compound metric?

The training step should be carefully explained. If the learning process was carried out on the 3xx tests, it is obvious that the results will be good. The paper does not present details about this important step of supervised learning.

In order to analyse if learning methods are interesting or not, a comparison with classical alignment extraction strategies would be welcome: threshold, maximum weight matching, stable marriage, etc.
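To make the suggested baselines concrete, here is a minimal sketch, not taken from the paper under review, of two simple extraction strategies over a table of similarity scores: a plain threshold filter and a greedy one-to-one selection. The greedy variant is only a simple stand-in for one-to-one strategies; it does not implement maximum weight matching or stable marriage. The function names, the entity labels and the 0.5 threshold are illustrative assumptions.

```python
def threshold_extraction(similarities, threshold=0.5):
    """Keep every candidate pair whose similarity reaches the threshold."""
    return {pair for pair, score in similarities.items() if score >= threshold}


def greedy_one_to_one(similarities, threshold=0.5):
    """Pick pairs in descending score order, using each entity at most once.

    This is a greedy approximation, named plainly: it is neither maximum
    weight matching nor the stable marriage algorithm.
    """
    chosen, used_left, used_right = set(), set(), set()
    ranked = sorted(similarities.items(), key=lambda item: item[1], reverse=True)
    for (left, right), score in ranked:
        if score < threshold:
            break  # all remaining pairs score lower still
        if left in used_left or right in used_right:
            continue  # one of the two entities is already aligned
        chosen.add((left, right))
        used_left.add(left)
        used_right.add(right)
    return chosen


# Hypothetical similarity scores between entities of two ontologies.
example_similarities = {
    ("Paper", "Article"): 0.9,
    ("Paper", "Proceedings"): 0.6,
    ("Author", "Writer"): 0.8,
    ("Author", "Article"): 0.4,
}
print(threshold_extraction(example_similarities))
print(greedy_one_to_one(example_similarities))
```

The threshold filter may align one entity with several others, whereas the greedy selection enforces a one-to-one alignment; comparing a learned classifier against both gives a reference point for judging what the learning step adds.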

The definition of F-measure could be simplified. There is no need to introduce such a complex and generic notation.
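As an illustration of how compact the definition can be, here is a minimal sketch under stated assumptions, not the notation used in the paper: with an alignment represented as a set of entity pairs, precision, recall and the balanced F-measure follow directly from the overlap with the reference alignment. All names and example pairs are hypothetical.

```python
def precision_recall_f1(found, reference):
    """Score found correspondences against a reference alignment.

    Both arguments are iterables of (entity1, entity2) pairs; this
    representation is an assumption made for the sketch.
    """
    found, reference = set(found), set(reference)
    correct = len(found & reference)
    precision = correct / len(found) if found else 0.0
    recall = correct / len(reference) if reference else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    # Balanced F-measure: harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical alignments: two of the three found pairs are correct,
# and two of the four reference pairs are recovered.
found = {("a", "x"), ("b", "y"), ("c", "z")}
reference = {("a", "x"), ("b", "y"), ("d", "w"), ("e", "v")}
print(precision_recall_f1(found, reference))
```

In this example precision is 2/3, recall is 1/2, and the F-measure is 4/7.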

Since reference alignments between 3xx ontologies are not provided, the paper should explain how they are made.

In OAEI 2011, the 3xx tests are no longer used (there are no results for these tests). It is said that these alignments are not perfect and are included only for comparability with previous years (see: http://oaei.ontologymatching.org/2011/benchmarks/). How can the authors find results for PRIOR+, ASMOV, etc., given that they are not provided in OAEI 2011? Furthermore, if we look at the results of previous years, the results of other matchers given in the paper do not correspond to those provided by OAEI in 2010, 2009, or 2008. They are perhaps those of 2007, but if we look at the subsequent OAEI campaigns, results have improved a lot.

According to the OAEI policy, the paper has to compare its results with all the participants of OAEI 2011, and not only some of them (see: http://oaei.ontologymatching.org/doc/oaei-deontology.2.html).

All these reasons call into question the validity of the presented results and analysis.

Solicited review by Kate Revoredo:

The paper proposes to evaluate the use of a compound measure (SSFPOA) to solve the ontology alignment problem through the use of data mining techniques. The evaluation is done by comparing SSFPOA with seven other approaches in the literature. Although the results show an improvement in F-measure of the proposed approach over the seven other approaches, it is not clear which measures are aggregated by SSFPOA, and it seems that a significance test was not considered, leaving doubts about the actual improvement.

Moreover, an evaluation of different similarity measures was performed using 3 classifiers (Bayesian network, Neural Network and Support Vector Machine). Four ontologies were considered for this evaluation and the results were presented using Precision, Recall and Accuracy. The goal of this evaluation is not clear: is it to verify which classifier performs better? It is mentioned that 12 "matchers" are considered, but which one concerns SSFPOA? Why was F-measure not used? A significance test was not considered.

Experimental methodology must be well defined and applied. For instance:

- No explanation is provided for considering 80% for training and 20% for test.

- What is the structure of the dataset given to the learning algorithms? What is the role of the cluster column in the learning?

- An explanation for the chosen parameters configuration for each algorithm must be provided.

- Figures are shown in different scales (e.g. Precision for 301 vs 304 and 302 vs 303).

- Significance test must be used.

The text must be completely revised, especially the English. In many places it is hard to understand what the authors mean. Some general comments concerning the text:

- The structure should be revised. Section 2 is very confusing. For instance, page 2 mentions "...test case 303 Vs 304 and m(Proceeding, Porc, =, .36)...301 Vs 304...", without proper explanations. Moreover, Section 2 should focus on approaches based on data mining, since they are the ones related to the approach evaluated in the paper.

- There are citations missing (e.g. Machine Learning, decision tree, Neural Network, Support Vector Machine, Bayesian Network) and for others I suggest a review. For instance:

Ontology --> Gruber, T.R., Toward Principles for the Design of Ontologies Used for Knowledge Sharing. International Journal of Human and Computer Studies, vol. 43, issues 5/6, pp. 907–928 (1995), instead of [33]: www.wikipedia.com

Ontology alignment --> Ehrig, M., Ontology Alignment: Bridging the Semantic Gap. Springer (2007) or
Euzenat, J.; Shvaiko, P. Ontology Matching. Springer (2007).

Information Retrieval --> Christopher D. Manning, Prabhakar Raghavan and Hinrich Schütze, Introduction to Information Retrieval, Cambridge University Press. 2008 instead of [32]: Venkat Gudivada, Vijay V.Raghavan, William I Grosky, Rajesh Kasanagottu, "Information retrieval on the world wide web", IEEE Internet Computing Sep 1997 1089-7801/97.

- I also recommend reviewing the text regarding the use of terms such as alignment/matching/mapping. For that, consider Ehrig, M., Ontology Alignment: Bridging the Semantic Gap. Springer (2007).

- The bibliography entries are incomplete, without standards and some of them do not represent the state of the art.

- Figures and tables are inadequate. The text mentions tables and figures, but what is presented is not exactly a table or a figure (e.g. Figure 1, Table 1 and 2). They should be revised.

- Decision tree is confused with Bayesian network (Page 4 --> Table2).


The Digital Earth as Knowledge Engine

Paper Title: 
The Digital Earth as Knowledge Engine
Authors: 
Krzysztof Janowicz, Pascal Hitzler
Abstract: 
The Digital Earth [Gore 1998] aims at developing a digital representation of the planet. It is motivated by the need for integrating and interlinking vast geo-referenced, multi-thematic, and multi-perspective knowledge archives that cut through domain boundaries. Complex scientific questions cannot be answered from within one domain alone but span over multiple scientific disciplines. For instance, studying disease dynamics for prediction and policy making requires data and models from a diverse body of science ranging from medical science and epidemiology over geography and economics to mining the social Web. The naive assumption that such problems can simply be addressed by more data with a higher spatial, temporal, and thematic resolution fails as long as this more on data is not supported by more knowledge on how to combine and interpret the data. This makes semantic interoperability a core research topic of data-intensive science. While the Digital Earth vision includes processing services, it is, at its very core, a data archive and infrastructure. We propose to redefine the Digital Earth as a knowledge engine and discuss what the Semantic Web has to offer in this context and to Big Data in general.
Submission type: 

6

Responsible editor: 
Decision/Status: 
Accept

EventMedia: a LOD Dataset of Events Illustrated with Media

Paper Title: 
EventMedia: a LOD Dataset of Events Illustrated with Media
Authors: 
Houda Khrouf, Raphael Troncy
Abstract: 
An ever increasing amount of event-centric generated knowledge is spread over multiple social services, either materialized as calendar of past and upcoming events or illustrated by cross-media items. This opens an opportunity to create an infrastructure unifying event-centered information derived from event directories, media platforms and social networks using the RDF data model. EventMedia aims at creating such an infrastructure that requires seamless aggregation and integration of disparate data sources, some of which overlap in their coverage. In this paper, we present the EventMedia knowledge base composed of events descriptions together with media descriptions associated with these events and interlinked with the larger Linked Open Data cloud. We describe how the data has been extracted, converted, interlinked and published following the best practices of the Semantic Web community.
Submission type: 

5

Responsible editor: 
Decision/Status: 
Minor Revision
Reviews: 

Submission in response to http://www.semantic-web-journal.net/blog/semantic-web-journal-special-ca...

Solicited review by Erik Wilde:

I like the paper in general, but the paper should spend a little bit more time explaining the most important design decisions. Right now, the paper very factually states what was done, but it does not really reflect on why it was done that way, and what other possibilities were investigated. For readers, it would be very valuable to learn more about the design decisions at various stages (dataset selection, unified model selection, mapping strategies, use cases envisioned, ...), and to learn about the choices that the authors were making, and why they made them.

Language is ok, but the paper could use a final round of proofreading and editing.

Solicited review by Amit Joshi:

In this paper, authors have introduced an EventMedia dataset which contains media related event information in RDF format. Data is extracted from popular public event directories such as last.fm and UpComing, and access to data is provided through both REST API and SPARQL endpoint. The importance of such aggregated dataset is undoubtedly high.

The paper is well written with clear description of the dataset and a clear explanation of the approach taken for building such dataset. The importance of such dataset is further augmented by the real-time aggregation of posts shared on social network. Finding the overlap in meta-data across multiple directories through tag-based mapping is a neat implementation.

The RDF modeling includes sufficient details while the triple generation follows standard procedure. The authors have re-used existing ontologies (mediaOnt and SIOC) in addition to the main event ontology, LODE, and have also used properties from FOAF, SIOC and Dublin Core. The dataset has been further enriched through connection discovery on additional datasets such as Musicbrainz, DBpedia, Freebase and Uberblic. The site (http://eventmedia.eurecom.fr) was found to be working properly with sufficient details for most events.

I am pleased with the details provided in the paper. A few things to update:
1. I couldn't access the taxonomy description. http://data.linkedevents.org/category/ results in forbidden page.
2. Please provide footnotes for each site mentioned in Section 3.3 (ex: zevents, linkedin, eventbrite and ticketmaster)
3. Paper would be even stronger if it could reveal stats about the applications/sites using EventMedia dataset.

Solicited review by Christophe Gueret:

This paper presents EventMedia, a dataset about events created by collecting and integrating data from different sources.

I warmly recommend publishing it as is, but I still have a few remarks:
* Page 2, it is said that the data is accessible via a REST API and a SPARQL end point. What about the de-referencing of the resources? Is it considered as part of the REST API?
* It is somewhat unclear what exactly is stored in EventMedia and what stays on the data sources that are aggregated. For instance, Figure 3 shows part of the description for the entity from Flickr whereas Figure 2 doesn't... The knowledge base created contains a lot of data that is harvested from different providers; do you import everything that they provide or just point to their resources for those publishing Linked Data?
* IMHO, Community/Professional/Family are all subtypes of SocialGathering (cf. page 4, right column).
* What exactly is the relation between EventMedia and LinkedEvents? It sounds like LinkedEvents is the actual data set and that EventMedia is an application using it. If so, the title of the paper should be revised.
* In Section 5.1, what is the gold standard that was used to compute the precision and recall score?

---
Quality of the dataset: good
Usefulness (or potential usefulness) of the dataset: good (although there is no indication of external applications making use of the data being described)
Clarity and completeness of the descriptions: good


Making Web-Scale Semantic Reasoning More Service-Oriented: The Large Knowledge Collider

Paper Title: 
Making Web-Scale Semantic Reasoning More Service-Oriented: The Large Knowledge Collider
Authors: 
Alexey Cheptsov, Zhisheng Huang
Abstract: 
Reasoning is one of the essential application areas of the modern Semantic Web. Nowadays, the semantic reasoning algorithms are facing significant challenges when dealing with the emergence of the Internet-scale knowledge bases, comprising extremely large amounts of data. The traditional reasoning approaches have only been approved for small, closed, trustworthy, consistent, coherent and static data domains. As such, they are not well-suited to be applied in data-intensive applications aiming on the Internet scale. We introduce the Large Knowledge Collider as a platform solution that leverages the service-oriented approach to implement a new reasoning technique, capable of dealing with exploding volumes of the rapidly growing data universe, in order to be able to take advantages of the large-scale and on-demand elastic infrastructures such as high performance computing or cloud technology.
Submission type: 

0

Responsible editor: 
Decision/Status: 
Reject and Resubmit
Reviews: 

Submission in response to http://www.semantic-web-journal.net/blog/semantic-web-journal-special-ca...

Review by anonymous reviewer

The authors present an overview of the LarKC project. The article is a submission to the SWJ special issue about Big Data, and thus raises expectations accordingly.

The paper is in parts quite clearly written. However, the style is sometimes really confusing, e.g. due to extremely long sentences. An example of this is the sentence running from the end of the first page to the start of the second page. These and the overall "marketing style" writing make the paper less readable. The authors also recycle some parts of graphics from their other papers, but I am not sure what extra value these bring to the new paper.

In terms of content I was not convinced either. As a research paper, this paper does not contribute much beyond describing the overall system architecture. From this we can learn that LarKC enables plug-ins, applications and workflows, but how is this new and different from other platforms? Where is the novelty? How is the (possible) novelty evaluated? What advantage does the LarKC platform give to the listed applications (in section IV) that could not be achieved otherwise? How exactly does it help?

Because of the above-mentioned questions and comments, I cannot recommend publishing this paper in the special issue, at least not in its current form.

Minor comments:

- some parts of sentences from the abstract are repeated in the introduction. Please consider rewriting them.
- [4] does not sound like an optimal reference for the Semantic Web.
- page 2: "…data collections as well as application…" --> "…data collections as well as applications…"
- [2] is not an optimal reference for Jena. Besides, it lacks the publication details (e.g. the year)

Review by Natasha Noy

The content overlaps significantly with two other papers describing the LarKC project:

1.) Large Knowledge Collider - a Service-Oriented Platform for Large-Scale Semantic Reasoning: http://www.cyc.com/technology/whitepapers_dir/Large_Knowledge_Collider.pdf

2.) SEMANTIC WEB REASONING ON THE INTERNET SCALE WITH LARGE KNOWLEDGE COLLIDER: http://www.tmrfindia.org/ijcsa/v8i25.pdf

While this paper uses a different style to present the LarKC architecture, this alone does not seem to constitute a novel contribution. Readers who have read the other papers will not learn significant new information about the LarKC architecture from this presentation. According to the authors, the other difference is that they "focus to practical aspects of constructing service-oriented Semantic Web services and introduce distinctive features of LarKC rather than discuss the implementation details of the platform", but I really didn't see much of that. There is very little discussion and it is not clear that the added value of this paper compared to the other publications warrants a publication in a journal.


Making Sense of Social Media Streams through Semantics: a Survey

Paper Title: 
Making Sense of Social Media Streams through Semantics: a Survey
Authors: 
Kalina Bontcheva, Dominic Rout
Abstract: 
Using semantic technologies for mining and intelligent information access to social media is a challenging, emerging research area. Traditional search methods are no longer able to address the more complex information seeking behaviour in media streams, which has evolved towards sense making, learning, investigation, and social search. Unlike carefully authored news text and longer web context, social media streams pose a number of new challenges, due to their large-scale, short, noisy, context-dependent, and dynamic nature. This paper defines five key research questions in this new application area, examined through a survey of state-of-the-art approaches to mining semantics from social media streams; user, network, and behaviour modelling; and intelligent, semantic-based information access. The survey includes key methods not just from the Semantic Web research field, but also from the related areas of natural language processing and user modelling. In conclusion, key outstanding challenges are discussed and new directions for research are proposed.
Submission type: 

1

Responsible editor: 
Decision/Status: 
Accept
Reviews: 

Resubmission after "accept with minor revisions", now accepted. First round reviews are below.

Review 1 by Harald Sack

The authors provide a survey on state-of-the-art data mining of social media streams based on semantic technologies, natural language processing and user modelling.
The survey is organized according to 5 key questions:
(1) Ontologies and Web of Data resources for representing and reasoning about the semantics of social media streams
(2) Semantic Annotation to capture implicit semantics of social media streams
(3) Extraction of reliable information from social media streams
(4) User modelling for social media streams
(5) Semantic-based information access to social media streams

After a concise introduction, section (2) provides a categorization of social media resources, followed by the identification of key social media sites, and the challenges of social media resources concerning reliable information extraction. Section (3) enumerates and introduces various ontologies used for representing social media resources as well as for modelling user behaviour.

For a survey, I would prefer a more structured representation of the mentioned ontologies instead of a simple enumeration (at least in the same way as the representation of the key challenges in section 2).

Section (4) provides an overview of semantic annotation of social media resources as well as various mining approaches ranging from keyword extraction, ontology-based and Wikipedia-based entity recognition, over event detection, to sentiment detection, opinion mining, and cross-media linking. The section finishes with an in-depth discussion that could also be better structured for readability. Overall, the structure of the entire section is straightforward, with the topics building on one another in a logical order. The challenges and limits of applying these mining techniques on social media streams and microposts are worked out well. More care could have been taken in describing and critically discussing crowdsourcing-based methodologies.

Section (5) focuses on semantic-based user modelling and how user interests can be derived from semantic annotations. User demographics can be analyzed via location data. Here the time dependency of user interests and their development over time is also an important subject to investigate. The discussion in this section, including the merging of heterogeneously created user models as well as distinguishing personal from global interests, could be more detailed.

Section (6) covers information access to social media streams, starting with semantic search, information filtering from various streams, followed by different means of visualization.

Section (7) summarizes the current challenges in social media mining such as cross-media aggregation, (web-scale) scalability, and the demand for standardized evaluation. This is one of the most important parts of the paper and could be worked out in more detail.

Overall this is a very valuable compilation of the current state-of-the-art in semantic-based social media stream analysis. In general for better understanding and readability the single topics could be supported by tables and diagrams to visualize the complex structure and relationships presented in the paper. Discussions could be summarized by tables showing the pros and cons of different tools or approaches. Bibliographical references for evaluation based on crowd sourcing by non-specialists are missing.

Minor issues:
p1. ...and social search [?] - missing reference
p5. ...you use the acronym 'SKOS' without further explanation or reference

Review 2 by Ashutosh Jadhav

The authors present a comparative review of how semantic web (and natural language processing) techniques are used to address some of the research challenges associated with social data. Specifically, the authors focus on challenges in social media and conduct a detailed survey of current work in semantic web and social media, including the use of ontologies, semantic annotations, user modeling and information access. In the context of addressing social media challenges, the authors also cover recent work on natural language processing and visualization techniques.

Good points:
-------------
1. Although the scope of the topic of the paper is very vast, the authors have managed to touch upon the majority of the important research problems and related work. The paper provides a summary of various challenges, possible approaches, and their suitability, performance etc.

2. The authors have reviewed and cited most of the recent work in the respective sections.

3. This is the first comprehensive survey on the use of semantic web coupled with natural language processing techniques to address a variety of issues with social data.

4. Most of the sections are well written, providing a summary of recent work, a comparative study and the authors' viewpoints on the challenges and possible solutions.

Weak points:

1. The organization of the paper can be improved. For example, the authors have discussed too many topics under the semantic annotation section. Can a taxonomy/classification or an organizational structure be provided to make it easier for a reader to distinguish/differentiate the different issues? Please have a look at this reference for organization: http://www.slideshare.net/knoesis/citizen-sensor-data-mining-social-medi...

2. Most of the topics discussed by the authors in this paper are important and provide a comprehensive understanding of state-of-the-art work in social media. At the same time, the authors cover some topics (for example sentiment analysis) that depend heavily on natural language techniques and use very little or no semantics at all. Since the topic of this paper is the use of semantic web in addressing social media challenges, the authors could have limited their scope to topics which use semantics to a certain extent to solve the problems. Alternatively, the authors can discuss how the use of dictionaries (e.g. Urban Dictionary), machine learning or background knowledge could be applied for this topic (e.g. ICWSM 2012 has a paper on topic-specific sentiment, where the identification of topics to which sentiment is associated utilizes light-weight semantics).

3. Some of the sections are not written well, for example:

a) The section 2.1 Key Social Media Sites can be improved in terms of flow, summarization of various types of networks, their characteristics etc.
b) The section 4.1.1. Global topics can be improved in terms of clarity.

4. One of the major sections in the paper is Semantic annotation. The authors have summarized the what and how of semantic annotation but haven't discussed much about why semantic annotations are useful.

Minor points

1. Reference is missing on page number 1 ([?])

2. The paper has some grammatical mistakes, like "These graph-based approaches to extracting keywords from Twitter..."

Review 3 by anonymous reviewer

The paper presents an excellent survey of the state-of-the-art approaches about various technologies for mining and intelligent analysis of the user-generated content in Social Media. To the knowledge of this reviewer, this work is unique of its kind and presents the most comprehensive up-to-date analysis of the literature in this emerging field of research. The paper is fun to read. It is well-written, easy to follow and summarizes the state-of-the-art very nicely. There are just a couple of comments regarding how to improve the paper.
Page 15: The authors point out the lack of lexical knowledge for processing user-generated content. Besides Wikipedia, which has by now become an established resource in text analysis, Wiktionary has been found to be very valuable for these purposes [1,2]. Its particular advantage over standard lexical semantic resources is the inclusion of the terms specific to the user-generated content on the Web. Thus, it might be effectively utilized for analyzing social media content, and there are some ready-to-use tools available for that already: http://www.ukp.tu-darmstadt.de/software/jwktl/
Another recent trend in the community of language resources is linking and merging large scale resources to increase their coverage and make them more useful in broad-coverage text analysis. This is a link to the corresponding recent workshop: http://panacea-lr.eu/en/news/project/2011/12/19/lrec-2012-merging-lr-wor... A particular example of a linked lexical-semantic resource is UBY, from the same group as the Wiktionary resource mentioned above [3,4].
The authors mention crowdsourcing as a possible way to improve the performance of automatic systems. This topic has received a lot of attention from different communities in recent years. Human computation, collective intelligence and games with a purpose can thus be discussed in greater detail. There are numerous references for that, which could make a very nice separate section in the context of this article, for example [5].
Page 26: Multilinguality is mentioned as one of the major challenges, with most of the methods being developed for English content only. This reviewer would like to see more discussion on what has been done for other languages and how the problem can be tackled. Which technologies need to be researched intensively to address this issue?

References
[1] Christian M. Meyer and Iryna Gurevych. Wiktionary: a new rival for expert-built lexicons? Exploring the possibilities of collaborative lexicography. In: Sylviane Granger and Magali Paquot: Electronic Lexicography, pp. (to appear), Oxford: Oxford University Press, 2012.
[2] Christian M. Meyer and Iryna Gurevych. OntoWiktionary – Constructing an Ontology from the Collaborative Online Dictionary Wiktionary. In: M.T. Pazienza & A. Stellato (Eds.): Semi-Automatic Ontology Development: Processes and Resources, Hershey, PA: IGI Global, 2011, pp. 131-161.
[3] http://www.ukp.tu-darmstadt.de/data/lexical-resources/uby/
[4] Iryna Gurevych, Judith Eckle-Kohler, Silvana Hartmann, Michael Matuschek, Christian M. Meyer and Christian Wirth. Uby - A Large-Scale Unified Lexical-Semantic Resource Based on LMF, In: Proceedings of the 13th Conference of the European Chapter of the Association for Computational Linguistics (EACL 2012), pp. 580-590, April 2012.
[5] http://crowdresearch.org/chi2011-workshop/

--
Submission in response to http://www.semantic-web-journal.net/blog/semantic-web-journal-special-ca...
