The Semantically Mapping Science (SMS) Platform: Towards an Open Linked Data Infrastructure for Social Science Research

Tracking #: 2132-3345

Ali Khalili
Al Idrissou
Klaas Andries de Graaf
Peter van den Besselaar
Frank van Harmelen

Responsible editor: 
Guest Editors Semantic E-Science 2018

Submission type: 
Full Paper
Social phenomena are generally complex. Understanding them, and designing public policies that may affect them, requires integrating and analyzing data from multiple sources. Currently, social research is mostly either rich but small scale (qualitative case studies) or large scale and under-complex (because it generally uses a single dataset - often a survey or administrative data). Progress in the social sciences depends on the ability to do large-scale studies with many variables specified by relevant theories: There is a need for studies which are at the same time big and rich, and this requires high quality linked and enriched data, that can be accessed through user-friendly interfaces. The Semantically Mapping Science (SMS) platform, presented in this paper, is a user-centric platform for data enrichment, integration, exploration and analysis with focus on open access to research data and services to tackle this challenge. We show the added value of the SMS platform through a number of illustrative use-cases. The SMS platform focuses on the data needs of researchers, policy makers, and managers in the area of science, technology, and innovation policies, but it generalises to data in other social science domains.

Major Revision

Solicited Reviews:
Review #1
By Amrapali Zaveri submitted on 06/Apr/2019
Major Revision
Review Comment:

The paper describes the implementation of the Semantically Mapping Science (SMS) platform, a user-centric platform for data enrichment, integration, exploration and analysis to enable open access to research data and services.

(1) originality
The platform is novel and builds upon lessons learned from other existing platforms. I do have a few questions and comments:
- I am not able to access the platform, nor the spreadsheet in footnote 21
- Link discovery algorithms traditionally rely on the use of the owl:sameAs property - where is the evidence for that?
- As a result, a "context specific alignment" is now represented using a decorated alignment graph followed by annotated links. What does that mean? Can you provide an example?
- "ground truth dataset ... or crowdsourcing (which is not always feasible, especially if expert knowledge is required)" - can you provide evidence about crowdsourcing not being feasible?
- "The current Semantic Web methods for entity linking are too crude, since they rely on the owl:sameAs semantics" - please provide statistics to back up this claim.
- What do you mean by Data Quality and Data Curation? Can you provide examples?
- "we have developed efficient methods for automatic link quality estimation" - what is the metric that is used for measuring the efficiency?

(2) significance of the results
The use cases described show the significance of the platform in enabling open access research in different applications and in different domains. It would be interesting to know the challenges that were faced by the users while using the platform and the consequent limitations of the platform that were encountered. Since I cannot access the platform, I cannot test it out but I would like to know whether it is possible to test out my own use cases.

(3) quality of writing
The paper is well written in general. However, I have the following comments:
- the text in the figures (e.g., 5, 6, 7, 9, 15, 16) is a bit too small to read
- For RDF, please add full form and reference
- "linked data" should be "Linked Data"
- Add the full form for CEDAR at its first occurrence
- "Websites" should be "websites"
- Add link to FAIR principles paper:
- For smartAPI add the reference to
- In Figure 11, the glrc API is shown but not mentioned in the text

Review #2
Anonymous submitted on 18/Apr/2019
Major Revision
Review Comment:

The paper describes a tool, called Semantically Mapping Science (SMS) platform, which allows data enrichment, integration, exploration and analysis.
It is indeed an interesting topic, in line with the journal and the special issue.
In particular, it is very useful for making the use of semantic data easier and foster the integration among datasets. Thus, I think the tool is valuable and useful for researchers in different communities.
The paper is well-written and easy to follow, and reports the state of the art of the technologies in the field (even if the authors tend to over-cite themselves).

However, I have some concerns about the paper. It seems to be a collection of different works drawn from about ten previous papers by the authors. I understand the need to bring everything together to provide a systematic description of the system, but the differences among these publications should be made explicit, indicating which sections of the paper are novel and unpublished. In general, which are the novel parts of this paper? This should be made clear both in relation to the state of the art and in relation to the authors' previous papers.

Moreover, I don't see why this platform is specific to social science research: please make clear why this field can benefit more from this platform than other domains.

I don't see any consideration of data privacy: how is it managed by the system?

What about web data (like pages or posts) that are not semantic? I don't understand whether they can be used as a source for the platform.

It is not clear which analyses can be performed by the platform and how.

In general, has the platform been evaluated? Is it simple to use? Does it perform well? Please report at least the partial tests or user evaluations that the authors performed for the individual modules, if any.

Precise comments:
- Section 3.2: explain more clearly the meaning of the requested metadata
- Section 4: a description of the IUI is missing. Has it been evaluated by users from a usability and efficacy point of view?
- Section 8: use case 1 has just been published elsewhere (ref 4); I suggest omitting it and describing the other use cases in more detail, in particular use case 2

Minor remarks:
- check all the links, for example, link 25 does not work
- Figures 5 and 6 should be swapped

Review #3
By Natalia Villanueva-Rosales submitted on 24/Apr/2019
Major Revision
Review Comment:

The Semantically Mapping Science (SMS) platform presented in this paper addresses the need to integrate heterogeneous data (i.e., structured, unstructured, qualitative and quantitative) to enable data analysis for Social Science research. This work aligns very well with the SWJ Special Issue on Semantic eScience. SMS is an open source platform that enables the curation, enrichment, linking, querying and browsing of data. Each of the steps in the SMS platform is summarized in a different section of the paper, and lessons learned are provided. Unfortunately, at the time of the review I was not able to access the SMS platform at - I also tried. The paper does a good job of motivating the use of SMS for Social Science research with four different use cases. Including additional information to clearly identify the contribution of this paper is recommended.

The technical architecture, data curation, semantic enrichment, data linking, querying and browsing features of the SMS framework have been previously published at other venues (Refs. [2-11]). The authors state that this paper brings together individual contributions and provides an overall description of the platform. Content from previously published papers is included in Sections 2-8. This content includes figures (e.g., Fig. 1, 2, 4, 5) that may be subject to copyright from the previous manuscripts. Although some of the previous publications are in open-access proceedings, it is recommended that the authors verify the guidelines for reusing such material in a new publication.
It seems that the main contribution of this paper is to document the impact of SMS on Social Science research by describing the specific use cases presented in Section 8. If this is the case, the paper would benefit from explaining how these use cases were created and describing the specific role of social scientists in this task. Documenting specific discoveries or challenges addressed by Social Science researchers would make a stronger case for the contribution that SMS is making to this domain. For example, did the data analysis enabled by SMS inform specific public policies? Are there any publications generated from these studies in the Social Science domain? Two references [11,23] are listed for publications about use case 4. However, both publications are at Computer Science venues. Context to motivate the use cases is presented, but evidence about the impact of SMS is limited.
Similarly, the authors should consider explaining how these use cases were used to evaluate the tool, i.e., which features were refined or improved? What are the lessons learned from these use cases that can be transferred to other applications or domains?
The link with the use cases was not available at the time of the review.

The SMS platform addresses an important challenge in harnessing data for domain-specific research. It would be useful for potential SMS users if the authors described the effort involved in using the platform. What skills are required of potential users? What support is available for new users interested in integrating new datasets? Was the interface evaluated from a usability perspective? Technical descriptions of the components developed for and leveraged by SMS are provided in the paper, but the information about user experience is limited, although the large number of registered users (470) is commendable. This is particularly important given the target audience of SMS, and assuming this is part of the main contribution of this paper.
Authors should include a specific comparison between SMS and the related approaches listed in the related work to highlight the novelty of the SMS framework.
The authors claim that the SMS platform is being used by researchers from geography, public health and educational research. More information about these efforts should be provided to make a stronger case that the platform is domain agnostic.

Quality of Writing
The paper is well-written and easy to read.
The text in Figures 15-18 is very hard to read; consider increasing the font size in the figures.