Linked Web APIs Dataset: Web APIs meet Linked Data

Tracking #: 1350-2562

Authors: 
Milan Dojchinovski
Tomas Vitvar

Responsible editor: 
Rinke Hoekstra

Submission type: 
Dataset Description
Abstract: 
Web APIs have enjoyed a significant increase in popularity and usage over the last decade. They have become the core technology for exposing functionalities and data. Nevertheless, due to the lack of semantic Web API descriptions, their discovery, sharing, integration, and the assessment of their quality and consumption are limited. In this paper, we present the Linked Web APIs dataset, an RDF dataset with semantic descriptions of Web APIs. It provides semantic descriptions for 11,339 Web APIs, 7,415 mashups, and 7,717 developer profiles, which makes it the largest available dataset from the Web APIs domain. It captures the provenance, temporal, technical, functional, and non-functional aspects. We describe the Linked Web APIs Ontology, a minimal model which builds on top of several well-known ontologies. The dataset has been interlinked and published according to the Linked Data principles. We describe several possible usage scenarios for the dataset and show its potential.
Tags: 
Reviewed

Decision/Status: 
Minor Revision

Solicited Reviews:
Review #1
By Enrico Daga submitted on 13/Apr/2016
Suggestion:
Minor Revision
Review Comment:

The authors clearly made a lot of effort to improve this article in this second revision, and almost all the issues from the initial review seem to be resolved.

However, there is still no mention of the update rate of the data. Considering it is mentioned that the data is acquired through HTML scraping (1 page every 4 seconds), how long does it take to rebuild the data? How frequently do you plan to do it? Is this process automated? These questions should be answered in the Maintenance section.

Minor issues:
Page 1:
"Least but not last" -> "Last but not least"
“ … can benefit from a sophisticated queries …” -> (something wrong here, maybe rephrase the whole sentence)
Page 4:
“Similar approach … debatable question” -> Maybe in a footnote?
Page 5:
In 6.2: "… by by …"
Page 6:
SADI -> never introduced before, add citation

Review #2
By Christoph Lange submitted on 17/Apr/2016
Suggestion:
Minor Revision
Review Comment:

This paper presents a dataset about Web APIs, which has been generated from the directory website ProgrammableWeb.com by screen-scraping, and furthermore interlinked with a few existing linked datasets. Like its previous revision, the paper …

* clearly motivates the need for such a dataset,
* explains the data source reasonably well,
* explains the ontology, which has been designed for this purpose, very well,
* explains the URI naming scheme and some statistics about the dataset,
* covers the interlinking, and
* presents as many as five (5) use cases, whose practical relevance is pointed out clearly.

The latest revision features the following main enhancements: it

* provides evidence for the usefulness of the data, by mentioning in-links from DBpedia (thus proving at least the beginning of third-party use) and a survey of a small group of users w.r.t. the subjective usefulness of the dataset.

* discusses the quality of the dataset (largely by following the 5-star open data scheme – although _even_ more could be done here, e.g. discussing more specific quality metrics such as those presented in http://www.semantic-web-journal.net/content/quality-assessment-linked-da...) and the stability of the dataset (briefly, by explaining how it is, and will be, maintained)

* discusses related work.

The latest revision thus meets, not perfectly but sufficiently, the three criteria for dataset papers. Also, most of my more specific concerns were addressed. Moreover, the dataset is feature-complete and appears to be the result of solid work. I recommend acceptance with the following minor revisions:

* Still, the grammar is not perfect (e.g. w.r.t. use of articles); please have a native speaker review it.

* section 7.1 "use cases": I wonder whether the queries that use prov:generatedAtTime make sense. If ProgrammableWeb does not record the history of versions of an API/mashup, then this probably effectively has the semantics of "last updated on ". Also, your ontology does not cover version histories. I would appreciate a discussion of these aspects.

* section 7.2 "survey": A broader user base would be helpful, plus some more information on their background, i.e. being more specific than "all of the participants […] have searched or used an API, while 19 […] also provide an API". E.g. in what _ways_ are they using APIs, and in what situations of their work do they consider your dataset helpful? (The distinction between the perspectives of consumer vs. provider is already a good step in this direction!) And finally, have these users used ProgrammableWeb, and if so, do they find ProgrammableWeb or your dataset more useful?

Review #3
By Tobias Kuhn submitted on 11/May/2016
Suggestion:
Minor Revision
Review Comment:

I found the new section about quality (Section 6) to be not super convincing,
but it is arguably hard to prove the quality of such a dataset before
wide-spread usage. Section 6 also contains many typos and small mistakes, for
example:

> Only two invalid triples representing tags have been spotted as invalid.

Drop first "invalid"

> used in linked by by other

"used and linked by other"

> when was created our first snapshot of the dataset

"when our first snapshot of the dataset was created"

Section 7, on the other hand, I found very nice and convincing. The results of
the survey look very promising for the given approach and dataset. I couldn't
access the data for the survey results though (https://goo.gl/UeAbA7). I got
the message "You need permission".

So, I think the paper is now in a state that complies with the requirements for
such papers and can be accepted, after these points have been taken into
account:

- The paper should be proofread, preferably by a native speaker, in particular
Section 6 (see above).
- The data download link needs to be fixed. Moreover, uploading it to a data
repository such as Datahub would be preferable to a Google Drive folder.

