Review Comment:
This manuscript was submitted as 'Data Description' and is reviewed along the following dimensions: (1) Quality of the dataset. (2) Usefulness (or potential usefulness) of the dataset. (3) Clarity and completeness of the descriptions (see review criteria on the SWJ website).
(1) Quality of the dataset
(2) Usefulness of the dataset
I question whether a direct translation of XBRL to RDF is ideal, especially with respect to the XBRL notion of context.
The XBRL data suffers from using XML (linkbases) as the underlying data model, whereas RDF typically favours an entity-centric data model.
Thus, a more intuitive representation in the Semantic Web context is to follow the modelling constructs that established vocabularies use, such as Data Cube (QB) or SKOS.
However, the presented dataset does not use established vocabularies (such as the QB vocabulary which is applicable to XBRL and widely used for modelling numerical datasets).
Why the authors did not follow the QB model for representing numerical data is not explained.
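To illustrate the kind of modelling I have in mind, the following is a minimal sketch of how a single XBRL fact could be rendered as a qb:Observation, with the parts of the XBRL context (entity and period) attached directly to the observation as dimensions. The qb: and sdmx-measure: namespaces are the published ones; everything under http://example.org/, the ex: property names, and the sample values are my own assumptions for illustration and are not taken from the dataset under review.

```python
# Sketch only: emits Turtle for one XBRL fact modelled as a qb:Observation.
# The ex: namespace and its terms (ex:issuer, ex:concept, ex:periodEnd) are
# hypothetical placeholders, not vocabulary from the reviewed dataset.

PREFIXES = (
    "@prefix qb: <http://purl.org/linked-data/cube#> .\n"
    "@prefix sdmx-measure: <http://purl.org/linked-data/sdmx/2009/measure#> .\n"
    "@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .\n"
    "@prefix ex: <http://example.org/xbrl/> .\n"
)


def observation_turtle(obs_id, cik, concept, period_end, value):
    """Render one fact as Turtle; the XBRL context becomes plain dimensions."""
    return (
        f"ex:obs-{obs_id} a qb:Observation ;\n"
        f"    qb:dataSet ex:dataset-{cik} ;\n"
        f"    ex:issuer ex:entity-{cik} ;\n"          # entity part of the context
        f"    ex:concept ex:{concept} ;\n"            # reported XBRL concept
        f'    ex:periodEnd "{period_end}"^^xsd:date ;\n'  # period part of the context
        f'    sdmx-measure:obsValue "{value}"^^xsd:decimal .\n'
    )


if __name__ == "__main__":
    # Sample values are invented for the example.
    print(PREFIXES)
    print(observation_turtle("1", "0000789019", "Assets", "2013-06-30", "1000.00"))
```

In such a representation a consumer can query all observations of an entity without first resolving a separate context resource, which is the entity-centric view I would expect in a Semantic Web setting.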
Further, the authors do not provide their dataset as Linked Data, i.e., an HTTP GET on a Semantic XBRL URI does not return the corresponding RDF.
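The check I applied can be sketched as follows: request the URI with an Accept header asking for RDF and inspect the media type of the response. This is a minimal sketch using only the Python standard library; the function names and the list of accepted media types are my own choices, and the URI passed in would be one of the dataset's entity URIs.

```python
import urllib.request

# Media types treated as RDF serialisations in this sketch (an assumption,
# not an exhaustive list).
RDF_TYPES = {
    "text/turtle",
    "application/rdf+xml",
    "application/ld+json",
    "application/n-triples",
}


def build_request(uri):
    """Build a GET request that asks for RDF via content negotiation."""
    return urllib.request.Request(
        uri, headers={"Accept": "text/turtle, application/rdf+xml;q=0.9"}
    )


def is_rdf(content_type):
    """True if a Content-Type header value names one of the RDF media types."""
    media_type = content_type.split(";", 1)[0].strip().lower()
    return media_type in RDF_TYPES


def dereferences_to_rdf(uri, timeout=10):
    """Dereference the URI (redirects are followed) and check the media type."""
    with urllib.request.urlopen(build_request(uri), timeout=timeout) as resp:
        return is_rdf(resp.headers.get("Content-Type", ""))
```

A URI that is published as Linked Data should make dereferences_to_rdf return True, either directly or after a redirect to a document URI.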
In an updated version, the URI schemes/templates for constructing entity URIs should be included (e.g., the URI template for identifying entities identified with a CIK).
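As an example of the level of detail I would expect, a URI template for CIK-identified entities could be documented along these lines. The base URI and path layout below are hypothetical, and padding the CIK to ten digits is my assumption about a sensible normalisation; the authors should state their actual scheme.

```python
# Hypothetical base URI; the reviewed dataset's real namespace may differ.
BASE = "http://example.org/semantic-xbrl"


def entity_uri(cik):
    """Build an entity URI from a CIK, normalised to ten zero-padded digits."""
    digits = str(cik).strip()
    if not digits.isdigit() or len(digits) > 10:
        raise ValueError(f"not a valid CIK: {cik!r}")
    # Template: {BASE}/entity/cik/{ten-digit CIK}
    return f"{BASE}/entity/cik/{digits.zfill(10)}"


if __name__ == "__main__":
    print(entity_uri(789019))
```

Stating the template explicitly lets consumers construct entity URIs from identifiers they already hold, and makes it clear whether padded and unpadded CIKs map to the same resource.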
XBRL's calculationArcs, which describe how various measures relate to each other, are not mentioned in the paper.
The SIC taxonomy for classification of EDGAR entities is not mentioned.
The paper does not include licensing information.
Also, there is no reported usage of the data.
(3) Clarity and completeness of the descriptions
My colleagues and I have done work in the XBRL space which is missing from the description (see, for example, the Linked EDGAR dataset, http://datahub.io/dataset/linked-edgar, which even provides links to Freebase and DBpedia, and Kaempgen et al., "Accepting the XBRL Challenge with Linked Data for Financial Data Integration", http://2014.eswc-conferences.org/sites/default/files/papers/paper_224.pdf).
In sum, the modelling decisions are not sufficiently explained, the dataset is not available as Linked Data, and the contribution over existing work is unclear.