XMLSchema2ShEx: Converting XML validation to RDF validation

Tracking #: 1824-3037

Herminio Garcia-Gonzalez
Jose Emilio Labra-Gayo

Responsible editor: Axel Polleres

Submission type: Full Paper
RDF validation is a field on which the Semantic Web community is currently focusing attention. In other communities, such as XML or databases, data validation and quality are considered key parts of the ecosystem. In addition, there is a recent trend to migrate data from different sources to Semantic Web formats. These transformations and mappings between different technologies have an economic and technological impact on society. To facilitate this transformation, we propose a set of mappings that can be used to convert from XML Schema to Shape Expressions (ShEx), a validation language for RDF. We also present a prototype that implements a subset of the proposed mappings, and an example application that obtains a ShEx schema from an XML Schema. We consider that this work, together with the development of other format mappings, could improve data interoperability by narrowing the technological gap.

Solicited Reviews:
Review #1
By Emir Muñoz submitted on 28/Mar/2018
Review Comment:

I would like to thank the authors for considering all comments (including mine) and revising their manuscript accordingly. Without any doubt, the latest version has improved considerably in terms of content and presentation, making the paper more readable and complete than the initial submission.

After two rounds of revision, I think the authors did a good job and the paper is suited for publication.

My final remarks are towards polishing the abstract, which currently does not flow as well as the rest of the article.

Finally, some last suggestions to polish:
- “This transformations are” -> “These transformations are”
- “(e.g., js, sparql, java, etc.).” -> uppercase and remove the “etc.” when listing examples. (Multiple occurrences)
- Ensure that text stays within the line width and doesn’t go out of the column borders
- “Xs:maxLength and xs:minLength are” -> lower case for first xs. (Multiple occurrences)
- “could not be converted back to their XML Schema origin.” -> “could not be converted back to their original XML Schema constructs.”

Review #2
By Simon Steyskal submitted on 01/May/2018
Review Comment:

The paper has been significantly revised and together with the authors' response accompanying the resubmission most of my raised remarks/questions were addressed. However, there are still a couple of open (minor) issues that I would like to see addressed:

I) Implementation
I.) The implementation found at https://github.com/herminiogg/XMLSchema2ShEx seems to be buggy (I already raised an issue) and, even more importantly, is missing some additional documentation on how to run it. I'm not expecting anything super fancy and extensive there, but anything that avoids having to reverse engineer the code would be highly appreciated.

0) General
0.) Put both Fig. 2 and 3 either before or behind the references, but not one each.
0.) Maybe consider using hyperref
0.) Please thoroughly proof-read the entire paper from start to end (or ask colleagues to do so)!

1) Introduction
1.) "In words of P.N. Fox et.al." -> s/et.al./et al./
1.) "RDF was missing a standard constraints validation language which cover the same features that XML Schema does for XML." -> "constraint validation language which covers"
1.) "For this purpose, Shape Expressions (ShEx) [32,33] was proposed to fulfill the requirement of a standard constraints validation language for RDF," -> standard as in ..? ShEx is not a standard; s/a standard constraints validation language/a constraint validation language/
1.) "Conversions between XML and RDF, [..] are necessary to alleviate the gap between semantic technologies [..] XML, JSON, CSV" -> what's the connection between XML/XSD to RDF/ShEx conversions and e.g. JSON? remove!
1.) "from in-use technologies to semantic technologies" -> implies that semantic technologies aren't "in use"; rephrase!
1.) "Although we consider that [..] as initial or by-default transformations." -> remove whole sentence
1.) s/solution and automatic/solution and where automatic/
1.) "Taking into account what we previously exposed" -> replace "exposed" with a more appropriate verb
1.) "the problem of Non-Deterministic schemata and what are the implications in this work" -> implications of what? rephrase! (e.g., "Section 6 discusses the implications of non-det. schemata on our work" or something along those lines)

2) Background
2.) "Migration to Semantic Web technologies is a task that has several previous works" -> rephrase
2.) "conversions from XML schemata to other schemata and conversions from XML schemata to RDF schemata." -> what are "other schemata" compared to "RDF schemata"? "other" as in "non semantic web"?
2.) "Prior to schemata conversions, data migration has to be tackled." -> why? what does "data migration" mean in that context?
2.) "This transformation takes [..] and an ontology document [..] XML Schema is used to describe the mapping between it and the OWL document." -> what's "it" referring to? is OWL document == ontology document in that context? either use 2x OWL or 2x ontology.
2.) "In [1], the author makes an explanation of how XML" -> "In [1], the author explains how XML"
2.) "In [3] a transformation from RDF to other kind of formats [..] using embedded SPARQL into XSLT stylesheets which" -> "In [3], a transformation from RDF to other kinds of formats [..] using in XSLT stylesheets embedded SPARQL which"
2.) "However, these works (except [27]) are not covering the schemata mapping problem." -> so why not go with [27] then? how's your approach "different" from [27]?
2.) s/is desirable/it is desirable/
2.) s/certify/verify/
2.) "However, none of these works bring XML schemata to Semantic Web technologies." -> yeah.. that's why you listed them under "XML schemata to other formats"; besides that it would be also interesting to know, whether there are actually any overlaps between your approach and theirs.. i.e., the reasoning behind you talking about those exact 3 papers and not about any of the other ones out there.

3) Brief introduction to ShEx
3.) "In July 2017, version 2.0 was released with a draft community group report and the community group is currently developing the 2.1 version." -> "In July 2017, version 2.0 of ShEx was released ("together with a"/as) draft community group report and the community group is currently developing version 2.1."
3.) "Listing 1 illustrates an example of a ShEx shape. [Listing1] Listing 1 defines a shape with a :PurchaseOrder type." -> "Listing 1 illustrates an example of a ShEx shape defining a shape with a :PurchaseOrder type. [Listing1]"
3.) s/same similar/same/
3.) "must have an orderId of type that matches the regular expression Order\d{2} [..] The :Item shape must have a schema:name of value string" -> don't mix up value and (data)type! orderId must have a >value< that matches the regular expression and schema:name must have a >value of type< xs:string! (please look through the entire document for similar mistakes)
3.) you mention schema:orderQuantity in the text, but use :quantity in the listing -> fix!
3.) "The first of them passes validation and conforms to the shapes declaration" -> "The first one passes validation and conforms to the shapes declaration given in Listing 1"
3.) "whereas :order2 fails for several reasons" -> "fails validation"
3.) I would move the entire paragraph starting with "ShEx supports different ser. formats" and ending with "In this paper ShExC was used [..] and understand." in front of the one starting with "ShEx uses shapes to group [..]"

4) Mappings between XML Schema and ShEx
4.) "As presented in Listing 6, when an element has its complex type nested" -> 6 has no nested complex types, do you mean Listing 5?

A) super picky remarks
.) in Listing 11, "fixed" is not bold like all other XML terms
.) in Listing 14, fontbynumber and fontbystringname are lower case in XML but upper case in ShEx
.) in Figure 2, xsd: is used as prefix and not xs: like everywhere else

br, simon