ActiveRaUL: Automatically Generated Web Interfaces for Creating RDF Data

Tracking #: 549-1752

Authors: 
Anila Sahar Butt
Armin Haller
Shepherd Liu
Lexing Xie

Responsible editor: 
Guest editors Semantic Web Interfaces

Submission type: 
Full Paper
Abstract: 
The amount of automatically generated machine-readable data on the Web has significantly increased in recent years. This is in part due to the advent of Linked Data and its publishing tools that allowed the mapping of relational data to RDF. However, the amount of semantic Web data is still many orders of magnitude smaller than the World-Wide-Web, and this limits semantic Web applications. One of the barriers for semantic Web novices to create machine-readable data is the lack of easy-to-use Web publishing tools that separate the schema modelling from the data creation. In this article we present ActiveRaUL, a Web form-based user interface that particularly supports users inexperienced in semantic Web technologies in creating RDF data. These Web form-based user interfaces in ActiveRaUL can be automatically generated from any arbitrary input ontology through a process described in this article. We map the graph-structured input ontology to a tree-structured Web form while still allowing the user to create RDF data typed according to the input ontology. We validate our approach of automatically generating Web interfaces from an ontology in a user study based on use cases developed by the W3C Semantic Sensor Network (SSN) Incubator group. We test the effectiveness, efficiency and the satisfaction of users in creating RDF data based on the SSN ontology with ActiveRaUL generated user interfaces compared to a state-of-the-art ontology editing tool.
Tags: 
Reviewed

Decision/Status: 
Reject

Solicited Reviews:
Review #1
By Alvaro Graves submitted on 18/Oct/2013
Suggestion:
Major Revision
Review Comment:

This paper describes a system (ActiveRaUL) that automatically generates web forms based on an ontology. The goal is to automate the form creation process and provide an HTML form with which non-experts in the semantic Web can create instances based on that ontology.

In general the paper is well written and presents a formal and interesting approach to an already existing idea (creating automated web forms based on an ontology). I would mark this paper as "Accept with minor revisions" if it weren't for the evaluation section, which needs more work. First, although the values presented in the evaluation section favor ActiveRaUL over WebProtege, it is not clear to me whether this advantage is statistically significant or just part of the natural noise of user evaluation. This is even more critical given that the evaluation was based on only 12 subjects. I would strongly recommend running some significance tests (Wilcoxon comes to mind) and presenting the results as boxplots as well as tables.
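
The significance test suggested above can be sketched as follows. This is a minimal, pure-Python illustration of an exact two-sided Wilcoxon signed-rank test for paired measurements. The function name `wilcoxon_signed_rank` and its interface are assumptions made for this example; they do not come from the paper or from any library. Exact enumeration is feasible because a study with 12 participants yields at most 2^12 sign assignments.

```python
from itertools import product


def wilcoxon_signed_rank(xs, ys):
    """Exact two-sided Wilcoxon signed-rank test for paired samples.

    Returns (W, p) where W is the smaller of the two signed rank sums.
    Hypothetical helper written for illustration only.
    """
    # Paired differences; zero differences carry no sign and are dropped.
    diffs = [x - y for x, y in zip(xs, ys) if x != y]
    n = len(diffs)

    # Rank the absolute differences; tied values share their average rank.
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        average_rank = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1

    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    w = min(w_plus, w_minus)

    # Under the null hypothesis every sign assignment is equally likely,
    # so the p-value is the share of assignments at least as extreme as W.
    total = sum(ranks)
    hits = 0
    for signs in product((0, 1), repeat=n):
        plus = sum(r for r, s in zip(ranks, signs) if s)
        if min(plus, total - plus) <= w:
            hits += 1
    return w, hits / 2 ** n
```

Called with two equal-length lists of per-participant scores (one list per tool, in the same participant order), the function returns the test statistic and an exact p-value that could be reported alongside the tables.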

Besides that, there are minor issues that I'm detailing below.

* In the abstract, you said:
"However, the amount of semantic Web data is still many orders of magnitude smaller than the World-Wide-Web, and this limits semantic Web applications."

Please explain this point or remove this sentence, since it is not clear what you mean by it.

* On page 2 you start talking about RDF(s). I assume this means RDF _and_ RDF Schema. Please clarify this.

* On page 2, the paragraph that starts with "In contrast to the relational model where tuples are bound to a relation (table)" is not clear; you need to explain it a little more.

* On page 3 you start talking about "concept graph", which is not defined until Section 5.1 (page 8). You should give at least some informal definition (and pointer to the complete definition) before talking about a CG.

* I'd like to see a (brief) discussion about mixed ontologies. Would that change anything in your approach?

* I think the section "Mapping single range existential restrictions to semantic associations" (pages 10 and 11) should be rewritten. It is a bit confusing and I needed to read it several times.

* In Figure 12 (page 15), the nodes "v" and "Person" were confusing until I saw the following figures. A short explanation would help. The same applies to the subgraph under the "RaULsn" node.

* On page 17, next to the paragraph starting "For multi-length property paths where the length of the property path is greater than 2", I'd like to see a (again, brief) discussion of what happens when the length of the property path is >> 2. First, does that occur in known ontologies? If so, how would ActiveRaUL deal with it? What final HTML representation would result? If that phenomenon occurs, how would it affect the usability/UX of ActiveRaUL (e.g., opening lots of popup windows)?

* Figure 21 appears after Figure 22. That's easily fixable.

* In Section 8.2 (page 23) you said: "In particular, we used the university deployment example from the wiki, because: (1) it includes programming examples in RDF/XML; " Please explain why this is relevant. From what I understood earlier, ActiveRaUL supports multiple representations. If it is not relevant, please remove it.

* In Section 8.5 (page 25) there is a sentence that repeats "both" three times. "In both tools the experienced user group performed better, although the difference was less than 10% for both tools for both systems." I suggest you rewrite that sentence.

* There are too many numbers in Tables 4, 5 and 6 (page 26). Please provide some visual representations (charts) of the data. They will make the results more compelling to the reader. I strongly suggest the use of boxplots where applicable.

* In the conclusions (page 27) you mentioned: "Further, our current implementation only deals with rdfs:labels, the other annotation properties (i.e. owl:versionInfo, rdfs:comment, rdfs:seeAlso, and rdfs:isDefinedBy)" What happens with inherited properties? For example, what if my ontology has skos:prefLabel and skos:altLabel (both subproperties of rdfs:label)? What if I do not have rdfs:label but only skos properties?

* Finally, I STRONGLY suggest you should add a sentence indicating the availability of this system as well as its source (it is Open Source, right?). Something like "ActiveRaUL can be downloaded from http://foo.bar"

Review #2
By Antoine Zimmermann submitted on 17/Jan/2014
Suggestion:
Reject
Review Comment:

Dear authors and editors, please accept my apologies for sending the review so long after the submission. Although my review does not evaluate the submission positively, I hope it provides substantial arguments and constructive criticism to improve it for a future resubmission.

Summary:
=======
The paper presents a method and system to automatically generate Web forms for creating RDF data conforming to an input ontology. The system extracts classes and properties from a given OWL ontology to create input fields and arrange them according to the relations that exist between the ontology terms.

Overall evaluation:
==================
My overall evaluation is that this work is not mature enough and has significant flaws that are very important to address before resubmitting it in proper shape. Thus, I suggest a reject and resubmit.

General comment:
===============
On the good side, I'd say that the general approach depicted in figure 6 is interesting: (1) from the input ontology, generate a graph that abstracts away from the actual meaning of the ontology by focusing on the relationships that are relevant to user interfaces; (2) use this graph to generate a description of the user interface that is agnostic to the actual view that will be generated; (3) transform the description into a concrete Web interface.
Also a good point, the approach was implemented and tested and shows that the system (if not the approach) can at least simplify some tasks.

However, there are issues of major concern. The core method is presented via examples and example patterns, not in generality, and it is very difficult to understand what the method would output if used on an arbitrary ontology. There are example patterns that seem to subsume other example patterns. There are definitions at the beginning that are unclear, some of which are used all over the paper (especially, semantic association). All in all, it would hardly be possible to reimplement a system that does the same thing as the system tested in Section 8.

The evaluation has weaknesses, but I understand the difficulty of evaluating such a system. However, I regret that there is no criticism of the evaluation results. The analysis only compares numbers and quickly comes to the conclusion that the proposed system is better. An honest account of the limitations of the evaluation would not have been a negative point for the paper; on the contrary.

Detailed comments:
=================
Introduction:
"The tools of choice ... such as Protégé" -> instances are more often generated programmatically using an RDF tool chain. How can Protégé generate the billions of instances that are currently available on the open Web?

Section 2:
"In Callimachus ... . While it facilitates ... and the resulting Web applications do not fully comply with OWL semantics." -> What does it mean that an application does not comply with OWL semantics? We are not talking about reasoners here?
"As a consequence instances generated through these forms may be logically inconsistent to the ontology they are modelled after." -> maybe, but I don't see how ActiveRaUL solves this problem.

Section 3:
Fig.1 does not explain the semantics of the arrow types (dashed or solid). The ActiveRaUL approach is supposed to be used on OWL ontologies. How does this graph represent an OWL ontology?

Section 4.1:
Item II. "It has to be noted that we use reification only as a data binding mechanism which is particularly needed for maintaining the semantics in the (X)HTML+RDFa rendering" -> What does "maintaining the semantics" mean here?
On page 7: "The resulting RDF/XML is send to" -> sent to

Section 5:
This section needs extensive rewriting.

Sec.5.1:
"Let R=I union B union L (IRIs, Blank nodes, and Literals) be the set of RDF resources." -> In RDF, resources do not only include IRIs, blank nodes and literals. They include documents, people, ideas, properties, classes, numbers, feelings, everything. IRIs, blank nodes and literals are collectively called "RDF terms" in standard RDF.

Def.1 is not defining an RDF graph. The first sentence is not really a sentence (the subject needs a verb and perhaps a complement). Are we supposed to understand that the directed graph G that is presented at the beginning is in fact an RDF graph? In any case, in standard RDF, an RDF graph is not a directed graph. An RDF graph is a set of triples (s, p, o) where s is in I union B, p is in I, and o is in I union B union L. Even if such sets of triples can be represented graphically, the graph form of an RDF graph is not a mere directed graph. It should be a directed labelled multigraph where:
- nodes are IRIs, blank nodes or literals,
- arcs are labelled with IRIs,
- 2 distinct arcs from the same source node to the same target node cannot have the same label, and
- if a node is the source of an arc, it cannot be a literal.
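
The constraints listed above can be checked mechanically. The following is a minimal sketch, assuming three hypothetical term classes (`IRI`, `BlankNode`, `Literal`) invented for this illustration; it does not reflect the paper's formalisation or any RDF library's API. Storing the triples in a Python `set` also rules out two identical arcs between the same pair of nodes.

```python
from dataclasses import dataclass


# Hypothetical term types standing in for the sets I, B and L.
@dataclass(frozen=True)
class IRI:
    value: str


@dataclass(frozen=True)
class BlankNode:
    label: str


@dataclass(frozen=True)
class Literal:
    lexical: str


def is_rdf_triple(triple):
    """True if (s, p, o) has s in I u B, p in I, and o in I u B u L."""
    s, p, o = triple
    return (
        isinstance(s, (IRI, BlankNode))          # a literal cannot be a subject
        and isinstance(p, IRI)                   # arcs are labelled with IRIs
        and isinstance(o, (IRI, BlankNode, Literal))
    )


def is_rdf_graph(triples):
    """True if every element of the collection is a well-formed RDF triple."""
    return all(is_rdf_triple(t) for t in triples)
```

For instance, a triple whose subject is a `Literal` is rejected, while the same literal in object position is accepted.
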
"E included in R" -> if (U,E) is a directed graph, then E is included in U x U, by definition of a directed graph.
"An edge p in G is a 3-tuple (ui,p,uj) where p in E and ui, uj in U." -> How can an edge, being an element of R, be at the same time a triple that contains itself in its second element?

Def.2: this is the most critical definition because it is used all over the place. I could not figure out what a semantic association exactly is. The "definition" starts by defining what a "directed path" is. Are the triples in a directed path meant to be RDF triples? Then it says what the length of a semantic association is. But what is a semantic association? Can the sequence of a directed path be empty?
What is "the maximum number of consecutive connected edges involved in the sequence"? Which sequence are you talking about here? How are the variables nu, p_x, u_y, i, j, k, l, m quantified?
At some point, I thought that a semantic association is a pair of nodes in a directed graph such that there is a path from one node to the other. So the length of a semantic association would be the maximum length of all the paths from the source to the target. But this is not well defined because the longest path can be infinite, due to cycles. So maybe it should be the shortest path? But even with these definitions, the following definitions and sections confuse even more the notion of semantic association.
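
To illustrate why the shortest path is well defined where the longest is not, here is a minimal sketch using breadth-first search over a set of triples. The function name and the `worksFor` edge are assumptions made for this example; the other two edges reuse names quoted later in this review.

```python
from collections import deque


def shortest_path_length(triples, source, target):
    """Number of edges on a shortest directed path, or None if unreachable."""
    successors = {}
    for s, _p, o in triples:
        successors.setdefault(s, []).append(o)

    seen = {source}                 # visited set keeps cycles from looping forever
    queue = deque([(source, 0)])
    while queue:
        node, dist = queue.popleft()
        if node == target:
            return dist
        for nxt in successors.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None


# Example graph with a cycle between Organization and Person.
example = [
    ("Organization", "hasEmployee", "Person"),
    ("Person", "worksFor", "Organization"),   # hypothetical edge closing the cycle
    ("Organization", "locatedIn", "City"),
]
```

Because of the cycle, walks from Person to City can be arbitrarily long, yet the search still terminates and returns a finite length. Note that this sketch treats the empty path from a node to itself as having length 0, which is one possible answer to the question of whether a path may be empty.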

After Def.2: "Semantic associations exhibit multiple property paths" -> apart from "semantic associations", which is not defined, there is now a new notion of "property paths". How is it supposed to be understood in this context? Is it the same as the SPARQL notion of property path? Fig.5 does not help me figure out what it is.

Single-length property path: this seems to confirm my view of "semantic association" as set out above, but later it will be contradicted, as I will show.
What are the equalities after this item? Are they constraints of the definition? Properties that follow from the definition? How can ui be equal to I? Isn't I supposed to be the set of all IRIs? I imagine it should be "ui in I".

Similarly, in "datatype property paths", "ui = L" is certainly an error, meaning "ui in L".
The "definition" of datatype property paths should say "A datatype property path is ...". If a datatype property path is a single length property path, then it must have the characteristics of a single length property path, among which there is "ui in I". But then, it is said that "ui = L".

Multi-length property path: "nu is linked to another concept ui through more than one property" -> this is ambiguous, as it could be read as "there are two triples (nu,p1,ui) and (nu,p2,ui)", while in fact it should read "there are two triples (nu,p1,x) and (x,p2,ui) with x a node". It should say "through a sequence of more than one property".
Again ui cannot be equal to the set of IRIs and literals.
Note that, for any multi-length property path, there is either a single-length property path or a datatype property path.

Branched property path: again, uj in I union L ...

Multi-range property path: assuming that the graphs in consideration are indeed RDF graphs, then I'm not sure what source(p) and target(p) mean. Consider the triples: (s,p,o1), (s,p,o2), (x,p,z), (y,p,z). I would guess that source(p) = {s,x,y} and target(p) = {o1,o2,z}. In this case, there seems to be a pattern similar to the one in Fig.5e, but this pattern does not fulfil the constraint of multi-range property paths.
The notation u1 \neq u2 \neq u3 formally does not prevent u1 from being equal to u3.
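
Both remarks can be made concrete with a short sketch. The definitions of `source` and `target` below encode the reading guessed at above and are assumptions, not the paper's definitions. The second part contrasts a chained inequality with a true pairwise-distinctness check.

```python
from itertools import combinations


def source(triples, prop):
    """All subjects of triples whose predicate is prop (assumed reading)."""
    return {s for s, p, _o in triples if p == prop}


def target(triples, prop):
    """All objects of triples whose predicate is prop (assumed reading)."""
    return {o for _s, p, o in triples if p == prop}


def chained_neq(u1, u2, u3):
    """Literal reading of 'u1 != u2 != u3': only adjacent pairs are compared."""
    return u1 != u2 != u3


def pairwise_distinct(*items):
    """True only if no two of the given items are equal."""
    return all(a != b for a, b in combinations(items, 2))


# The four triples from the comment above.
triples = [("s", "p", "o1"), ("s", "p", "o2"), ("x", "p", "z"), ("y", "p", "z")]
```

With these triples, `source(triples, "p")` is `{"s", "x", "y"}` and `target(triples, "p")` is `{"o1", "o2", "z"}`. The chained form accepts `("a", "b", "a")` because the first and third items are never compared, whereas `pairwise_distinct` rejects it.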

Cyclic property path: again, uj in I union B

Def.3 seems to imply that semantic associations are a kind of graphs having nodes. It seems that a semantic association between nu and ui wrt a graph G is the subgraph semassoc(nu,ui,G) of G that consists of all the paths from nu to ui. In this case, a semantic association can be empty. The problem is that the same notation is used when talking about a semantic association (whatever it is) and paths. The notion of "length" is sometimes used to talk about the length of a path, sometimes the length of a semantic association.
The notion U(pi) is not defined.

"we can infer" -> what does inference have to do with this?

Def.4: I assume that "there exists a semantic association" means that there exists a path from nu to ui. The semantic associations are constrained to be of length l. But what is l?

Section 6:
In Alg.1, line 5, there is an index i' used, but it is nowhere defined.

"Since a property can have multiple domains and/or multiple ranges, it is difficult to determine for a specific domain and property which range concepts are relevant." -> what is this supposed to mean? Relevant in terms of what?

"our current implementation creates a semantic association for ui through property pi for each range concept" -> here, it seems that "semantic association" takes a different sense from how it was used before.

In Fig.8, "single range existential restrictions" refers to restrictions that can be existential or universal (allValuesFrom). The name is misleading.

"a property pi+1 can have values only from one concept ui+1" -> why is it so?

"pi+1 has a single target concept" -> what is a target concept?

The standard datatype URI for character strings is xsd:string, not xsd:String.

In Fig.9, there is no reason why bnode b2 should be typed with rdfs:Container. Besides, the property between b1 and b2 is possibly owl:cardinality, in which case it does not make sense for b2 to be the subject of owl:intersectionOf.

"is a blank node of type owl:Collection" -> there is no such type defined in the owl: namespace (nor is there an rdf:Collection or rdfs:Collection type in the rdf: and rdfs: namespaces).

rdf:list should be rdf:List

In Fig.10, b2 is an rdfs:Container with multiple rdf:_i properties. This does not correspond to the standard construction of owl:intersectionOf, which must use rdf:List, rdf:first, rdf:rest.
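
For reference, the list-based construction can be sketched as plain triples. This is an illustrative sketch only: the helper names `rdf_list` and `intersection_of`, the blank-node labels, and the `ex:` class names are assumptions, and the optional `rdf:type rdf:List` triples are omitted.

```python
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
OWL = "http://www.w3.org/2002/07/owl#"


def rdf_list(items, prefix="_:l"):
    """Encode items as an RDF collection; returns (head node, triples)."""
    if not items:
        return RDF + "nil", []          # the empty list is rdf:nil itself
    nodes = [f"{prefix}{i}" for i in range(len(items))]
    triples = []
    for i, item in enumerate(items):
        # Each cell points to its member and to the next cell (or rdf:nil).
        rest = nodes[i + 1] if i + 1 < len(items) else RDF + "nil"
        triples.append((nodes[i], RDF + "first", item))
        triples.append((nodes[i], RDF + "rest", rest))
    return nodes[0], triples


def intersection_of(class_node, members):
    """Triples stating that class_node is the intersection of members."""
    head, triples = rdf_list(members)
    return [(class_node, OWL + "intersectionOf", head)] + triples


for triple in intersection_of("_:b2", ["ex:A", "ex:B"]):
    print(triple)
```

The object of owl:intersectionOf is the head of a linked list terminated by rdf:nil, rather than a container with rdf:_1, rdf:_2, ... members.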

Section 7.1:
"has a defined mapping in ActiveRaUL" -> to what mapping does this refer?

"by removing the edge that introduces the cycle" -> there are necessarily multiple edges that compose a cycle. Which one is removed? The next sentence suggests that all edges in the cycle are removed.

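The question of which edge is removed matters because the answer can depend on where the traversal starts. The sketch below shows one conceivable rule, removing the back edges found by a depth-first traversal; it is an assumption made for illustration and not necessarily what ActiveRaUL does. The `worksFor` edge is hypothetical.

```python
def back_edges(edges, root):
    """Edges that close a cycle during a depth-first traversal from root."""
    successors = {}
    for s, p, o in edges:
        successors.setdefault(s, []).append((p, o))

    on_stack, done, found = set(), set(), []

    def visit(node):
        on_stack.add(node)
        for p, nxt in successors.get(node, []):
            if nxt in on_stack:
                # nxt is an ancestor of node, so this edge closes a cycle.
                found.append((node, p, nxt))
            elif nxt not in done:
                visit(nxt)
        on_stack.discard(node)
        done.add(node)

    visit(root)
    return found


edges = [
    ("Organization", "hasEmployee", "Person"),
    ("Person", "worksFor", "Organization"),   # hypothetical edge closing the cycle
    ("Organization", "locatedIn", "City"),
]
```

Starting from Organization, the edge flagged for removal is (Person, worksFor, Organization); starting from Person, it is (Organization, hasEmployee, Person). A precise description of the method would need to state which rule applies.
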
"Since the relationship between nodes connected by reverse edges can be inferred" -> it's not clear what "inferred" means here. It is not inference in the sense of RDF and OWL formal semantics.

"However, cycles caused by owl:TransitiveProperties are not inferable ..." -> inferable? and what does it mean that owl:TransitiveProperty causes cycles? Here, we are talking about cycles in the structure of an RDF graph, not cycles in the relations that exist in the interpretation domain.

"SA-5 involves a cyclic property path because of the relationship (Organization, locatedIn, City)" -> it should rather be (Organization, hasEmployee, Person).

Section 7.2:
The whole section is very unclear. It shows how certain patterns are mapped to a RaUL graph, but it does not properly explain how an arbitrary graph is mapped to RaUL. Moreover, there are patterns that subsume others, for instance, in Fig.14 and Fig.15.

How are the cases in Fig.18 and Fig.19 distinguished? What is the semantics of the diamond-shaped nodes in the figure, compared to the ovals? It seemed to me that nodes were normally URIs, bnodes or literals. Literals are in rectangles.

The procedures RaULxxx are not explained in terms of their input and output. Explaining how it works mostly with examples is not enough. The textual explanations are not clear enough either.

In Fig.22, "publicher" -> publisher

Section 8:
The evaluation has several issues. First, the number of participants, twelve, is not sufficient to draw statistically significant conclusions. Second, it is not known what exact tasks these people had to do. Third, it is not clear how the tests validate the approach, because they merely compare the efficiency of interfaces. Maybe, give WebProtégé a better interface while keeping the same underlying approach, and you get a tool that beats ActiveRaUL.

I personally tested the tool available at http://www.activeraul.org/demo/arbitraryOntology.html (now unavailable) and found it was inappropriate for most data editing tasks. I do not know if this online tool is exactly the implementation used in the test cases, but having no indication to the contrary, I can only suspect that the limitations I've seen are present in the tool. For example, I tried to generate forms for various existing ontologies, and most of them did not generate anything at all. When they did generate a form, it was so simple that it couldn't do anything practically useful. Even with the SSN ontology, with which a nicer web form was generated, the possibilities were quite limited because the RaUL approach is very constraining.

In Table 3, the owl: namespace does not define owl:NonFunctionalProperty. owl:InverseOf should be owl:inverseOf.

Section 8.5:
"Accuracy/Correctness" should be the title of Section 8.5.1 and following subsections should be renumbered accordingly.