Injecting semantic annotations into (geospatial) Web service descriptions
Review of the final round of revision by Jacek Kopecky. Reviews of earlier rounds are further below.
The paper says it talks about the sapience API but it actually doesn't (as far as I could see); that promise should be removed.
The paper could use some cleaning: after the rounds of review and incremental changes, many paragraphs and sections now jump between unrelated points and contain unwanted forward references, so it is hard to follow the points made by the paper.
Some terminology could still be fixed: perhaps say "path expressions" instead of "XPath expressions" (the first mention would note the similarity to XPath); and perhaps clarify earlier that you expect the client (in your terminology, apparently meaning a system that uses some services with its own version of their semantic annotations) to provide the semantic annotations of the services it uses in its application(s), while the proxy just facades the service and maintains the annotations over service description changes.
This is a revised manuscript following an accept with minor revisions. The reviews immediately below are for the previous round. Reviews of earlier rounds are further below.
Review 1 by Tudor Groza:
Accept as is.
Review 2 by Jacek Kopecky:
I'm still not happy with how the paper presents its contributions. Unless I'm missing something, the authors claim (in the paper and in the review rebuttal comments) two points of value:
1) the proxy maintains the metadata through (some) changes of the underlying service descriptions,
2) the proxy redirects to the service so the client only needs to know one location (the proxy).
On the first point, the paper still does not contain a useful discussion of the types of changes that the system will handle. The only sentence that could remotely be seen as useful is "Changing the metadata does not affect the injection procedure, as long as no elements are removed which are used within the extracted XPath expressions." There should be examples of what changes will work and what not, with focus on defining the line between the two. Such guidance in the paper will help service providers make their service description changes friendlier to the proxy, and will help clients manage their expectations. Then potential users of the proxy will be able to evaluate whether the proxy would have benefit with their pattern of usage and description changes.
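For illustration, the line could be pinned down with examples as small as the following sketch (all names and files here are made up, and it assumes the stored locators behave like ordinary XPath expressions, which the paper only approximates):

    import java.io.File;
    import javax.xml.parsers.DocumentBuilder;
    import javax.xml.parsers.DocumentBuilderFactory;
    import javax.xml.xpath.XPath;
    import javax.xml.xpath.XPathConstants;
    import javax.xml.xpath.XPathExpression;
    import javax.xml.xpath.XPathFactory;
    import org.w3c.dom.Document;

    public class LocatorCheck {

        // Made-up locator in the spirit of Figure 3 (without the "~" suffix).
        static final String LOCATOR =
            "//*[local-name()='simpleType'][@name='TemperatureType']";

        public static void main(String[] args) throws Exception {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true);
            DocumentBuilder db = dbf.newDocumentBuilder();
            // Stand-ins for two revisions of the same service description.
            Document oldDoc = db.parse(new File("oldWsdl.xml"));
            Document newDoc = db.parse(new File("newWsdl.xml"));

            XPath xp = XPathFactory.newInstance().newXPath();
            XPathExpression expr = xp.compile(LOCATOR);

            boolean inOld = expr.evaluate(oldDoc, XPathConstants.NODE) != null;
            boolean inNew = expr.evaluate(newDoc, XPathConstants.NODE) != null;

            if (inOld && !inNew) {
                // The anchor element was removed or renamed: the kind of
                // change the paper says the injection cannot survive.
                System.out.println("locator broken by the change");
            } else {
                // Reformatting, added siblings, reordered attributes, prefix
                // renaming: the locator still resolves, injection proceeds.
                System.out.println("locator still resolves");
            }
        }
    }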
The second point is not very well demonstrated in the paper. On the one hand, the term "proxy" would indicate that the system will facade the service, but the authors say that it "redirects", meaning the client must interpret HTTP redirects and send the request to the service itself. In this behavior, the system is less of a proxy and more of a registry, because it really only answers requests for service descriptions. On the other hand, even if the above was not an issue, the paper doesn't describe any limitations to the redirection of what would be invocation requests to the service. A few problems come to mind immediately: i) HTTP only allows automatic redirection for GET and HEAD requests (for other requests, the user should be involved), so the "proxying" behavior would only work for the method GET. ii) if the invocation request uses GET and consists of a single "sid" parameter, the proxy would interpret it as a request for metadata, not an invocation request, and would not do any redirection; so the proxied service cannot use a parameter called "sid". iii) why would the client even send invocation requests to the proxy? Assuming the client gets the service description from the proxy, the endpoint information in the service description will be unchanged and still point to the original service's endpoint, right?
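To make problem ii) concrete, here is my reconstruction of the dispatch rule the paper implies (only the parameter name "sid" is from the paper; the code itself is made up):

    import java.util.Map;

    public class ProxyDispatch {

        enum Action { SERVE_ANNOTATED_METADATA, REDIRECT_TO_SERVICE }

        // Reconstructed rule: a lone "sid" parameter means "return the
        // annotated metadata"; anything else is redirected to the service.
        static Action dispatch(Map<String, String> queryParams) {
            if (queryParams.size() == 1 && queryParams.containsKey("sid")) {
                return Action.SERVE_ANNOTATED_METADATA;
            }
            return Action.REDIRECT_TO_SERVICE;
        }

        public static void main(String[] args) {
            // A legitimate GET invocation of the proxied service that happens
            // to use a parameter named "sid" is indistinguishable from a
            // metadata request, so it is never redirected (point ii above).
            System.out.println(dispatch(Map.of("sid", "921e1da1")));
            System.out.println(dispatch(Map.of("request", "GetCapabilities")));
        }
    }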
The authors put a lot of associated functionality out of scope of the paper, especially the discovery of the available annotations for known services. Without such functionality, however, the system looks incomplete because the only remaining difference to a static document repository with annotated service description copies would be the point 1) above, the adaptation of annotations to changed descriptions, which is not described in the paper to any extent that could let me consider this as a significant contribution worthy of a journal paper publication.
I'm not saying that you don't have journal paper material in your work, just that the manuscript doesn't show it.
Btw, sorry, at the first resubmission I didn't know about the point-by-point rebuttal comments on the journal server (different from the review-submission server), hence the request in my second review.
Editorial comments:
reference 30 is mangled (surely, the title is not "49"); reference 31 needs to be capitalized properly.
The manuscript is a revised submission. Below are the reviews for the revision, followed by the reviews for the original submission.
Review 1 by Marinos Kavouras:
Although the paper has improved a lot, the semantic aspect per se is still not clearly presented. Nevertheless, since the character of the paper is fully technical, being a report on a system, it can be considered acceptable taking also into account the general readership of the journal.
Review 2 by Tudor Groza:
The paper has indeed been improved by the authors according to the received reviews. Most of the critical comments have been addressed. A couple of small remaining remarks:
* the part about the manual annotation of Web services is still not clear enough, i.e., who does it and when (or why). The reason I keep insisting on it is because, generally, the process of manual annotation always brings along the issue of incentive.
* the small experimental evaluation currently contained in the paper brings a certain plus; however, it could profit from a more detailed description, i.e., what do the axes represent, what were the factors considered during the experiments, etc.
Review 3 by Jacek Kopecky:
The paper presents a proxy-based storage and retrieval system for semantic annotations in Web service descriptions. It seems to be a workable system, and the current revision of the article is much improved - especially it presents more useful concrete information than the previous one I reviewed - but the article overall still isn't very convincing.
The system description says that "the client application has to specify which annotations are to be injected"; and "depending on the client request, different annotations may be added to the metadata", which seems to promise something like the following features: 1) a client can specify multiple annotations to be combined, 2) a client can specify the ontologies that it is compatible with, and the proxy will inject the appropriate annotations, 3) the client may search for available annotations based on the original service metadata URI. These would be interesting, but the system does not seem to support any of them. As it is, the system is very simple and the only actual interesting contribution is that it can maintain the semantic annotations even through some (weakly specified, but at least discussed a bit in this revision) changes to the underlying metadata. That isn't much.
And quick testing based on Figure 2 and the returned data indicates that the system doesn't work for WSDL services. (see below) The authors should at least test the uses they want to publish.
Finally, the next update should be accompanied with a brief but point-by-point statement by the authors about how they handled the review comments.
Detailed comments follow:
The introduction says how "clients are not aware of [the proxy's] existence: the proxy takes the identity of the proxied service", which could be a nice feature except it's not explained anywhere; instead the system clearly changes the URIs of the service descriptions (or "the web service location" - whatever exactly that means - as said in Section 3).
In Section 4.1, change "Url (7) in Figure 2" to say Url (5).
Clarify the sentence "The nature of the reference (attribute, sibling, or child) is concatenated, and used to identify the location in the original metadata document." - the resulting XPath expressions in Figure 3 are somewhat ambiguous; in effect, the system changes here the meaning of XPath, so it should say how. For example, the normal interpretation of XPath (2) is "all attributes of an xsd:simpleType element that has a sawsdl:modelReference attribute". And how does the system interpret the "~25437290" part at the end of (4), which doesn't fit the syntax of XPath? (The article says a bit about this adding uniqueness to the location, but more should be said about its handling).
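For example, one plausible guess at the handling - and the paper should state whether this guess is right - is that the locator is split at the last '~' before evaluation, roughly like this (made-up code):

    public class SuffixedLocator {

        final String path;   // the XPath-like part of the locator
        final Integer hash;  // the trailing uniqueness token, if present

        SuffixedLocator(String raw) {
            // Split on the last '~': the suffix would be the uniqueness
            // hash the article mentions, the rest a simplified path.
            int tilde = raw.lastIndexOf('~');
            if (tilde >= 0) {
                path = raw.substring(0, tilde);
                hash = Integer.valueOf(raw.substring(tilde + 1));
            } else {
                path = raw;
                hash = null;
            }
        }

        public static void main(String[] args) {
            // Made-up locator shaped after Figure 3's example (4).
            SuffixedLocator l = new SuffixedLocator(
                "//wsdl:operation[@name='order']~25437290");
            System.out.println(l.path + " | hash=" + l.hash);
        }
    }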
Why is there a redirect to the service if request parameters do not match? What kind of scenario is this redirect designed to support? Is this intended for proxying actual interaction with Web services? If so, what are the limitations here? (I can think of two immediately - it only handles invocation through GET, and it only allows parameters other than "sid".)
In Figure 2, URI (2) doesn't work; in fact, none of the WSDL-based descriptions seem to work due to a bad (OGC-based?) XPath in the annotations. This goes directly against the article's statement that the system has been thoroughly tested.
In Figure 3, XPath (4) mixes two different apostrophe characters and ends with an extra " - is it a real example with editing errors, or is it a made-up example?
Figure 5 needs a legend - for the vertical axis, the text says "average response times" - is that in seconds? for the horizontal axis, short names or IDs for the services would be useful (even service 1 ... service 4).
The measurements seem to indicate a speed-up for retrieving the description of a SAWSDL-Testsuite service through SAPR in contrast to direct retrieval; this is explained as "probably due to a more efficient implementation" - implementation of what? How can a proxy that has to access the original source of data perform quicker than a direct access to the original source of data? The only reason I could think of is a better network routing when it goes through the proxy server than when it goes "directly".
As before, the paper says the proxy is available as "free software" so it should include a link to download.
On Page 9, the use of the acronym SDI (end of first column) precedes its expansion (beginning of second column).
In Conclusions, "the client does have to manage the two separate sources" - I suspect you wanted to say "does not".
In references 11, 30 and 31, expand on the venue of the publications - 11 doesn't have anything, 30 and 31 only have an address.
Below are the reviews for the original submission.
Review 1 by Marinos Kavouras
The paper presents an approach, implemented as a web service, for semantically annotating descriptions of geodata. I understand that the focus is at the explication level, attempting to account for this semantic enrichment while maintaining the standards followed. Separating such annotations from original metadata is a right way to go. I was not able to test the implemented service described in section 4; therefore I am not in a position to verify how well it works. It seems however that it is not something extremely difficult to worry about. The literature is well presented, and so is the paper. It is a paper focusing on technical details and tools, and not on new ways of achieving a high level of semantic communication. As such, it suits the purpose of the journal, several will find interest in it, and I recommend acceptance. Also, the language of the paper is acceptable. There are a few typos here and there (I noticed at least 2-4) which a careful reading will pin down.
Review 2 by Tudor Groza
The paper reports on the SAPR (Semantic Annotations Proxy) system that enables dynamic injection of semantic annotations into Web Services (WS) descriptions. The system was deployed and tested in the context of Geospatial Web Service descriptions.
Being a report on a system, the paper should have a clear emphasis on four aspects: detailed technical description, maturity, importance and impact of the system. On the positive side, one can quickly grasp the importance and great _prospective_ impact of the system (due to the very good application scenarios), that basically bridges the gap between legacy Web Services and today's semantically-aware clients without touching the original WS (similar to the legacy databases case). Also, the system is online, it is fully functional and free to use. On the negative side, the technical description is underspecified, and the system raises serious questions at the maturity and _actual_ impact categories, as the evaluation section simply lists that different forms of evaluation have been performed, but not the actual results, while looking at the services currently registered in SAPR, one can find a total of 7 entries, of which two are duplicated and one appears three times (and it is the toy example presented in the paper). This clearly shows that the system is, in fact, not in use, and it probably represents, at least for now, a proof-of-concept.
Detailed comments:
* Considering, again, the type of paper (i.e., system report), the first half of the introduction is probably dispensable. There is really no need to go into so much detail w.r.t. the research context, when the first and second paragraphs on page two (second column) explain nicely the problem and the motivation behind the system. You could reallocate the space to a more detailed technical description.
* The application scenarios are in general excellent. One possible flaw may exist in the argumentation w.r.t. the usefulness of the data, as exemplified in the last paragraph on column two, page 3. "Without semantic annotations potential clients have no means to find out what these attributes represent". However, in the current setting (without the existence of a concept-based WS search application) the clients would actually use the semantic annotations in exactly the same fashion the GIS applications use the data, i.e., direct hard-coded interpretation and use, with no discovery, or perhaps for disambiguation purposes (in reality, this part is also problematic - see comment below).
* The separation of concerns description is also great, but only from an analysis perspective, as it is missing the pragmatic side. The content discusses the potential benefits of having separation of concerns but does not mention clearly what is already there (i.e., implemented and usable). Reading the following section sheds light onto this question, and unfortunately reveals a very static / rigid system with respect to this aspect. The ad-hoc injection procedure discussed in the last paragraph on column two, page 5, would profit from a sequence or workflow diagram, showing different possible cases.
* The actual technical description of SAPR seems to be underspecified (it takes only 10% of the entire paper). Some key missing elements are:
-- a more detailed description of the manual annotation and registration process (here it would be really nice to include an annotation and XPath-based binding example - maybe directly the one present at http://semantic-proxy.appspot.com/api/list/references?sid=921e1da1);
-- a richer sequence / workflow diagram for the presented example, showing the calls that happen in the process (this part is well-enough described in text, but the figure seems a bit simplistic)
-- a discussion on what happens if multiple conflicting annotations are uploaded for the same WS (where conflicting annotations are annotations covering the same elements but pointing to completely different concepts)
-- a richer online example, as the toy example taken from the test suite has a single annotation, and it is on an operation rather than on an attribute (as one would have expected).
* The evaluation, as already mentioned, is superficial. It contains no actual data or results that show that a real evaluation was performed. The least that could have been mentioned, in terms of numbers, is how many clients/services/organizations use the system (or plan to use it). Also, although the authors claim that the overhead imposed by the injection process is negligible, it would still be nice to have some graphics showing how this overhead evolves with the size of the WS description and/or the number and complexity of the existing annotations.
* Finally, a section on future development plans is missing.
* A small remark w.r.t. references, listing 4 - 7 citations in one block is a bit too much, and should be avoided, especially since the authors do not go into details about the individual work presented in those references.
Review 3 by Jacek Kopecky
The paper describes a service for storage and retrieval of semantic annotations for third-party service descriptions.
From the description in the manuscript, it seems like a workable system, however, there are major gaps and problems in the description due to which it is hard to judge the quality, importance and potential impact of the described service. The manuscript must undergo a substantial update before being resubmitted for new review.
The following are the substantial problems, further down I've included comments on lesser details and some wording suggestions.
1) Most importantly, the paper does not describe how the solution is better than an approach where a description is locally copied and annotated with semantics.
Firstly, how does the system process uploaded annotations? It seems that it only allows upload of a complete annotated file, together with a URI for the original, and it appears that the system does some kind of difference comparison of the two files, and stores only the annotations. How is the difference computed? Section 4 seems to mention a line-by-line approach which may easily break on some harmless XML changes (such as namespaces, formatting, element reordering, character set recoding etc.), all of which can be done when the domain expert adds annotations.
Secondly, after annotations are uploaded, can the system deal with any changes to the original descriptions? Same harmless XML changes as above can happen, but additionally the description can evolve in backwards-compatible and incompatible ways. The paper should contain a discussion of the robustness of the annotations to changes in the original description.
In conclusions, the manuscript says "dynamic injection relies on a reproducible way to identify the annotation's location" - this should not be in conclusions but very well detailed in the system description!
Thirdly, the new annotated description gets a new URI. Is there any functionality for clients to discover annotations based on original service URIs, other than by searching in the list of all services? Also, since the list of all services is not proper hypertext (because the service ID must be composed into a service retrieval URI), search engine crawlers cannot discover any of the annotated descriptions. It seems that RDF might be a much better match than JSON for the list of all services in the proxy.
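As a sketch of that suggestion (using Apache Jena; the URIs and the choice of dcterms:source are mine, not the paper's), the listing could link each annotated description to its original and become ordinary, crawlable hypertext:

    import org.apache.jena.rdf.model.Model;
    import org.apache.jena.rdf.model.ModelFactory;
    import org.apache.jena.rdf.model.Resource;
    import org.apache.jena.vocabulary.DCTerms;

    public class ServiceListing {
        public static void main(String[] args) {
            Model m = ModelFactory.createDefaultModel();
            m.setNsPrefix("dcterms", DCTerms.getURI());

            // Made-up URIs: one proxy retrieval URI for a registered SID,
            // linked to the original description that it facades.
            Resource annotated = m.createResource(
                "http://semantic-proxy.appspot.com/api/service?sid=921e1da1");
            annotated.addProperty(DCTerms.source, m.createResource(
                "http://example.org/original/service.wsdl"));

            // Every listed description is now a dereferenceable link, so
            // crawlers can discover it without composing URIs themselves.
            m.write(System.out, "TURTLE");
        }
    }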
Fourth, the paper doesn't really show how the system allows a "client application to specify what annotations are to be injected", or how different annotations can be combined if at all (for clients that specify multiple sets). In fact, the URI (2) in Figure 2 returns an error, so it is not clear that a client can retrieve the annotations at all.
All these points make the system, as described in the manuscript which is what I am judging, actually worse than a simple document repository that would host annotated copies of the original service descriptions.
2) Any kind of evaluation of the system is missing - only the last paragraph of the evaluation section states that evaluation was done, but doesn't describe it. In a Tools and Systems paper, the evaluation should at least talk about speed (how much processing time does the proxy processing add?), size (in what form are the annotations stored? Are there indices for quick search?), and the possibility of distributing the system, should the load become too large for a single machine.
3) According to the paper, there are well-established standards for spatial data, but none for its semantics, correct? One of the goals of the system is to separate potentially conflicting annotations from different sources. This needs to be better motivated because conflicts can also be viewed as very valuable: if two domain experts use the same semantic model to annotate the same service in conflicting ways, either the service or the model is ambiguous; in either case, identifying and resolving the ambiguity can lead to improvements in the quality of the descriptions. In general, on the semantic web, different sets of annotations should be able to coexist and be ignored by clients if unknown.
On a related note, in the system, "information about the uncertainty can simply injected during runtime, and only compatible clients can request this data if needed" - but in semantic systems, the compatibility can often be established when the client sees the data (and possibly discovers ontology mappings that help the client understand the data). In any case, how does the client indicate to your system the types of semantics with which it is compatible?
4) One of the major assumptions of the paper is that "delegation of the annotation from the data provider to the domain experts" is desirable. However, what are the incentives for domain experts to annotate services? If the domain expert is tied to the service provider, they don't need a proxy solution for providing semantic annotations; if the domain expert is tied to the client, it doesn't need a proxy either; if the domain expert is a third party, why would they annotate the service? I've simplified the situation here; it should be discussed in the manuscript, to demonstrate that the system is actually valuable and desired.
Details and wording suggestions:
--------------------------------
- The first citation of [1] is unnecessary, it can probably simply be dropped.
- End of 2nd paragraph in the introduction: logics ensures better precision, not recall (cf beginning of section 3).
- The introduction links feature type 42001 to class Street, a "part of a globally shared ontology accessible on the Web" - where is this ontology? How well-established and standardized is it?
- In "semantic annotation techniques exist for ... media formats (e.g. photos)" I'd suggest to add a mention of EXIF or some such, otherwise the parentheses in that sentence do not match (they contain different kinds of examples).
- SA-WSDL is SAWSDL (no dash), reference [15] is inappropriate for SAWSDL, it should be [22].
- The paper should talk more about how "this focus [of SAWSDL] on WSDL does unfortunately impair its applicability for some scenarios" - where is its applicability impaired and how?
- In the introduction, reference [6] does not seem very appropriate for saying your system is a conceptually simple proxy-based solution.
- I'd suggest that the paper should discuss the relation of the presented Web service to HTTP proxies, because the authors call the service a proxy-based solution. This would help readers who are led by the title to expect an HTTP proxy.
- In section 2.1: "This distinction has been proven useful to capture the sometimes complex functional dependencies in between attributes of data models" - how and where has it been proven? How is this relevant to this paper? The proxy system doesn't seem to "distinguish between local and global semantic annotations" in any way!
- Do you have any real use cases for section 2.1? Any globally shared ontologies? (not the "(made-up) domain concept GeologicEra")
- In figure 2, URI 2 is broken and the results of URI 4 are quite opaque; the paper should discuss the formats.
- Around the data quality annotations: annotations for describing data quality are often a property of the actual data - different data from the same service may have different quality - e.g. mapping may be very precise in some areas but rather sketchy elsewhere. The paper could list examples of quality annotations that are understood to be global to the whole service.
- In "each measurement inhibits some sort of error" - change "inhibits" to "is done with" or something; the word inhibit means something else.
- Section 3.2: SAWSDL is not only about "instance identification", especially for WSDL elements. It is intentionally unconstrained in the semantics of the link annotations.
- Figure 3 needs a legend.
- In section 4, what are "the original parameters" with which a service identifier can be coupled? How can it be so coupled?
- Section 5 says the proxy is free software - where is the source available for download?
- By page 8, the acronym SDI (used only once in the introduction) is forgotten.
- Is there no related work on injecting of annotations? This would be the kind of related work that this paper should discuss.
- In the Conclusion section: the long-term vision of the SemWeb does *not* assume a complete shift towards semantic-enabled web resources - it expects a coexistence of many kinds of resources on the web, some of which would be annotated semantically, and some with direct semantic representations. For example, representing image bitmap data semantically would be virtually useless; so semantic annotations are here to stay, in one form or another.
- Conclusions should not repeat references; in fact, it often should not contain any references at all.
- References: take care to have the proper case (e.g. [3], not owl-dl reasoner but OWL-DL Reasoner); [16] doesn't have enough information to find and identify the document being cited; there are encoding problems, e.g. [25] has uppercase long umlaut U where it doesn't belong; there are issues with formatting: e.g. [27] says "International Conference on Semantic Computing 0" and [33] says "International Symposium on 0". [19] has duplicated authors.
Comments
Submission in response to http://www.semantic-web-journal.net/content/special-call-semantic-web-to...
Response to Review 1
Dear Marinos,
thank you for your comments. Please find our response below.
Review:
The paper presents an approach, implemented as a web service, for semantically annotating descriptions of geodata. I understand that the focus is at the explication level, attempting to account for this semantic enrichment while maintaining the standards followed. Separating such annotations from original metadata is a right way to go. I was not able to test the implemented service described in section 4; therefore I am not in a position to verify how well it works. It seems however that it is not something extremely difficult to worry about. The literature is well presented, and so is the paper. It is a paper focusing on technical details and tools, and not on new ways of achieving a high level of semantic communication. As such, it suits the purpose of the journal, several will find interest in it, and I recommend acceptance. Also, the language of the paper is acceptable. There are a few typos here and there (I noticed at least 2-4) which a careful reading will pin down.
Response:
We have updated the implementation, and added more services which can be tested. The implementation is now also more robust, and supports more service types. The examples sometimes do not work on the server. We are using the files from the SAWSDL-Testsuite, which are fetched (for each proxy request) from the W3C server. This server sometimes does not respond in time (the limits imposed by the Google App Engine are quite strict). We added a discussion about this in the Evaluation section.
Regarding the typos: we did proof-read the paper, and found (and removed) some mistakes.
Response to second review by Marinos Kavouras
Although the paper has improved a lot, the semantic aspect per se is still not clearly presented. Nevertheless, since the character of the paper is fully technical, being a report on a system, it can be considered acceptable taking also into account the general readership of the journal.
Response: The purpose of the presented tool is the semantic enablement of existing Web services by injecting existing semantic annotations into standards-based service descriptions. The tool does not
- create the semantic annotations. This requires tools as mentioned in the beginning of the Evaluation section.
- reason on the semantic annotations. This is the responsibility of the clients (such as semantic-enabled search engines, workflow composition tools, etc.)
In this sense the question about the semantic aspect is really hard to answer. The tool enables semantics for other applications, but is itself not semantically enabled. We added a sentence to the conclusion making this explicit.
Response to Review 2
Dear Tudor,
Thanks for this constructive review; it helped a lot to improve the paper (and in the end also the implementation). We have extended the technical description with a new section which gives more details on how to identify and extract annotations. We also replaced the figure with a more detailed activity diagram. But we actually spent most of the time fixing bugs and updating the implementation. We now have more services registered (and more are to come, since we are using SAPR in our current research project). Still, software systems mature by being used. We hope to get more feedback through this submission.
Please find our responses to the individual comments inline:
* Considering, again, the type of paper (i.e., system report), the first half of the introduction is probably dispensable. [..]
Response:
We have shortened the introduction and removed the example (since it is explained in more detail in the scenarios section)
* The application scenarios are in general excellent. One possible flaw may exist in the argumentation w.r.t. the usefulness of the data, as exemplified in the last paragraph on column two, page 3. "Without semantic annotations potential clients have no means to find out what these attributes represent". However, in the current setting (without the existence of a concept-based WS search application) the clients would actually use the semantic annotations in exactly the same fashion the GIS applications use the data, i.e., direct hard-coded interpretation and use, with no discovery, or perhaps for disambiguation purposes (in reality, this part is also problematic - see comment below).
Response:
True, we have now made it clearer that clients have to be semantically aware (e.g. coming with reasoning capabilities). We re-wrote this paragraph; it now reads: "In this sense, the data is usable, but far from being useful. Client applications can be semantically enabled (i.e. supporting logic inference and visualization of the referenced knowledge). But without semantic annotations, these clients have no means to identify and retrieve the shared vocabularies to infer what these attributes represent."
* The separation of concerns description is also great, but only from an analysis perspective, as it is missing the pragmatic side. The content discusses the potential benefits of having separation of concerns but does not mention clearly what is already there (i.e., implemented and usable). Reading the following section sheds light onto this question, and unfortunately reveals a very static / rigid system with respect to this aspect. The ad-hoc injection procedure discussed in the last paragraph on column two, page 5, would profit from a sequence or workflow diagram, showing different possible cases.
Response:
We added the following paragraph, which hopefully makes it clearer: "During the registration of annotated metadata, SAPR assigns a unique identifier (the SID). It is later required to retrieve the annotated document, which could then, for example, be registered to a concept-based search engine. A document registered multiple times with different annotations results in different SIDs, and consequently new service metadata URLs. A URL pointing to the same document (but having a different SID) with data quality references would accordingly be used in trust-aware IR systems. SAPR only provides means to inject annotations into legacy metadata. It is the responsibility of the client application to propagate the resulting new service metadata links to appropriate applications like search engines."
Regarding the "very static": this is true, we followed a very pragmatic approach in SAPR for this aspect. Dynamically inferring which set of annotations may be required for the application would require knowledge about the client (and probably some reasoning within the proxy). SAPR is a proxy, not more.
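For illustration, retrieving the annotated metadata for a registered document then boils down to a plain HTTP GET against the SID-bearing URL; the sketch below is ours only in spirit, with the URL shape assumed from the example given earlier in this thread:

    import java.io.IOException;
    import java.net.URI;
    import java.net.http.HttpClient;
    import java.net.http.HttpRequest;
    import java.net.http.HttpResponse;

    public class FetchAnnotated {
        public static void main(String[] args)
                throws IOException, InterruptedException {
            // SID-bearing URL shaped after the example in this thread;
            // a real SID is assigned at registration time.
            URI uri = URI.create(
                "http://semantic-proxy.appspot.com/api/list/references?sid=921e1da1");

            HttpClient client = HttpClient.newBuilder()
                // Follow redirects so the proxying stays transparent.
                .followRedirects(HttpClient.Redirect.NORMAL)
                .build();

            HttpResponse<String> resp = client.send(
                HttpRequest.newBuilder(uri).GET().build(),
                HttpResponse.BodyHandlers.ofString());

            // The body is the proxy's answer for this SID (for the
            // list/references endpoint, the stored annotation bindings).
            System.out.println(resp.body());
        }
    }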
* The actual technical description of SAPR seems to be underspecified (it takes only 10% of the entire paper). Some key missing elements are:
-- a more detailed description of the manual annotation and registration process (here it would be really nice to include an annotation and XPath-based binding example - maybe directly the one present at http://semantic-proxy.appspot.com/api/list/references?sid=921e1da1);
Response:
We have added a new section (4.1) which describes the configuration and extraction of references (with a figure giving concrete examples).
-- a richer sequence / workflow diagram for the presented example, showing the calls that happen in the process (this part is well-enough described in text, but the figure seems a bit simplistic)
Response:
True, the figure has been kept very simple. We replaced it with an activity diagram (and updated its description in the text). It is still kind of simple, but that is because the proxy is conceptually very simple from a client's perspective.
-- a discussion on what happens if multiple conflicting annotations are uploaded for the same WS (where conflicting annotations are annotations covering the same elements but pointing to completely different concepts)
Response:
At the end of section 4.1 we added the following lines: "The result of the extraction procedure is the service identifier, which is required to retrieve the annotations. Each registration results in a new service id. One document (with different sets of annotations) can therefore be uploaded multiple times without risking conflicts."
-- a richer online example, as the toy example taken from the test suite has a single annotation, and it is on an operation rather than on an attribute (as one would have expected).
Response:
The online service does not cover all test cases in the SAWSDL testsuite. I wouldn't really agree that it is a toy example. It covers all possible locations of modelReferences in WSDL (both version 1.1 and 2.0), hence it is a good source to verify and evaluate the functionality of the proxy. Check the testcases for the whole coverage: http://svn6.assembla.com/svn/sapience/modules/sapience-injectors-testcli... The three services from the testsuite in the proxy cover annotation of an operation, an interface, and an XML Schema element. We have also added more services (mostly running WFS services which are used in the ENVISION project).
* The evaluation, as already mentioned, is superficial. It contains no actual data or results that show that a real evaluation was performed. The least that could have been mentioned, in terms of numbers, is how many clients/services/organizations use the system (or plan to use it). Also, although the authors claim that the overhead imposed by the injection process is negligible, it would still be nice to have some graphics showing how this overhead evolves with the size of the WS description and/or the number and complexity of the existing annotations.
Response:
Regarding the numbers for clients using the system: SAPR was born within a research project, and is used in another. We hope to get more users, and this paper will hopefully help.
Regarding the overhead: we have added a figure in the evaluation section showing the results of concrete test runs measuring the response time. It shows that many factors, like the (original) server response time, have a far bigger impact on the performance. Measuring how the overhead evolves with the size of the WS descriptions is difficult, since we always require the original Web service for the operation, and the descriptions all don't differ much in size.
* Finally, a section on future development plans is missing.
Response:
We added a paragraph to the conclusion about our future plans
* A small remark w.r.t. references, listing 4 - 7 citations in one block is a bit too much, and should be avoided, especially since the authors do not go into details about the individual work presented in those references.
Response:
This was only the case in the introduction; we removed citations which didn't seem necessary.
Response to second review
The paper has indeed been improved by the authors according to the received reviews. Most of the critical comments have been addressed. A couple of small remaining remarks:
* the part about the manual annotation of Web services is still not clear enough, i.e., who does it and when (or why). The reason I keep insisting on it is because, generally, the process of manual annotation always brings along the issue of incentive.
Response: Very true, but we see that as a different research question which we can't address in this paper. At the beginning of the Evaluation section we report on research we have performed on building semantic annotation tools. There are also a few other papers in the same call which are better suited to answer this question (e.g. "The Alignment API 4.0", "Transition of Legacy Systems to Semantic Enabled Application: TAO Method and Tools"). We also discuss this in more detail in the references [9] and [14] cited in the paper. In the research projects SWING and ENVISION we worked together with IJS, Slovenia, who provide semantic annotation tools. A desktop-based tool for the semantic annotations is presented in this video. A web-based tool which is currently under development in the ENVISION project is presented here.
* the small experimental evaluation currently contained in the paper brings a certain plus; however, it could profit from a more detailed description, i.e., what do the axes represent, what were the factors considered during the experiments, etc.
Response: The text now explains that the vertical axis represents time; the horizontal axis is annotated with "services" in the figure. Response time was the only factor we considered to measure efficiency (all other factors, like network load, are no different from a solution without the proxy).
In Response to Review 3
Dear Jacek,
thank you for this detailed review. We hope we covered all your comments in the following responses (and the updates in the paper).
> Firstly, how does the system process uploaded annotations? It seems that it only allows upload of a complete annotated file, together with a URI for the original, and it appears that the system does some kind of difference comparison of the two files, and stores only the annotations. How is the difference computed? Section 4 seems to mention a line-by-line approach which may easily break on some harmless XML changes (such as namespaces, formatting, element reordering, character set recoding etc.), all of which can be done when the domain expert adds annotations.
Response:
We added a new section 4.1 about the extraction procedure. The "line-by-line" wording may have been a bit confusing. We use a namespace-aware StAX parser to extract XPath expressions, and we do matching on these XPath statements (which is, interestingly, more efficient than a pure string-based approach using regular expressions). Being robust with respect to XML changes was one of the main reasons for this approach (we made this clearer in the new paragraph).
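To give an impression, the extraction works roughly as in the following sketch (simplified; in particular, the recorded path format here is a stand-in, not the one SAPR actually stores):

    import java.io.FileInputStream;
    import java.util.ArrayDeque;
    import java.util.Deque;
    import javax.xml.stream.XMLInputFactory;
    import javax.xml.stream.XMLStreamConstants;
    import javax.xml.stream.XMLStreamReader;

    public class PathExtractor {

        static final String SAWSDL_NS = "http://www.w3.org/ns/sawsdl";

        public static void main(String[] args) throws Exception {
            XMLInputFactory f = XMLInputFactory.newInstance();
            // annotated.wsdl is a stand-in for an uploaded annotated document.
            XMLStreamReader r =
                f.createXMLStreamReader(new FileInputStream("annotated.wsdl"));

            Deque<String> path = new ArrayDeque<>();
            while (r.hasNext()) {
                switch (r.next()) {
                    case XMLStreamConstants.START_ELEMENT:
                        // Track the current location as namespace-qualified
                        // steps, independent of prefixes, formatting, or
                        // attribute order.
                        path.addLast("{" + r.getNamespaceURI() + "}"
                            + r.getLocalName());
                        String ref =
                            r.getAttributeValue(SAWSDL_NS, "modelReference");
                        if (ref != null) {
                            // Record the locator for this annotation so it can
                            // be re-injected into the unannotated original.
                            System.out.println(
                                String.join("/", path) + " -> " + ref);
                        }
                        break;
                    case XMLStreamConstants.END_ELEMENT:
                        path.removeLast();
                        break;
                }
            }
            r.close();
        }
    }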
> Secondly, after annotations are uploaded, can the system deal with any changes to the original descriptions? Same harmless XML changes as above can happen, but additionally the description can evolve in backwards-compatible and incompatible ways. The paper should contain a discussion of the robustness of the annotations to changes in the original description.
Response:
See above. The discussion has been added. Changes which have a direct impact on the XPath expressions (which are the locators) are a problem; we added a discussion about this in the evaluation.
> In conclusions, the manuscript says "dynamic injection relies on a reproducible way to identify the annotation's location" - this should not be in conclusions but very well detailed in the system description!
Response:
Now hopefully better covered in the new section 4.1, where the XPath expressions are discussed at great length (supported by a new listing).
> Thirdly, the new annotated description gets a new URI. Is there any functionality for clients to discover annotations based on original service URIs, other than by searching in the list of all services? Also, since the list of all services is not proper hypertext (because the service ID must be composed into a service retrieval URI), search engine crawlers cannot discover any of the annotated descriptions. It seems that RDF might be a much better match than JSON for the list of all services in the proxy.
Response:
True, the API has been kept very simple and doesn't, for example, support searching. We actually just implemented it for ourselves to check the status of the registered services. But this is also on purpose. A proxy is conceptually dumb: as the man in the middle, it just forwards and redirects. Searching should be covered by catalogs with the proxy URLs in their repositories.
> Fourth, the paper doesn't really show how the system allows a "client application to specify what annotations are to be injected", or how different annotations can be combined if at all (for clients that specify multiple sets). In fact, the URI (2) in Figure 2 returns an error, so it is not clear that a client can retrieve the annotations at all.
Response:
This has been mentioned multiple times in the paper. We made the discussions about this issue more explicit at the end of Sections 3.1 and 4.1
Regarding combining annotations: different annotations cannot be combined. The client application specifies at request time which annotated metadata to retrieve by using the appropriate service id.
> All these points make the system, as described in the manuscript which is what I am judging, actually worse than a simple document repository that would host annotated copies of the original service descriptions.
Response:
True, if we did only string-based matching, which breaks as soon as the underlying XML changes. A document repository cannot reflect changes to the data models (which happens with spatial data services; new feature types (layers) are added and removed all the time). In addition, the proxy does mediate between the client and the original service. Not using a proxy would require knowledge of both the original service location and the copy. We now better describe the purpose of a proxy (acting on behalf of the hidden web service in the back), which hopefully makes the need for such a solution clearer.
> 2) Any kind of evaluation of the system is missing - only the last paragraph of the evaluation section states that evaluation was done, but doesn't describe it. In a Tools and Systems paper, the evaluation should at least talk about speed (how much processing time does the proxy processing add?), size (in what form are the annotations stored? Are there indices for quick search?), and the possibility of distributing the system, should the load become too large for a single machine.
Response:
We mentioned performance in the first version of the paper, but we have extended this with a figure. It is really not that relevant, since factors like the response time of the original server are far more important. Size doesn't play a role (this was mentioned in the paper): the Google App Engine provides several gigabytes, and we use a few kilobytes. And yes, indices are in place (created automatically by the Google App Engine). Distributing... well, that's why it is running in the cloud already. We added a sentence explaining how the App Engine does automatic load balancing.
Hence, we mainly focussed on testing whether the proxy is able to support the many geospatial service types out there. The tests are all available in the source repository (link in the introduction, or here: http://purl.org/net/sapience/docs). But we thought (and still think) a discussion of these tests is out of scope of this paper.
> 3) According to the paper, there are well-established standards for spatial data, but none for its semantics, correct? One of the goals of the system is to separate potentially conflicting annotations from different sources. This needs to be better motivated because conflicts can also be viewed as very valuable: if two domain experts use the same semantic model to annotate the same service in conflicting ways, either the service or the model is ambiguous; in either case, identifying and resolving the ambiguity can lead to improvements in the quality of the descriptions. In general, on the semantic web, different sets of annotations should be able to coexist and be ignored by clients if unknown.
Response:
Regarding looking into conflicts: Very true, but not in the scope of a proxy. The proxy could be part of a system which retrieves and visualizes conflicting annotations. But by itself, it is simply a component which mediates requests and updates (if required) the response.
Regarding coexist: That's exactly what the proxy was made for.
> On a related note, in the system, "information about the uncertainty can simply injected during runtime, and only compatible clients can request this data if needed" - but in semantic systems, the compatibility can often be established when the client sees the data (and possibly discovers ontology mappings that help the client understand the data). In any case, how does the client indicate to your system the types of semantics with which it is compatible?
Response:
Compatible here means that clients are able to process the annotations. It does not refer to semantic interoperability. We added "(technically)" to make this a bit clearer.
> 4) One of the major assumptions of the paper is that "delegation of the annotation from the data provider to the domain experts" is desirable. However, what are the incentives for domain experts to annotate services? If the domain expert is tied to the service provider, they don't need a proxy solution for providing semantic annotations; if the domain expert is tied to the client, it doesn't need a proxy either; if the domain expert is a third party, why would they annotate the service? I've simplified the situation here; it should be discussed in the manuscript, to demonstrate that the system is actually valuable and desired.
Response:
This is a tricky question. The major incentive for SAPR is the integration of legacy web services into semantically enabled infrastructures. The decoupling supports separation of concerns, which itself supports the delegation of the annotation (it is an effect, not a condition). In this case (in my experience), the domain experts (usually also the data providers) have no motivation or resources to semantically enable their running infrastructures. If the domain expert is tied to the client, he still needs the semantic annotations in the proxy if he wants his annotations to be used anywhere else besides his own client application. We've added a sentence to 3.1 to further illustrate this. But in general this discussion (who is responsible for the semantic annotations) is out of scope for this paper. We present a tool which supports delegation. Whether data providers want (or have) to delegate does not affect the tool.
Details and wording suggestions:
--------------------------------
- The first citation of [1] is unnecessary, it can probably simply be dropped.
Response: ok, removed
- End of 2nd paragraph in the introduction: logics ensures better precision, not recall (cf beginning of section 3).
Response: changed
- The introduction links feature type 42001 to class Street, a "part of a globally shared ontology accessible on the Web" - where is this ontology? How well-established and standardized is it?
Response: Has been removed, the introduction is now more concise
- In "semantic annotation techniques exist for ... media formats (e.g. photos)" I'd suggest to add a mention of EXIF or some such, otherwise the parentheses in that sentence do not match (they contain different kinds of examples).
Response: has been removed during rephrasing of introduction
- SA-WSDL is SAWSDL (no dash), reference [15] is inappropriate for SAWSDL, it should be [22].
Response: changed
- The paper should talk more about how "this focus [of SAWSDL] on WSDL does unfortunately impair its applicability for some scenarios" - where is its applicability impaired and how?
Response: This is covered in the following sentence in the paper: there are services out there which don't speak WSDL.
- In the introduction, reference [6] does not seem very appropriate for saying your system is a conceptually simple proxy-based solution.
Response: The paper introduces a classification of annotation techniques. Proxy-based (which is cited) is mentioned there as one approach.
- I'd suggest that the paper should discuss the relation of the presented Web service to HTTP proxies, because the authors call the service a proxy-based solution. This would help readers who are led by the title to expect an HTTP proxy.
Response: True, added more details to the introduction.
- In section 2.1: "This distinction has been proven useful to capture the sometimes complex functional dependencies in between attributes of data models" - how and where has it been proven? How is this relevant to this paper? The proxy system doesn't seem to "distinguish between local and global semantic annotations" in any way!
Response: There was a citation at the end of this sentence which explains this in great detail. True, the proxy is not aware of the annotations (it could be anything). But this is the scenario section: here we try to explain how annotations are used, and why decoupling semantic annotations from the original metadata could be useful.
- Do you have any real use cases for section 2.1? Any globally shared ontologies? (not the "(made-up) domain concept GeologicEra")
Response: We have a repository of the ontologies used in ENVISION at http://wsmls.googlecode.com/svn/trunk/ (with PURLs pointing there). In general, having shared vocabularies is of course a requirement, which we base on the assumption that the semantic web will further evolve and far more ontologies will be created. But I guess that's the case for most Semantic Web tools.
- In figure 2, URI 2 is broken and the results of URI 4 are quite opaque; the paper should discuss the formats.
Response: We included example requests in the output of URI 4. URI 2 is now only an example request, since new SIDs are generated every time a service is re-registered. The old URL was pointing to a file from the SAWSDL Testsuite. The W3C server serving the SAWSDL testsuite files often didn't respond in time, which broke the example.
- Around the data quality annotations: annotations for describing data quality are often a property of the actual data - different data from the same service may have different quality - e.g. mapping may be very precise in some areas but rather sketchy elsewhere. The paper could list examples of quality annotations that are understood to be global to the whole service.
Response: This is definitely out of scope for this paper. You are right about errors in concrete observations (e.g. the accuracy of one concrete location measure). But uncertainty information (like the cited UncertML) can also be generic. A weather forecast model may (for every predicted temperature value) have an uncertainty based e.g. on the time frame. This would be injected. Uncertainty is just one example mentioned here (together with e.g. provenance).
- In "each measurement inhibits some sort of error" - change "inhibits" to "is done with" or something; the word inhibit means something else.
Response: changed
- Section 3.2: SAWSDL is not only about "instance identification", especially for WSDL elements. It is intentionally unconstrained in the semantics of the link annotations.
Response: True, changed in the text.
- Figure 3 needs a legend.
Response: Has been replaced with an activity diagram
- In section 4, what are "the original parameters" with which a service identifier can be coupled? How can it be so coupled?
Response: Mentioned now in the text, also better explained through the new figure.
- Section 5 says the proxy is free software - where is the source available for download?
Response: The link is in a footnote in the introduction.
- By page 8, the acronym SDI (used only once in the introduction) is forgotten.
Response: True, repeated it.
- Is there no related work on injecting of annotations? This would be the kind of related work that this paper should discuss.
Response: True, and we searched for it quite a bit. There is research on the dynamic creation of annotations based on content-analysis, but we weren't able to find anything about "injection" as we mean it. If you are aware of anything, we would be happy to hear about it.
- In the Conclusion section: the long-term vision of the SemWeb does *not* assume a complete shift towards semantic-enabled web resources - it expects a coexistence of many kinds of resources on the web, some of which would be annotated semantically, and some with direct semantic representations. For example, representing image bitmap data semantically would be virtually useless; so semantic annotations are here to stay, in one form or another.
Response: True, we have rephrased this paragraph (first paragraph in conclusion).
- Conclusions should not repeat references; in fact, it often should not contain any references at all.
Response: True, removed the references.
- References: take care to have the proper case (e.g. [3], not owl-dl reasoner but OWL-DL Reasoner); [16] doesn't have enough information to find and identify the document being cited; there are encoding problems, e.g. [25] has uppercase long umlaut U where it doesn't belong; there are issues with formatting: e.g. [27] says "International Conference on Semantic Computing 0" and [33] says "International Symposium on 0". [19] has duplicated authors.
Response: Good catch with the duplicated authors. We went through all the references again and corrected the mistakes.
Response to second review by Jacek Kopecky
The paper presents a proxy-based storage and retrieval system for semantic annotations in Web service descriptions. It seems to be a workable system, and the current revision of the article is much improved - especially it presents more useful concrete information than the previous one I reviewed - but the article overall still isn't very convincing. The system description says that "the client application has to specify which annotations are to be injected"; and "depending on the client request, different annotations may be added to the metadata", which seems to promise something like the following features: 1) a client can specify multiple annotations to be combined, 2) a client can specify the ontologies that it is compatible with, and the proxy will inject the appropriate annotations, 3) the client may search for available annotations based on the original service metadata URI. These would be interesting, but the system does not seem to support any of them. As it is, the system is very simple and the only actual interesting contribution is that it can maintain the semantic annotations even through some (weakly specified, but at least discussed a bit in this revision) changes to the underlying metadata. That isn't much.
Response:
Correct; as the presented tool is a proxy, it remains the responsibility of the client to perform the mentioned features (as also commented on in my last response). As a proxy, its purpose is well-defined and limited to managing requests and updating the response if needed. Any other feature should be provided by other tools, which can be easily coupled with the proxy. We agree that the system is conceptually very simple (though the implementation has been a completely different story). That was the whole point of it. The question whether it is too simple is a matter of perspective.
And quick testing based on Figure 2 and the returned data indicates that the system doesn't work for WSDL services. (see below) The authors should at least test the uses they want to publish.
Response: We made clear in the text that the URLs are just examples which do not work. The SIDs are dynamically generated, and the registered services change all the time. We don't want to add non-persistent URLs to a journal paper. You might want to try it out with one of the currently available SIDs (using the API).
Finally, the next update should be accompanied with a brief but point-by-point statement by the authors about how they handled the review comments.
Response: Not sure what was wrong with the last response, which I posted in the Paper's comments section.
Detailed comments follow:
The introduction says how "clients are not aware of [the proxy's] existence: the proxy takes the identity of the proxied service", which could be a nice feature except it's not explained anywhere; instead the system clearly changes the URIs of the service descriptions (or "the web service location" - whatever exactly that means - as said in Section 3).
Response: Added a paragraph in section 4. The URI pointing to the Web service changes, that's true. But that's all; the client software "believes" that the proxy is the actual service, since it acts like it. It isn't aware of the proxy.
In Section 4.1, change "Url (7) in Figure 2" to say Url (5).
Response: Corrected.
Clarify the sentence "The nature of the reference (attribute, sibling, or child) is concatenated, and used to identify the location in the original metadata document." - the resulting XPath expressions in Figure 3 are somewhat ambiguous; in effect, the system changes here the meaning of XPath, so it should say how. For example, the normal interpretation of XPath (2) is "all attributes of an xsd:simpleType element that has a sawsdl:modelReference attribute". And how does the system interpret the "~25437290" part at the end of (4), which doesn't fit the syntax of XPath? (The article says a bit about this adding uniqueness to the location, but more should be said about its handling).
Response: The text said "expressed through simplified XPath-expressions". Changed that to "expressed through patterns resembling simplified XPath-expressions".
Why is there a redirect to the service if request parameters do not match? What kind of scenario is this redirect designed to support? Is this intended for proxying actual interaction with Web services? If so, what are the limitations here? (I can think of two immediately - it only handles invocation through GET, and it only allows parameters other than "sid".)
Response: That's the idea of the proxy; we made that part clearer in Section 4.1. The only limitation we see is that clients which don't correctly implement the HTTP protocol (i.e. don't process HTTP redirects) might break (but they wouldn't work in any proxied environment either). Any HTTP method besides GET is supposed to be automatically redirected. Regarding the "sid" parameter: not sure what you mean there.
In Figure 2, URI (2) doesn't work; in fact, none of the WSDL-based descriptions seem to work due to a bad (OGC-based?) XPath in the annotations. This goes directly against the article's statement that the system has been thoroughly tested.
Response: Answered above.
In Figure 3, XPath (4) mixes two different apostrophe characters and ends with an extra " - is it a real example with editing errors, or is it a made-up example?
Response: Corrected. The examples are taken from the source code and the registered services.
Figure 5 needs a legend - for the vertical axis, the text says "average response times" - is that in seconds? for the horizontal axis, short names or IDs for the services would be useful (even service 1 ... service 4).
Response: The text now explains that the vertical axis represents time. The figure has been updated with a legend for the two axes.
The measurements seem to indicate a speed-up for retrieving the description of a SAWSDL-Testsuite service through SAPR in contrast to direct retrieval; this is explained as "probably due to a more efficient implementation" - implementation of what? How can a proxy that has to access the original source of data perform quicker than a direct access to the original source of data? The only reason I could think of is a better network routing when it goes through the proxy server than when it goes "directly".
Response: The full sentence was "more efficient implementation within the Google App Engine SDK". Answering the question would require an analysis of Google's implementation of java.net.URL. We changed the sentence to "to a more efficient implementation of HTTP request handling within the Google App Engine SDK".
As before, the paper says the proxy is available as "free software" so it should include a link to download.
Response: See Page 2
On Page 9, the use of the acronym SDI (end of first column) precedes its expansion (beginning of second column).
Response: Updated.
In Conclusions, "the client does have to manage the two separate sources" - I suspect you wanted to say "does not".
Response: Correct, updated.
In references 11, 30 and 31, expand on the venue of the publications - 11 doesn't have anything, 30 and 31 only have an address.
Response: Correct, updated.