Review Comment:
The paper describes FrameBase, a new semantic web resource that provides a frame-based schema for representing knowledge. Together with the resource, the authors contribute a method for automatically generating the FrameBase schema, as well as an alignment between FrameNet and WordNet. The claim is that such a resource can ease the integration of multiple heterogeneous sources. The authors support this claim by showing examples of integration rules, and by actually integrating some of the most popular semantic web knowledge bases, such as DBpedia, Freebase, and Yago.
The paper also discusses an evaluation of the work, as well as the potential and limits of its current state.
Overall comment:
The article extends a conference paper, and I consider this extension valuable as far as suitability for publication in the Semantic Web Journal is concerned.
Although there are some aspects that need clarification and improvement, the work has merits in its potential impact on future research in semantic technologies. The resource alone deserves to be disseminated in a more complete article than a conference paper.
The main weaknesses of the paper are its presentation and the evaluation (at least in the way it is reported), which motivate my request for a major revision. I think that the community should gain detailed insight into the work and be able to understand it easily, hence presentation is key.
I also recommend having a native speaker proofread the paper.
The authors often use terms such as "above", "below", and "later" to refer to examples or formulas. I recommend embedding such examples and formulas in an environment that allows for precise references (e.g. numbered formulas and numbered examples). The current generic references make the paper hard to follow.
In general, the paper would benefit from including more examples. The examples already provided (and there are quite a few) need to be better integrated into the narrative. They should immediately follow or precede the more theoretical/formal text that explains their underlying concepts. Some complex concepts are described only abstractly and in-text, while a narrative such as informal definition -> example -> formal definition would make the paper much more readable and effective.
In addition, the paper sometimes lacks details. The authors should write as if the reader wants to replicate their work, and hence provide her with enough details to succeed.
In its current state, a non-NLP expert would find some passages very hard to understand, as certain things are not explained and sometimes not even cited (see the detailed comments for specific references). I recommend that the authors do their best to make the paper self-contained, as the target readers of this journal are not always NLP experts.
Section 5 and Section 6 need restructuring. In Section 5 the authors may want to have a paragraph for each text-design issue and associate it with the relevant examples, rules/solutions and their explanation. Section 6.1 and 6.2 should be merged and restructured in a similar way.
I think it would be important to have a discussion about the coverage of FrameNet and its impact on FrameBase, especially during integration with external KBs. Have the authors encountered situations where the FrameBase schema would not align to some class or property in one of the KBs? How did they deal with them? Is this a marginal issue (i.e. does it happen very rarely)? Whatever they have to say about this aspect is relevant and interesting to the reader (at least this one).
Detailed comments:
Table 1 needs to be accompanied by explanation in text, and should also include the schema.org roles modelling, which is later discussed.
p.5: Provide examples of CVTs.
Does CVT stand for "compound value types" or "composite value types"? At p.5 both are used, which is confusing.
p.5: the following paragraph needs rephrasing and must be accompanied by an example:
"While CVTs do not represent frames or events per se, from a structural perspective, they can be regarded as isomorphic to a neo-Davidsonian representation with specific roles (see Table 1). However, Freebase places a number of restrictions on CVTs. For instance, they cannot be nested, and there is no hierarchy or network of them that would for example relate a purchasing event to a getting event."
p.5: A clearer definition and an example of a FrameNet frame should be included in the paper as early as possible, considering that FrameBase's core inspiration and principle are linguistic frames.
p.5: When citing references 17, 32, 33, the authors may want to elaborate more on the synergies and differences between these works and FrameBase.
p.6: The first paragraph of Section 4 would benefit from rephrasing
p.7: When discussing LU disambiguation, please show an example to explain the process and the design choices (e.g. one-to-one mapping)
p.7: The S(l|a,b) function should have a formalised definition and be accompanied by an example.
Section 4.2: It is not clear why the authors use and refer to RDFS instead of OWL, especially since they are clearly exploiting OWL semantics (also when using rdfs:subClassOf, and certainly when using transitivity and symmetry). They should justify this choice. The description of the schema semantics must be rigorous, especially considering that the main purpose claimed by the authors is KB integration.
As for "perspectivization", the authors should elaborate more on this concept, explain its semantics, and then motivate their design choice. owl:equivalentClass seems to me more appropriate than rdfs:subClassOf in certain cases. Also, it is not evident how the inversion of roles works: is this information provided by FrameNet? Is it extracted in some way?
p.8: Figure 1 should be closer to the text referring to it.
p.8: Please provide examples of LEMON annotations.
In the state-of-the-art section and/or at the end of Section 4, the authors should consider referencing BabelNet and aligning this resource to FrameBase, as it includes WordNet and addresses multilingualism:
"R. Navigli and S. Ponzetto. BabelNet: The Automatic Construction, Evaluation and Application of a Wide-Coverage Multilingual Semantic Network. Artificial Intelligence, 193, Elsevier, 2012, pp. 217-250."
Section 5:
The work on ReDer rules has some similarity with http://www.semantic-web-journal.net/content/hyperlinks-semantic-web-prop...
Although the two works target different goals, this similarity shows in several parts of the paper. The authors may want to look into this article and consider commenting on synergies in the related work section.
p.9: Explain the DBP acronym the first time it's used.
p.9: Figure 2 is presented as a dereification rule: doesn't it show a ReDer rule?
p.9: when showing examples of ReDer rules, the authors should comment on and explain in detail what is happening there, also by referring to how the terms of the rules are identified (e.g. by saying that the FrameNet frame is "Statement", the LU is "write", etc.).
Section 5.2:
I suggest starting the explanation of ReDer rules with a worked example.
p.11/12/13: all the examples should be interleaved with the text and accompanied by detailed explanations.
p.11: please clarify what the dots are for, e.g. :frame-Forming_(...)-divorce.v
p.13: what do prep, cop, dobj, and nsub stand for? They should be defined and explained in detail.
p.13: please cite and explain the Kuhn-Munkres algorithm.
Section 6:
- the first two paragraphs are really hard to understand. Please provide a formal definition of an integration rule.
- it is not clear whether the integration rules have been defined manually. If so, the authors should provide an estimate of the effort needed and the criteria for selecting/discarding the entities involved.
- p.15: the discussion about more specific and more general elements is unclear. It is also not clear what sentences such as "the substitutions for ?f that fire the rest of the examples..." mean.
A large part of Section 6 is devoted to discussing limits/problems of integration. The authors seem not to have a viable solution to such problems at the moment. Hence, they should move this discussion to a dedicated section, immediately preceding the concluding section. Such a section should be structured so as to discuss limits, possible solutions, and the results of the evaluation.
Considering that support for integration is the main claim of the authors, it sounds odd to read that transformations are hard to do automatically. The authors should better support their claim, or make it less strong, in the light of this consideration (I'm referring to the discussion in Section 6.2).
Suggestion: As far as differences in representation choices are concerned, don't the authors think that conventions and guidelines on how to use FrameBase would help reduce this problem? If they agree, providing such guidelines would enrich their contribution and make it stronger (especially in supporting its potential for integration).
Section 7:
The evaluation needs to be presented better: more rigorously and in more detail.
What are the evaluation tasks? What are the raters asked to do? What are their guidelines? What measures are used (define them, and indicate their ranges and what the values mean)? What do precision and recall mean in the context of each task?
Please use tables to present a summary of the results.
Only two raters, both authors of the paper, were involved in the evaluation. This is a bit weak per se. But even if one wants to overlook this weakness, the authors should at least report the procedure followed, how disagreements were resolved, and the degree of agreement.
As mentioned earlier, the authors should add a "Discussion" section, where they comment on the evaluation results and elaborate on the limits of the approach (e.g. by moving part of Section 6 there).
Minors:
I suggest using the present tense instead of the future.
The authors should avoid using "so-called": for example at p.5 "so-called neo-Davidsonian representations" should simply be "Davidsonian representations"
p.2: will me -> will be
p.4: I suggest to remove Footnote 2 and include its content within the main text
p.5: adapt -> adopt
p.5: the semantic role -> their semantic role
p.6: adopted -> adapted
p.6: an reasonable -> a reasonable
p.6: might be reified on its own: strange phrasing
p.6: (3.2 b)) the mappings -> their mappings
p.6: for into -> into
p.6: the KB -> a KB
p.7: connected -> interconnected
p.7: semantic pointers -> semantic relations
p.7: better match -> to better match
p.7: the particular events -> specific events
p.9: Figure 2 and Figure 3 have the same caption
p.9: avoid sentences such as "perfect annotations" unless you can provide a contextualised definition for "perfect"
p.9: -s, -o -> -S, -O