Review Comment:
The paper motivates, introduces, and evaluates FrameBase as a knowledge base (KB) schema founded on a dual representation of n-ary relations: a neo-Davidsonian reification allowing expressive and compact representation of complex events, and a bare direct binary predicate (DBP)-based representation intended to preserve both compatibility with other/source KB schemas and simplified/legacy querying when the expressivity of n-ary relations is not required.
After a first round of reviews, this second version of the paper has improved considerably in many critical respects. In its current shape, the paper shows a sound and clear organization which thoroughly accounts for the complexity of the FrameBase building process. Relevant changes include a better introduction and contextualization of the individual processing steps, relevant terminological clarifications (e.g. "rules" versus "rule constructors"), and the inclusion of new figures that clarify several points of possible misunderstanding. In particular, the authors' effort in clarifying a critical discussion point from the previous review round (referred to as "R29") is much appreciated.
In general, the paper has sufficient content and quality for publication in the Semantic Web Journal. Some points of inherent weakness from the first version remain, as they are rooted in the very choices of FrameBase's developers. On the other hand, the service potentially offered to the Semantic Web community is extremely relevant and deserves endorsement and encouragement. Ultimately, the impact of FrameBase will be determined by the authors' ability to ensure sound development, wide integration, and stability of the T-Box over time and across releases.
Most of the specific flaws still found in the paper only slightly affect its readability and, possibly, the interested reader's decision whether to test FrameBase. For this reason, applying the individual corrections can safely be left to the authors' interest and responsibility. Therefore, my final decision is to ACCEPT the paper for publication with no need for further revision by me.
DETAILED PROBLEMS, ERRORS, CORRECTIONS.
Section 1
S1: Note 1 includes two mentions of an "other kind of reification" discussed elsewhere in the paper. Please add a section reference.
S2: Figure 1 is a major improvement in the clarity of this revision. Unfortunately, it includes several flaws/inconsistencies which should be corrected in order to allow a full understanding of the long and detailed (de-)reification discussion in the paper:
S2a: The choice of relations to be displayed is not consistent across subfigures. For example, Fig. 1b shows the non-reified triples, while 1d and 1e don't. This specific point may also be solved by an explicit mention in the respective sub-captions.
S2b: The reified nodes should have a label such as "reified event" or "reified triple". Again, text in the sub-captions may help.
S2c: For a deeper understanding, I was forced to refer very frequently to the OLD Table 1 which was present in the PREVIOUS version of the paper. It was not a good idea to remove it! I understand that keeping the alignment was difficult because the figure now includes far more triples, which is excellent and MUST be kept. However, please make an attempt to extend and re-introduce the old Table 1, because it is critical for understanding reification.
S2d: Figure 1f includes two mistakes. First, the two lower sub-blocks should represent respectively "John-1964" and "Mary-1964"; instead, they are now exactly identical. Second, the same two blocks both include "1964" as Partner property values. They should be "Mary" and "John" respectively.
S2e: The sub-captions could explicitly refer back to the respective explanatory passages in Section 2.1 to make understanding easier.
S2f: MOST IMPORTANT! It's still very difficult to pair Figure 1 with the complexity analysis made in (the current version's) Table 1. With n=3 and k=3, the only subfigures with triples matching the counts in Table 1 are 1d and 1e. I could not match the counts in the first three rows of Table 1 (wrt columns "All triples" and "Core") to what I see in Fig. 1b, 1c, and 1f. With some effort, it's possible to speculate why the counts depend on both n and k, but I found no way to make the constants included in the counts (1, 2, ...) match. Please explain in detail how you count (n+4)k in 1b, (n+2)k in 1c, and (n+1)k in 1f. Also, please align the pattern names between Figure 1 and Table 1.
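For reference, the totals I was trying to find in the subfigures can be reproduced with a few lines of Python. This is only a sketch of the arithmetic: the function name and the dictionary keys are my own, and the formulas are simply the three quoted above, evaluated for n=3 and k=3.

```python
# Hypothetical helper (not from the paper): evaluates the three count
# formulas quoted in S2f for given n and k.
def expected_counts(n: int, k: int) -> dict:
    return {
        "1b": (n + 4) * k,  # formula quoted for Fig. 1b
        "1c": (n + 2) * k,  # formula quoted for Fig. 1c
        "1f": (n + 1) * k,  # formula quoted for Fig. 1f
    }

counts = expected_counts(n=3, k=3)
for subfigure, total in counts.items():
    print(f"Fig. {subfigure}: {total} triples expected")
```

With n=3 and k=3 these formulas give 21, 15, and 12 triples respectively, which are the numbers I could not recover by counting in the corresponding subfigures.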
Section 2
S3: Figure 2: I advise labeling the unlabeled reification node.
S4: Figure 2: Please align the naming of relations to Figure 1 (pick either "is" or "was" everywhere).
S5: Table 1: the previous item S2f applies. In addition, the first 3 counts of the "linking event" are not clear. First, make clear whether they are complexities (as said in the caption) or exact counts (as said in the paper). I would expect k(k-1)/2 in the latter case and a simple k^2 in the former. Why k(k-1)? Are you counting inverse equivalence relations?
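To make the question concrete, here is a minimal sketch of the two ways of counting links among k reified events. It assumes that every pair of events is linked; the function name is mine and is not taken from the paper. Unordered pairs give k(k-1)/2, while ordered pairs (i.e. counting each link together with its inverse) give k(k-1).

```python
from itertools import combinations, permutations

# Hypothetical helper (not from the paper): counts links among k events,
# assuming every pair of events is linked.
def linking_pair_counts(k: int) -> tuple:
    events = range(k)
    # Each pair counted once: k(k-1)/2
    unordered = sum(1 for _ in combinations(events, 2))
    # Each pair counted in both directions: k(k-1)
    ordered = sum(1 for _ in permutations(events, 2))
    return unordered, ordered

print(linking_pair_counts(3))  # (3, 6)
```

If the k(k-1) in Table 1 corresponds to the second count, stating so explicitly in the caption would resolve the ambiguity.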
S6: In 2.1.1, please align the naming of "wasMarriedOnDate" and "gotMarriedTo" with the names in the Figures (which in turn differ from each other, see S4).
S7: Also, the text of the same paragraph in 2.1.1 should read "Given TWO triples with property ... and ONE triple with ... we cannot be" (now it's the opposite).
Section 3
S8: In the sentence "The less verbose but also...", it should be "DEREIFIED layer" instead of "reified".
Section 4
S9: First paragraph, typo: "tHe dereified layer"
S10: in 4.2, in item (2): "class frame-Personal_relationship HAS" instead of "have".
S11: In 4.2, item (3) "Intermediate Nodes" is a core algorithmic point of the paper and of FrameBase: I strongly advise providing explicit pseudocode for it. This would make its properties easier to understand, e.g. why the result is a hard clustering with no overlap.
S12: 4.2 is very long. I would at least separate the last part on labeling/annotation and linking, starting from "Names, definitions, and glosses ..."
S13: In 4.3.2, first paragraph: do you use ALL the English sentences, or do you only pick some of them? In the latter case, which ones?
S14: in 4.3.2, in " -Dependent (Dep)" paragraph, the second "PP[to]" should be "PP[of]" (?)
S15: In 4.3.2, "The constructors are shown in Figures ??" contains an unresolved reference.
S16: nine lines later, please double check repetition of "" wrt. Fig. 7
S17: second last line of page 17: "agree that they there is": delete "they".
S18: page 18, second column, first line, should be "prEposition".
S19: Algorithm 1, in the "Output" section, should be "prEposition".
S20: Algorithm 1: please define the set P.
S21: Page 21, first column, discussion about redundancy. To avoid redundancy, a rather blind pruning is applied, which is expected to come at the cost of coverage. Is this truly necessary? Can't this be shifted to query time, when multiple constraints would naturally reduce/cancel redundancy?
S22: page 21, second column, "The Kuhn-Munkres algorithm", delete repetition of the word "algorithm".
Section 5
S23: Coverage. A major point of weakness in the paper is the lack of a coverage evaluation for the ReDer constructors/rules, which are a core mechanism. In principle, I would like to see how many DBPs are inferred on average from the reified frames, for which you compute an average of 9.45 frame element definitions each. Your arguments on the topic would probably include that 1) many non-meaningful DBPs are intentionally skipped (like LOCATION-TIME triples) and that 2) your explicit intention is to establish a highly precise conversion process, and you actually achieve this, as shown in Table 2. Nonetheless, let me warn that coverage issues will eventually make or break FrameBase's success. My strong advice is to take coverage into serious account, and maybe start including actual numbers even in this version of the paper.
S24: Section 5.1: Hanging "Section [?]" BibTeX reference.
S25: Section 5.2: Typo: "An resulting average"
Section 6
S26: Typo: "the same so instantiation"
S27: Section 6.2 leaves the reader unsatisfied, since it mentions results from an external work without describing the method. I think you are allowed to include a few lines of summary from [48] and explain how these basic integration rules are obtained. This would improve understanding and the connection with Section 7.