Review Comment:
The authors present an approach that exploits a graph created from bilingual dictionaries to infer translations, and thus to create new dictionaries or enrich existing ones. Furthermore, the approach enables the inference of synonyms. The approach is built around the detection of cycles. A couple of properties of translation graphs are identified that enable such a graph to be partitioned.
Generating new and enriching existing bilingual dictionaries is an interesting problem and clearly has practical applications. The authors show in detail how they build upon the approach by Villegas et al. The first contribution is that instead of identifying cycles in the entire translation graph, which is computationally hard, they propose to partition the translation graph into biconnected components that can be processed individually without missing any cycle (and hence any translation). As a second contribution, they propose to reduce the translation graph by removing translations between words with different parts of speech. The authors provide empirical evidence that this reduction can lead to smaller biconnected components.
While the technical contribution regarding the extension of the approach by Villegas et al. might seem a bit small for a paper of 24 pages, the authors provide a couple of contributions beyond the ones mentioned so far. They describe the hyperparameters of the original approach and of their approach in detail and reduce the number of parameters actually necessary. I really like that the paper contains this discussion. A couple of carefully crafted experiments have been carried out, for which new metrics were introduced. The results are discussed in sufficient detail and show the strengths of the proposed approach, which appears to run on a large dataset in under a minute, and show that held-out bilingual dictionaries can be retained to a large extent; both the speed and the quality are impressive. The evaluation nicely compares BWP and vanilla recall and discusses how evaluation schemes based on precision could grossly underestimate the performance. All parameters of the experiments were clearly described. The authors took great care to make the evaluation of the recall meaningful, by making use also of externally existing data and human assessment. Furthermore, the comparison between the cycle density and the transitive baselines was very interesting. Code and data are available online and are well documented. The quality of writing is fine.
I have a couple of comments, ordered according to where they first occur in the paper.
Major comments:
C7. p3, column 2, section 2.1. Given the 44 languages and 53 language pairs, for how many language pairs would a pivot-based method be applicable? It would be nice to compare against the approach the authors have extended without the extensions.
C8. p4, column 1, lines 12ff. If CQC can be used to validate a dictionary, then won't it be possible to use it in the context of dictionary building, too? This needs to be explained better.
C9. p4, column 1, lines 31ff. I find this paragraph very confusing. E.g., "Both approaches" refers to SenseUniformPaths and the approach by Villegas? What are ambiguous cycles? The differences between cycle density and graph density are not clear.
C11. p5, column 1, line 23ff. Even if part-of-speech information is not taken into account during cycle detection, wouldn't it be simple to apply it in a post-processing step? Similarly, your own approach makes use of POS tags in a post-processing step (Section 5.2).
C12. p5, column 1, line 45. I wonder why the translation graph is defined as an undirected graph. I am not an expert in translation; however, I expected the graph to be directed. If, given a term in one language, no corresponding term exists in the other language, then only the edge pointing to the more general term would make sense? If the translation graph is commonly defined as an undirected graph, then leave it as it is. If it is a simplification, then the advantages and limitations need to be discussed.
C13. p5, column 2, line 1. Given the tuple (rep, lang, pos): since pos does not seem to be a sequence of POS tags but instead a single POS tag, does this mean that rep is a single token? Thus, cross-language correspondences between multi-word expressions cannot be expressed in, or derived from, the translation graph? I believe that defining pos as a sequence would not cause problems for the proposed approach. Please clarify.
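To make the suggestion concrete, here is a minimal sketch (the node encodings are hypothetical, not taken from the paper's implementation): allowing pos to be a tuple of tags keeps nodes hashable, so they can still serve as graph vertices without any change to the algorithm.

```python
# Hypothetical node encodings, for illustration only.
# A single-token entry with one POS tag:
single = ("dictionary", "en", "NOUN")

# A multi-word expression whose pos slot holds a sequence of tags:
mwe = ("kick the bucket", "en", ("VERB", "DET", "NOUN"))

# Both remain hashable, so both can be used as graph nodes unchanged.
graph = {single: set(), mwe: set()}
```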
C14. p6, column 1, end of section 4.1. It would be helpful to have examples here. Moreover, I wonder whether one wants to extend a translation graph wherever possible, or whether one wants to focus on a specific language pair, and thus tries to find cycles that begin in one language and lead to the other. Why I am confused: according to p5, col 2, lines 18ff the aim is to enrich the graph, whereas on p6, col 2, line 35 a target language pair is mentioned.
C15. It is not clear to me why the approach is based on cycles at all. For example, why not identify quasi-cliques? Cycle density would then be the density of the γ-quasi-cliques. How do the complexities of identifying γ-quasi-cliques and identifying cycles compare?
C23. p10, column 2, Implementation. It would be nice to have an overview of the entire approach, e.g., in the form of pseudocode. Information about configuration files and command-line parameters is of little relevance and could be omitted from the paper. The implementation described does not mention the filtering based on POS tags etc., and does not explain how synonyms are inferred as opposed to translations.
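To illustrate the kind of overview I have in mind, something as short as the following would already help (a simplified, hypothetical reconstruction using networkx; the function name `infer_translations` is invented, and it deliberately omits the cycle-density scoring and POS filtering of the actual approach):

```python
import networkx as nx

def infer_translations(G, max_len=6):
    """Hypothetical sketch of the pipeline: partition the translation
    graph into maximal biconnected components, enumerate short cycles
    per component, and propose missing edges on each cycle as candidate
    translations.  Cycle-density scoring and POS filtering are omitted."""
    inferred = set()
    for nodes in nx.biconnected_components(G):
        if len(nodes) < 3:
            continue  # a single edge carries no cycle
        comp = G.subgraph(nodes)
        for cycle in nx.simple_cycles(comp, length_bound=max_len):
            for u in cycle:
                for v in cycle:
                    if u != v and not G.has_edge(u, v):
                        inferred.add(frozenset((u, v)))
    return inferred
```

On a 4-cycle en:cat – de:Katze – fr:chat – es:gato, this sketch would propose the two "diagonal" candidates (en:cat, fr:chat) and (de:Katze, es:gato).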
C24. p11, column 2, Figure 3. Before I saw the figure I believed that I understood what a biconnected component is created from, namely, nodes representing words and edges representing translations. However, the graph shown has languages as nodes. Is this the graph that one obtains if every node that represents a (word, language) tuple is replaced by the language? Please clarify.
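If my reading is correct, the figure could be obtained by a projection like the following (a hypothetical sketch with invented node names, only to pin down what I believe the figure shows):

```python
import networkx as nx

# Word-level translation graph: nodes are (representation, language) pairs.
words = nx.Graph()
words.add_edge(("cat", "en"), ("Katze", "de"))
words.add_edge(("Katze", "de"), ("chat", "fr"))
words.add_edge(("cat", "en"), ("chat", "fr"))

# Projection: replace every node by its language, dropping duplicates.
langs = nx.Graph((u[1], v[1]) for u, v in words.edges)
```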
C29. p19, Figure 19. I understand neither the figure nor its description in Section 10.3. Please clarify.
Minor comments:
C1. p1, abstract. I do not see how "in recent years, we have witnessed a steady growth of linguistic information represented and exposed as linked data on the Web" actually "is allowing the growth and application of openly available linguistic knowledge graphs". There is a disconnect.
C2. p1, abstract. Besides the percentage values, it would be interesting to also provide absolute numbers, e.g., how many new correspondences were found. It would be fine not to mention them in the abstract, but these actual numbers should be clearly stated somewhere in the paper.
C3. p2, running title. The title "Bilingual dictionary enrichment via graphs" does not describe the content of the paper very well.
C4. p2, column 1, line 9. "thorough" -> "through"
C5. p2, column 2, example in line 24ff. It would be nice to show this as a graph. This will also show the reader that the graph of correspondences is undirected, which I do not find intuitive.
C6. p2, column 2, line 45. I have an idea of what you mean by "previously empirical ones", but this should probably be reformulated.
C10. p4, column 2, line 7. According to Figure 1, for Sardinian and Italian it seems like Catalan is a pivot language. Shouldn't OTIC then be applicable?
C16. p6, column 2, lines 5ff. Although some of the constraints are rather intuitive, it would be nice to have an explanation, or even a (counter)example, for constraint c, for instance.
C17. p7, column 1, lines 1ff. It should be "different maximal biconnected components can only share cut vertices" instead of "different biconnected components can only share cut vertices". Otherwise, if one component is a sub-component of the other component, then none of the shared vertices would be a cut vertex. Later in the text, e.g., p7, col 1, line 32, you should also write "maximal biconnected components" instead of "biconnected components"; otherwise, given a non-maximal biconnected component, wouldn't you miss some cycles?
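The maximality condition is easy to check empirically; e.g., with networkx (a hypothetical illustration, not the authors' code), the maximal biconnected components of a graph overlap only in articulation (cut) vertices:

```python
import networkx as nx

# Two triangles glued at node 2: node 2 is the only cut vertex.
G = nx.Graph([(0, 1), (1, 2), (2, 0), (2, 3), (3, 4), (4, 2)])

components = [frozenset(c) for c in nx.biconnected_components(G)]
cut_vertices = set(nx.articulation_points(G))

# Distinct maximal biconnected components intersect only in cut vertices.
for a in components:
    for b in components:
        if a != b:
            assert a & b <= cut_vertices
```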
C18. p7, column 1, line 6. "O(V + E)" should be "O(|V| + |E|)"
C19. p7, column 1, lines 25/26. the comma in "\forall v \in V(C), v \in V(B)" should most likely be a ":".
C20. p7, column 2, Table 1 - the table is not referenced or discussed in the text.
C21. p7, column 2, Table 3 - does "size" mean the number of edges?
C22. p8, column 1, Section 5.3. Make clear how the biconnected components are used, e.g., how to create context graphs from a set of maximal biconnected components given a source word. I believe the paragraph should begin with something like "Within a biconnected component we ..."
C25. p15, column 1, line 40. Consider changing "properNoun" to "proper noun". This appears a couple of times in the text and in figures.
C26. p15, column 1, line 44. "often encode grammatical changes instead of semantic similarity". Could you explain this further? How does this result in high precision for open POS etc.?
C27. p16, column 2, line 47. It would be interesting to mention how the accuracy metrics differed across the 27 language pairs.
C28. p16, column 1, Table 5. Usually, one would set the best values per column in bold. Also: Table 6.
C30. p22, column 2, line 9. "It makes the most of" sounds a bit exaggerated.
C31. p22, column 2, line 35, "amount extra translations" -> "amount of extra translations"
C32. p22, column 1, lines 20/21. "relying purely on translation graphs over documents": probably "over documents" can be removed? Then it would be clearer.
C33. p22, column 1, lines 23ff. This can be misleading to the reader. Of course the approach does not require training time, because no model is trained. However, cycle finding still takes time.