Multilingual Semantic Transformation through Simple Sentence Mediation

Tracking #: 2499-3713

Peng Qin
Jingzhi Guo
Quanyi Hu

Responsible editor: Anna Lisa Gentile

Submission type: Full Paper

In multilingual semantic transformation, semantic representation faces challenges in ambiguity and consistency. It is an important research topic in document exchange and NLP. Here, in terms of document representation, current researches have not been solved a complex concept for a single cell of a table. Thus, this paper proposes a novel collaborative framework - Simple Sentence Mediation by leveraging a Semantic Input Method for sharing common atomic concepts. Besides, we design a transformation approach – a sentence-based Machine Universal Language enables any simple sentences to be represented into the bags of concepts from a sequence of atomic concepts. Our method ultimately resolves the semantic shift problems to achieve the sentence-level semantic exactness, as well as machine-readable and -understandable in the heterogeneous semantic transformation.

Solicited Reviews:
Review #1
Anonymous submitted on 14/Sep/2020
Review Comment:

This work deals with the task of semantic disambiguation of concepts in machine translation. The authors propose a Sentence-based Universal Machine Language (SUML) that consists of several steps, such as literal signification, case appending, and machine representation. Features are extracted based on Part of Speech (POS), Morphological Change of Sign (Grammar rules conditioned on the POS category), Syntax and Meaning.

Related work discusses some methods related to the task; however, the paper lacks clarity and novelty in many dimensions. First, the writing needs substantial improvements. There are plenty of typos and grammatical errors throughout the paper, which limit the understanding of the contributions. Due to these errors and the improper use of notation, it was exceptionally hard to follow the description of some sections; for example, Definitions 3-4 should be further explained.

Notation should be fixed throughout the paper:
1) Ei := List (w1, w2, …, wn) and Cj := List (w1, w2, …, wn) indicate that both source and target sentences have the same length (n words); is that correct? Then, in (1) Heterogeneous grammatical rules, the notation changes to Ei and Ci (note the error of indexing C as Ci instead of Cj).
2) Functions are typically written left to right, i.e., fc = SiShi --> SiSm and not fc := SiSm <-- SiShi
3) Words in definitions 1 and 2 (eq. 1-4) cannot be summed, so the summation terms do not make sense.
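For reference, points 1) and 2) would conventionally be written as follows (symbol names are taken from this review; the paper's exact definitions are not available here, so treat this as a sketch of the standard convention):

```latex
% Function written left to right, domain before codomain
f_c \colon S_i S_{hi} \to S_i S_m

% Source and target sentences as ordered word lists of
% possibly different lengths (concatenation, not summation)
E_i := (w_1, w_2, \ldots, w_n), \qquad C_j := (v_1, v_2, \ldots, v_m)
```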

The evaluation details are quite sparse: how is the test data created? It appears to be 100 translation pairs in total; isn't this a relatively small number of test examples? Where did these sentence pairs come from? How were they translated; do they originate from a specific parallel English-Chinese corpus?

Why is the evaluation based on WordNet? It seems that the interface in Figure 5 resembles WordNet as well; e.g., term definitions, antonyms, and synonyms can all be found through WordNet. If the system tries to mimic WordNet similarity, then what is the contribution of the overall complex UML language presented?

Also, to my understanding, such a representation might not scale well as the vocabulary or the number of feature types increases. Since language is a combinatorial problem, I would expect that some terms will be mapped to multiple categories, for example in Table 2. Are such cases handled by the proposed SUML framework?

How does this work relate to the advanced machine translation methods that are mainstream nowadays, e.g., sequence-to-sequence models, transformers, and contextualized language models? There was no such baseline in the experimental results, which seem preliminary and only measure how close each system is to WordNet similarity.
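To make the last point concrete, a common way to quantify how close a system's scores are to WordNet-based similarity is rank correlation over the test pairs. The sketch below uses invented toy scores (the paper's actual evaluation data is not available here) and a self-contained Spearman implementation:

```python
def rank(values):
    """Assign 1-based average ranks to a list of scores (ties share the mean rank)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Group equal values so ties receive the same averaged rank.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman(xs, ys):
    """Spearman rank correlation: Pearson correlation computed on the ranks."""
    rx, ry = rank(xs), rank(ys)
    n = len(xs)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx) ** 0.5
    vy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (vx * vy)

# Hypothetical similarity scores for five sentence pairs (illustrative only).
wordnet_scores = [0.9, 0.4, 0.7, 0.2, 0.6]
system_scores = [0.8, 0.5, 0.9, 0.1, 0.7]
print(spearman(wordnet_scores, system_scores))  # → 0.9
```

A high correlation here would only show agreement with WordNet, which is exactly the reviewer's concern: it says nothing about translation quality relative to modern baselines.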

Regarding typos and grammar errors, here is a list of a few tracked:
1) "current researches have not been solved a complex concept for a single cell of a table."
2) "achieve the sentence-level semantic exactness, as well as machine-readable and -understandable"
3) "Semantic document representation plays an important role in NLP and information theory [27][20],
such as semantic web [31]."
4) "In the multilingual semantic transformation, it is a process of"
5) "SMUL achieves a simple sentence"
6) "However, it is a heavy workload and exists huge data redundancy to construct"
7) "For example, [8] claims the accurate mapping"
8) "and the process should be reliable and computational"
9) "and cannot guarantee semantic representation universal and exactness in the cross-linguistic"
10) "from CoDic to *constrains* sentence"

At this point, I stopped tracking: due to the very frequent improper use of English, it was difficult to catch all such errors. The authors are advised to carefully revise the writing and to provide intuition and explanations for all design choices.

Review #2
Anonymous submitted on 05/Oct/2020
Review Comment:

The authors start with an exaggerated claim in the abstract - "Our method ultimately resolves the semantic shift problems to achieve the sentence-level semantic exactness, as well as machine-readable and -understandable in the heterogeneous semantic transformation."

Although the introduction is well-written and easy to follow, the related work section lacks some important research papers regarding multilingual semantic representation. Given that the authors mention BabelNet, it misses some papers from the same group, such as MUFFIN [1]. Additionally, the following claim is not correct:
"However, such approaches focus on lexical-semantic rather than on sentence or text semantics and cannot guarantee semantic representation universal and exactness in the cross-linguistic.". Moreover, pertaining to multilingual semantic representation, the authors could have mentioned or used some recent works such as MUSE [2][3] to alleviate the ambiguity problem. I think the confusion about the semantic representation lies in the authors' explanation. Although the introduction is well-written and direct, the authors refer to semantic representation as a broad concept. When I read the introduction, I expected to see some work on translating documents via knowledge graphs/word embeddings, for example, using RDF2Vec or similar approaches. I suggest the authors guide the audience from the beginning by clearly pointing out this difference and their goal. Apart from the machine learning perspective, the Semantic Web offers semantic representations such as NIF [4] and Ontolex [5], which could also be re-used to model sentences and act as the "universal" language in this work.

The most critical point is that the authors did not cite or refer to their own work published in 2018 [6], which is quite similar. Additionally, the authors have published other similar works that are not referred to here, such as [7], [8], and [9]. Some parts are even copied from these papers. Thus, I struggle to understand whether this paper is an extension or a new approach, and why the authors omitted their own work.

[1] - Camacho-Collados, J., Pilehvar, M. T., & Navigli, R. (2015). A unified multilingual semantic representation of concepts. In Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers) (pp. 741-751).

[2] - Conneau, A., Lample, G., Denoyer, L., Ranzato, M. A., & Jégou, H. (2018). Word Translation Without Parallel Data. In ICLR 2018.

[3] - Lample, G., Conneau, A., Denoyer, L., & Ranzato, M. A. (2018). Unsupervised Machine Translation Using Monolingual Corpora Only. In ICLR 2018.

[4] - Hellmann, S., Lehmann, J., Auer, S., & Brümmer, M. (2013). Integrating NLP using Linked Data. In Proceedings of the 12th International Semantic Web Conference (ISWC), Sydney, Australia.

[5] - McCrae, J. P., Bosque-Gil, J., Gracia, J., Buitelaar, P., & Cimiano, P. (2017). The Ontolex-Lemon model: development and applications. In Proceedings of eLex 2017 (pp. 19-21).

[6] - Qin, P., Guo, J., Xu, Y., & Wang, L. (2018). Semantic document exchange through mediation of machine natural language. In 2018 IEEE 15th International Conference on e-Business Engineering (ICEBE) (pp. 245-250). IEEE.

[7] - Yang, S., Wei, R., Guo, J., & Tan, H. (2020). Chinese semantic document classification based on strategies of semantic similarity computation and correlation analysis. Journal of Web Semantics, 100578.

[8] - Qin, P., Guo, J., Shen, B., & Hu, Q. (2019). Towards Self-automatable and Unambiguous Smart Contracts: Machine Natural Language. In International Conference on e-Business Engineering (pp. 479-491). Springer, Cham.

[9] - Qin, P., & Guo, J. (2020). A novel machine natural language mediation for semantic document exchange in smart city. Future Generation Computer Systems, 102, 810-826.

Please rephrase this sentence (the referent of "it" is unclear):
"it is a process of presenting and exchanging semantic information in the heterogeneous parties or domains (such as party A is English user, and party B is Chinese user in Fig. 1) to achieve semantic consistency."

"One technical challenge is the lack of consistent transformation across domains since the multilingual text usually does not share the same meaning between semantic communities." - reference?

Review #3
By Mehwish Alam submitted on 23/Nov/2020
Review Comment:

The authors propose a novel framework for multilingual semantic transformation. This collaborative framework, Simple Sentence Mediation, leverages a Semantic Input Method for sharing common atomic concepts. The authors introduce an approach, Machine Universal Language, which generates bags of concepts from a sequence of atomic concepts.
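For readers unfamiliar with the idea, a "bag of concepts" built from atomic concepts can be illustrated with a minimal sketch; the concept IDs below are invented for illustration and are not the paper's actual identifiers:

```python
from collections import Counter

def to_concept_bag(concept_ids):
    """Collect a sequence of atomic-concept IDs into a multiset (bag)."""
    return Counter(concept_ids)

def bag_similarity(bag_a, bag_b):
    """Jaccard similarity over multisets: shared concepts / all concepts."""
    inter = sum((bag_a & bag_b).values())
    union = sum((bag_a | bag_b).values())
    return inter / union if union else 0.0

# Invented atomic-concept IDs for an English and a Chinese simple sentence.
en = to_concept_bag(["C-BUY", "C-PERSON", "C-FRIDGE"])
zh = to_concept_bag(["C-BUY", "C-PERSON", "C-FRIDGE", "C-PAST"])
print(bag_similarity(en, zh))  # → 0.75 (3 shared concepts out of 4)
```

This only illustrates the representation; whether such overlap actually guarantees cross-lingual semantic consistency is precisely what the reviews below question.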

The first section of the paper is very confusing. By multilingual semantic transformation, do the authors mean machine translation? I could not really get the objective of the paper. If I summarize in my own words, the authors want to propose a knowledge-aware approach for machine translation by taking into account the semantic shifts in the meaning of words due to language-dependent context. But this is not described very straightforwardly in the problem statement.

At some point the authors also talk about polysemy, i.e., one word can have several meanings. The example given is "refrigerator". The authors write "For instance, “refrigerator” refers to several meanings in English.", which is not the case.

The introduction is very mixed up and hard to follow. The authors should give a clear picture of the problem as well as what they expect as an output with example.

The authors also try to explain the problem of semantic shift but in my opinion it means the evolution of word usage. This variation can also be across cultures.

According to my understanding, the authors are generating a machine-readable format from text and then regenerating the text from it. This does not make much sense to me.

In the related work, the authors talk about document representation. I do not understand what it means. Is it the machine-understandable output from a document? How is it related to the problem at hand?

The authors further talk about Multilingual Semantic Representation, where they introduce Semantic Role Labeling using FrameNet (which is not multilingual). They also discuss the AMR representation. How is it related to multilinguality?

The third section introduces the approach proposed by the authors, where the first step is "sentence computerization", then "literal representation". Again, what do these really mean?

Many terms, such as SMUL in the overview, are introduced without being defined.

Figure 3 is not understandable at all.

The authors say at some point that they are using sign theory to generate the machine-readable representation; why is that? Why not knowledge graphs?

Section 4.2 says "supervised sentence through SIM". What does that mean exactly? The same goes for the heading of Section 5, i.e., transformation from human sentence to computer sentence.

Finally, in the experimentation section it seems that the authors are targeting the problem of machine translation, since the comparison is with Google Translate and Bing, but there are many existing approaches targeting this task (cf. [1]).

The experimentation somehow reports similarity for the STS (Semantic Textual Similarity) task, which seems strange. Thorough experiments are missing.