Linking discourse-level information and induction of bilingual discourse connective lexicons

Tracking #: 2898-4112

Sibel Özer
Murathan Kurfali
Deniz Zeyrek
Amalia Mendes
Giedrė Valūnaitė Oleškevičienė

Responsible editor: 
Guest Editors Advancements in Linguistics Linked Data 2021

Submission type: 
Full Paper
The single biggest obstacle in performing comprehensive cross-lingual discourse analysis is the scarcity of multilingual resources. The existing resources are overwhelmingly monolingual, compelling researchers to infer the discourse-level information in the target languages through error-prone automatic means. The current paper aims to provide more direct insight into the cross-lingual variations in discourse structures by linking the annotated relations of the TED-Multilingual Discourse Bank, which consists of six TED talks annotated independently in seven different languages. It is shown that the linguistic labels over the relations annotated in the texts of these languages can be automatically linked with high accuracy, as verified against the semi-automatically linked relations of three diverse languages. The resulting corpus has great potential to reveal the divergences the languages exhibit in local discourse relations with respect to the source text, and to lead to new resources, as exemplified by the induction of bilingual discourse connective lexicons.

Minor Revision

Solicited Reviews:
Review #1
Anonymous submitted on 05/Nov/2021
Minor Revision
Review Comment:

Upon reading the revised draft, I am happy to see that the authors have addressed the points more or less well. A few more (minor) points identified in my second review are listed below:

Page 1, Line 39-41: The way the text reads, it seems that Connective-Lex presents linked data. If so, you should describe how the data in the lexicons under the database are linked.

Page 2, Line 22-24; 2-4: In my opinion, a single point is redundantly divided into two. You may want to merge them, making the single point more succinct.

Page 5; example 7: I think the example (how much the argument differs across languages) should be described more clearly.

Page 6, Line 19-23: The point on the possibility of multiple relations within the same text span needs more elaboration, preferably with examples.

Language: I noticed some grammatical errors in the updated paper. Examples:

Page 4; Line 50: “This design criterion lead to” > leads to/led to

Page 5; Line 34: “… and their evaluation is provided” > are provided

Page 10, Line 26: “The linking performance… are measured” > is measured

Review #2
Anonymous submitted on 15/Nov/2021
Review Comment:

The authors have exhaustively addressed all previous comments, and I was very happy with the improvements.
The paper is much clearer and easier to follow.

The presentation of the two methods used to align discourse segments and relations across the parallel corpora is clear and sound; the description of Method II has improved a lot.

Editorial comments:
- move Table 2 to be closer to where it is referred to in the body of the document
- move Table 3 to be closer to where it is referred to in the body of the document
- Page 8, Line 14: incriminating --> increasing?
- Page 9, Lines 15/18: check the use of italics for the connectives in the body of the text for consistency
- Page 10, Line 20 and following: when presenting the sources of errors, I would use the \paragraph{} command in LaTeX rather than starting the paragraphs with a sentence in italics
- Table 9 and Table 10: I would present and discuss Table 10 first (move it to where Table 9 now is) and then present and discuss Table 9

Review #3
Anonymous submitted on 02/Dec/2021
Review Comment:

The changes made have significantly improved the paper.

In addition to my previous suggestions and those of my colleagues, and to the improvements made by the authors, I have only one minor issue:

The linked_ted-mdb files (at the Long-term Stable Link to Resources) do not open in Microsoft Excel (Mac version 16.52). However, I can open them in LibreOffice. It must be something small. Please give them a second look and fix them.

Otherwise, the rest fits.