Bilingual dictionary generation and enrichment via graph exploration

Submitted by Jorge Gracia on 09/19/2021 - 08:49

Tracking #: 2899-4113

Authors:

Shashwat Goel

Jorge Gracia

Mikel Lorenzo Forcada

Responsible editor:

Guest Editors Advancements in Linguistics Linked Data 2021

Submission type:

Full Paper

Abstract:

In recent years, we have witnessed a steady growth of linguistic information represented and exposed as linked data on the Web. Such linguistic linked data has stimulated the development and use of openly available linguistic knowledge graphs, as is the case for Apertium RDF, a collection of interconnected bilingual dictionaries represented and accessible through Semantic Web standards. In this work we explore techniques that exploit the graph nature of bilingual dictionaries to automatically infer new links (translations). We build upon a cycle-density-based method: partitioning the graph into biconnected components for a speedup, and simplifying the pipeline through a careful structural analysis that reduces hyperparameter tuning requirements. We also analyse the shortcomings of traditional evaluation metrics used for translation inference and propose to complement them with new ones, both-word precision (BWP) and both-word recall (BWR), aimed at being more informative of algorithmic improvements. On average over twenty-seven language pairs, our algorithm produces dictionaries about 70% the size of existing Apertium RDF dictionaries at a high BWP of 85% from scratch within a minute. Human evaluation shows that 78% of the additional translations generated for enrichment are correct as well. We further describe an interesting use case: inferring synonyms within a single language, on which our initial human-based evaluation shows an average accuracy of 84%. We release our tool as free/open-source software that can not only be applied to RDF data and Apertium dictionaries, but is also easily usable for other formats and communities.

Full PDF Version:

swj2899.pdf

Previous Version:

Bilingual dictionary generation and enrichment via graph exploration

Tags:

Reviewed

Long-term Stable Link to Resources:

https://github.com/shash42/ApertiumBidixGen

Decision/Status:

Solicited Reviews:

Review #1

By John McCrae submitted on 25/Oct/2021

Suggestion:
Accept

Review Comment:

I think this paper is in a very good state and the authors have taken into account the comments well.

A few minor issues I found in this reading:
p1. l23 "as it is the case of" => "as is the case for"
p1. l29. "on average" reads odd. I would remove.
p3. l6. "time response"... I guess you meant "response time" but you should probably just say "computation time" or "execution time"
p22 l19. I think it should be "(m)" after masculine

Review #2

Anonymous submitted on 22/Nov/2021

Suggestion:
Accept

Review Comment:

In the revised version of the paper, the authors have properly addressed the reviewers' comments, clarifying the content where needed.
Minor remark: p2, c2, l47: pachina -> panchina

Review #3

By Basil Ell submitted on 28/Nov/2021

Suggestion:
Accept

Review Comment:

Dear authors,

thank you for your detailed and helpfully clarifying responses to my review (review #2). The new submission of your paper is a great improvement. There are only minor points that I'd like to mention:

p1, abstract. "the case or Apertium RDF" -> "the case of Apertium RDF"

p2, column 1, line 41. "evaluation methods that are more reflective". Although I have a rough idea what you mean by the term "reflective", there might be a better term or a more detailed explanation, or you could simply remove that term from the abstract.

p2, column 2, Fig. 1. "Apertium RDF graph". What is shown here is not an RDF graph, but instead a graphical visualization of the language pairs within the Apertium RDF graph and their interconnectedness. The same holds for the graphs shown in Fig. 3 and Fig. 4: these are not RDF graphs.

p5, column 2, footnote 12. This is not a full sentence.

p12, column 2, lines 27 & 28: "While the in-production Apertium language pairs that are used in RDF". Maybe drop "that are used in RDF"?

p18, Fig. 6. One could remove the legend, as the labels occur below the plots. Also, one could remove the colors, as they do not provide additional information.
