
Automated aligned translation candidates #402

Open
jcuenod opened this issue Jun 4, 2024 · 1 comment
Labels
research Research topics

Comments


jcuenod commented Jun 4, 2024

I've just been working with a translation that got pretty low alignment scores on the source text. I researched the language a bit and managed to find related languages that performed better.

This made me wonder whether it would be worth building an alignment "index": identify clusters of translations, run alignments against a sample from each cluster, and find decent candidates automatically. Have you guys solved this problem some other way, or done something like this?
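To make the idea concrete, here is a rough sketch of the two-stage lookup I have in mind: score a small sample from each cluster first, then score every member of the best-looking cluster. Everything here is hypothetical: `rank_candidates`, the `score` callback, and the cluster layout are placeholders for illustration, not anything that exists in silnlp, and the scores are made-up toy numbers.

```python
import random
from statistics import mean
from typing import Callable, Dict, List, Tuple


def rank_candidates(
    target: str,
    clusters: Dict[str, List[str]],
    score: Callable[[str, str], float],
    samples_per_cluster: int = 2,
    seed: int = 0,
) -> Tuple[str, List[Tuple[str, float]]]:
    """Pick the most promising cluster, then rank its members for `target`.

    `score(target, candidate)` is an assumed callback that returns an
    alignment score (higher is better) for a pair of projects.
    """
    rng = random.Random(seed)

    # Stage 1: estimate each cluster's quality from a small random sample.
    cluster_means = {}
    for name, members in clusters.items():
        sample = rng.sample(members, min(samples_per_cluster, len(members)))
        cluster_means[name] = mean(score(target, m) for m in sample)
    best = max(cluster_means, key=cluster_means.get)

    # Stage 2: score every member of the best cluster and sort descending.
    ranked = sorted(((score(target, m), m) for m in clusters[best]), reverse=True)
    return best, [(m, s) for s, m in ranked]


# Toy data: made-up scores standing in for real alignment runs.
toy_scores = {"a1": 0.2, "a2": 0.3, "b1": 0.6, "b2": 0.8, "b3": 0.7}
toy_clusters = {"A": ["a1", "a2"], "B": ["b1", "b2", "b3"]}

best_cluster, ranking = rank_candidates(
    "new_project", toy_clusters, lambda target, cand: toy_scores[cand]
)
```

The point of the sampling stage is to avoid aligning against every project in the index; only the winning cluster gets the full pass. A real version would probably want to keep the top few clusters rather than one.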

Collaborator

ddaspit commented Jun 4, 2024

You should check out the silnlp.alignment.visualize_similarity script. It computes the alignment scores for all project pairs in a country or language family, then generates either a hierarchical graph (dendrogram) or a network graph based on the scores. It can also combine all of the scores by language, so that you can visualize the relationship between languages. It is intended to work on the biblical-humanities-corpus, a private repo that contains thousands of Bible translations. We could certainly extend it to support other clustering algorithms. Here is an example of the output:
[Image: india-language-tree]
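For anyone who cannot access the private corpus, the general technique behind a tree like this can be sketched in a few lines. This is not the code from the silnlp script; it is a minimal, standard-library illustration of average-linkage agglomerative clustering over pairwise similarity scores. The function name `average_linkage`, the project names, and the scores are all made up for the example.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Sequence, Tuple

Cluster = Tuple[str, ...]


def average_linkage(
    names: Sequence[str], scores: Dict[FrozenSet[str], float]
) -> List[Tuple[Cluster, Cluster, float]]:
    """Repeatedly merge the two most similar clusters.

    `scores` maps frozenset({a, b}) to a pairwise similarity (higher means
    more similar). Returns the merges in order, which is the information a
    dendrogram is drawn from.
    """
    clusters: List[Cluster] = [(n,) for n in names]
    merges: List[Tuple[Cluster, Cluster, float]] = []

    def sim(c1: Cluster, c2: Cluster) -> float:
        # Average similarity over all cross-cluster project pairs.
        pairs = [scores[frozenset((a, b))] for a in c1 for b in c2]
        return sum(pairs) / len(pairs)

    while len(clusters) > 1:
        # Find the pair of clusters with the highest average similarity.
        i, j = max(
            combinations(range(len(clusters)), 2),
            key=lambda ij: sim(clusters[ij[0]], clusters[ij[1]]),
        )
        merges.append((clusters[i], clusters[j], sim(clusters[i], clusters[j])))
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return merges


# Toy input: four hypothetical projects with made-up alignment scores.
names = ["A", "B", "C", "D"]
scores = {frozenset(p): 0.25 for p in combinations(names, 2)}
scores[frozenset(("A", "B"))] = 0.9
scores[frozenset(("C", "D"))] = 0.8

merges = average_linkage(names, scores)
```

With the toy scores, A and B merge first, then C and D, and the two pairs join last. Swapping in a different linkage rule (single, complete) only requires changing `sim`, which is roughly what supporting other clustering algorithms would look like at this level.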

@ddaspit ddaspit added the research Research topics label Jun 7, 2024