Open Bug 1844096. Opened 1 year ago, updated 2 months ago.

Word that is a link is repeated twice in translation with pivot language (alignments issue)

Categories: Firefox :: Translations, defect, P3

People: Reporter: marco, Unassigned

References: Blocks 1 open bug

Attachments: 2 files

When translating https://www.soeren-hentzschel.at/mozilla/mozilla-uebernimmt-pulse/ from German to Italian, "Mozilla-Dienst <a href='...'>Pocket</a>" is translated as "Mozilla <a href='...'>Pocket</a>Pocket".

I initially thought this bug should block bug 1836125, but then I noticed the error does not occur when translating German to English.
It does happen when translating German to French, German to Italian, or German to Spanish.

Attached file page_de.html

The bug is reproducible with this HTML page.

Attached file page_en.html

The bug is not reproducible when translating page_de.html to English, saving the result, opening it again, and then translating it from English to Italian.

I suspect this is an issue with the markup matching in Bergamot, so it should probably block Bug 1836125.

Blocks: 1836125

I can reproduce the issue in about:translations with the pref browser.translations.useHTML set to true. The problem is on the Bergamot side, not in TranslationsDocument.

Oh, I see now (after reading your comments more closely) that if you do the translations manually without a pivot, it works, and I can verify this behavior in about:translations.

However, we are just calling into Bergamot here, so the bug still appears to be somewhere in Bergamot.

        responses = this.translationService.translateViaPivoting(
          this.languageTranslationModels[0],
          this.languageTranslationModels[1],
          messages,
          options
        );

https://searchfox.org/mozilla-central/rev/d2a61d9c63beb3ac4b134767e43a7a8c1b91d5cd/toolkit/components/translations/content/translations-engine-worker.js#363-368

Here it is on the Bergamot side: https://github.com/browsermt/bergamot-translator/blob/eaa2562fe0b3b2bd9ac3424962ada33b7c3be2f1/wasm/bindings/service_bindings.cpp#L86-L93

No longer blocks: 1838721

The severity field is not set for this bug.
:nordzilla, could you have a look please?

For more information, please visit BugBot documentation.

Flags: needinfo?(enordin)
Severity: -- → S3
Priority: -- → P3
Flags: needinfo?(enordin) (cleared)

In Bergamot, the word alignments (which map each source word to a target word) go through the pivot language. This code lives in Bergamot here:

https://github.com/mozilla/bergamot-translator/blob/5ae1b1ebb3fa9a3eabed8a64ca6798154bd486eb/src/translator/response.cpp#L100-L143
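A minimal sketch of what composing alignments through a pivot means, assuming the remapping is a plain product of two soft-alignment matrices (one row per output token, one column per input token, each row a probability distribution). `composeAlignments` is a hypothetical helper written for illustration only; it is not the C++ code in response.cpp linked above.

```javascript
// Hypothetical helper (not Bergamot's API): compose two soft alignments
// through a pivot by matrix multiplication.
//   sourceToPivot[p][s]: probability that pivot token p aligns to source token s
//   pivotToTarget[t][p]: probability that target token t aligns to pivot token p
// Result[t][s] = sum over p of pivotToTarget[t][p] * sourceToPivot[p][s]
function composeAlignments(pivotToTarget, sourceToPivot) {
  const numSource = sourceToPivot[0].length;
  return pivotToTarget.map((targetRow) => {
    const composed = new Array(numSource).fill(0);
    targetRow.forEach((weight, p) => {
      sourceToPivot[p].forEach((prob, s) => {
        composed[s] += weight * prob;
      });
    });
    return composed;
  });
}

// Toy example: two source tokens, two pivot tokens, two target tokens.
const sourceToPivot = [
  [0.9, 0.1],
  [0.2, 0.8],
];
const pivotToTarget = [
  [0.7, 0.3],
  [0.1, 0.9],
];
console.log(composeAlignments(pivotToTarget, sourceToPivot));
// ≈ [[0.69, 0.31], [0.27, 0.73]]
```

In the toy example each composed peak (0.69, 0.73) is lower than the corresponding peaks in the two hops, which is one way probability mass can get blurred when going through a pivot.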

I tried to create a minimal example and found that the issue also reproduces in a non-pivot translation.

English (source):

Focus the <span>Pocket</span> service.

German (target):

Konzentrieren Sie den <span></span>Pocket-Service.

Here is the alignment information:

Target \ Source     Focus  the    <span>Pocke  t      </span> service  .      <empty string>
Kon                 0.996  0.000  0.002        0.000  0.000            0.000  0.000
zen                 0.985  0.007  0.000        0.000  0.000            0.000  0.006
tr                  0.990  0.002  0.000        0.000  0.000            0.000  0.006
ieren               0.965  0.032  0.000        0.000  0.000            0.000  0.001
Sie                 0.856  0.081  0.015        0.002  0.018            0.011  0.013
den                 0.085  0.880  0.023        0.002  0.005            0.001  0.001
<span></span>Pocke  0.008  0.001  0.912        0.012  0.063            0.000  0.001
t                   0.000  0.000  0.008        0.989  0.001            0.000  0.000
-                   0.007  0.060  0.059        0.211  0.558            0.039  0.063
Service             0.000  0.000  0.007        0.001  0.986            0.000  0.002
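One rough way to read the table above, assumed here for illustration and not necessarily how Bergamot places markup, is to take, for each target token, the source token with the highest probability. The hypothetical helper `hardAlign` does that on the numbers from this comment.

```javascript
// Alignment table from this bug: rows are German target subwords,
// columns are English source subwords.
const sourceTokens = ["Focus", "the", "<span>Pocke", "t", "</span> service", ".", "<empty string>"];
const targetTokens = ["Kon", "zen", "tr", "ieren", "Sie", "den", "<span></span>Pocke", "t", "-", "Service"];
const alignment = [
  [0.996, 0.000, 0.002, 0.000, 0.000, 0.000, 0.000],
  [0.985, 0.007, 0.000, 0.000, 0.000, 0.000, 0.006],
  [0.990, 0.002, 0.000, 0.000, 0.000, 0.000, 0.006],
  [0.965, 0.032, 0.000, 0.000, 0.000, 0.000, 0.001],
  [0.856, 0.081, 0.015, 0.002, 0.018, 0.011, 0.013],
  [0.085, 0.880, 0.023, 0.002, 0.005, 0.001, 0.001],
  [0.008, 0.001, 0.912, 0.012, 0.063, 0.000, 0.001],
  [0.000, 0.000, 0.008, 0.989, 0.001, 0.000, 0.000],
  [0.007, 0.060, 0.059, 0.211, 0.558, 0.039, 0.063],
  [0.000, 0.000, 0.007, 0.001, 0.986, 0.000, 0.002],
];

// Hypothetical helper: for each target token, return the index of the
// source token with the highest alignment probability ("hard" alignment).
function hardAlign(matrix) {
  return matrix.map((row) => row.indexOf(Math.max(...row)));
}

hardAlign(alignment).forEach((s, t) => {
  console.log(`${targetTokens[t]} -> ${sourceTokens[s]}`);
});
```

With these numbers, `<span></span>Pocke` maps to `<span>Pocke`, `t` maps to `t`, and both `-` and `Service` map to `</span> service`, so the two subwords of "Pocket" are aligned separately rather than as one unit.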

I'm wondering whether this is exacerbated by the remapAlignments function: https://github.com/browsermt/bergamot-translator/blob/534ed37a3d609f867a65c250328c5745b306a3c5/src/translator/response.cpp#L100-L143

Essentially, this function maps the tokens from source -> pivot and then from pivot -> target. In this case the resulting probabilities will probably come out worse, especially since subword units don't map as cleanly. I think a better solution would be to map through the pivot using word units rather than subword tokens. The subword tokens are only needed for the Transformer architecture; we can, and probably should, do the markup manipulation at the word level.
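A minimal sketch of the word-level idea, assuming each subword token can be mapped to the index of the word it belongs to. `toWordAlignment` is a hypothetical helper: it sums subword probabilities into word-by-word scores and averages over the target subwords of each word. The example treats the target subwords `Pocke` and `t` as one word and uses their rows from the table in this bug.

```javascript
// Hypothetical helper: collapse a subword-level soft alignment into a word-level one.
// matrix[t][s]: probability that target subword t aligns to source subword s.
// targetWordOf[t], sourceWordOf[s]: index of the word each subword belongs to.
function toWordAlignment(matrix, targetWordOf, sourceWordOf) {
  const numTargetWords = Math.max(...targetWordOf) + 1;
  const numSourceWords = Math.max(...sourceWordOf) + 1;
  const sums = Array.from({ length: numTargetWords }, () => new Array(numSourceWords).fill(0));
  const counts = new Array(numTargetWords).fill(0);

  matrix.forEach((row, t) => {
    const targetWord = targetWordOf[t];
    counts[targetWord] += 1;
    row.forEach((prob, s) => {
      // Sum over source subwords that belong to the same source word.
      sums[targetWord][sourceWordOf[s]] += prob;
    });
  });

  // Average over the target subwords of each word so rows stay roughly normalized.
  return sums.map((row, targetWord) => row.map((value) => value / counts[targetWord]));
}

// Usage: the two target subwords of "Pocket" (rows from the table in this bug).
const pocketRows = [
  [0.008, 0.001, 0.912, 0.012, 0.063, 0.000, 0.001], // "<span></span>Pocke"
  [0.000, 0.000, 0.008, 0.989, 0.001, 0.000, 0.000], // "t"
];
// Source words: Focus, the, Pocket (two subwords), service, ".", <empty string>
const sourceWordOf = [0, 1, 2, 2, 3, 4, 5];
const wordAlignment = toWordAlignment(pocketRows, [0, 0], sourceWordOf);
console.log(wordAlignment[0][2]); // ≈ 0.9605 ("Pocket" -> "Pocket")
```

Whether to sum or average, and how to attach markup to a whole word, are open design choices; the sketch only shows that merging subwords first yields a single strong "Pocket" -> "Pocket" link instead of two separate subword links.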

Summary: Word that is a link is repeated twice in translation with pivot language → Word that is a link is repeated twice in translation with pivot language (alignments issue)
