Word that is a link is repeated twice in translation with pivot language (alignments issue)
Categories: Firefox :: Translations, defect, P3
People: Reporter: marco, Unassigned
References: Blocks 1 open bug
Attachments: 2 files
When translating https://www.soeren-hentzschel.at/mozilla/mozilla-uebernimmt-pulse/ from German to Italian, "Mozilla-Dienst <a href='...'>Pocket</a>" is translated as "Mozilla <a href='...'>Pocket</a>Pocket".
I initially thought this bug should block bug 1836125, but then I noticed the error does not occur if you translate German to English.
It does happen if you translate German to French, or German to Italian, or German to Spanish.
Comment 1 (Reporter) • 2 years ago
The bug is reproducible with this HTML page.
Comment 2 (Reporter) • 2 years ago
The bug is not reproducible when translating the page_de.html page to English, saving it, opening it again, and then translating from English to Italian.
Comment 3 • 2 years ago
I suspect this is an issue with the markup matching in Bergamot, so it should probably block Bug 1836125.
Comment 4 • 2 years ago
I can reproduce the issue in about:translations with the pref browser.translations.useHTML set to true. This is on the Bergamot side, not in TranslationsDocument.
Comment 5 • 2 years ago
Oh, I see now (after reading your comments more closely) that if you manually do the translations without a pivot it works, and I can verify the behavior on about:translations.
However, we are still just calling out to Bergamot here, so the bug still appears to be in Bergamot somehow.
    responses = this.translationService.translateViaPivoting(
      this.languageTranslationModels[0],
      this.languageTranslationModels[1],
      messages,
      options
    );
Here it is on the Bergamot side: https://github.com/browsermt/bergamot-translator/blob/eaa2562fe0b3b2bd9ac3424962ada33b7c3be2f1/wasm/bindings/service_bindings.cpp#L86-L93
Comment 6 • 2 years ago
The severity field is not set for this bug.
:nordzilla, could you have a look please?
For more information, please visit BugBot documentation.
Comment 7 • 2 years ago
In Bergamot, the word alignments (mapping a source word to a target word) go through a pivot. This code lives in Bergamot here:
Comment 8 • 2 years ago
I tried to create a minimal example, and I found that it also reproduces in a non-pivot translation.
English (source):
Focus the <span>Pocket</span> service.
German (target):
Konzentrieren Sie den <span></span>Pocket-Service.
Here is the alignment information:
0 | target token \ source token | Focus | the | <span>Pocke | t | </span> service | . | <empty string> |
---|---|---|---|---|---|---|---|---|
1 | Kon | 0.996 | 0.000 | 0.002 | 0.000 | 0.000 | 0.000 | 0.000 |
2 | zen | 0.985 | 0.007 | 0.000 | 0.000 | 0.000 | 0.000 | 0.006 |
3 | tr | 0.990 | 0.002 | 0.000 | 0.000 | 0.000 | 0.000 | 0.006 |
4 | ieren | 0.965 | 0.032 | 0.000 | 0.000 | 0.000 | 0.000 | 0.001 |
5 | Sie | 0.856 | 0.081 | 0.015 | 0.002 | 0.018 | 0.011 | 0.013 |
6 | den | 0.085 | 0.880 | 0.023 | 0.002 | 0.005 | 0.001 | 0.001 |
7 | <span></span>Pocke | 0.008 | 0.001 | 0.912 | 0.012 | 0.063 | 0.000 | 0.001 |
8 | t | 0.000 | 0.000 | 0.008 | 0.989 | 0.001 | 0.000 | 0.000 |
9 | - | 0.007 | 0.060 | 0.059 | 0.211 | 0.558 | 0.039 | 0.063 |
10 | Service | 0.000 | 0.000 | 0.007 | 0.001 | 0.986 | 0.000 | 0.002 |
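To make the table easier to read, here is a minimal sketch that takes the most probable source token for each target token. The numbers and tokens are copied from the table as displayed (markup included). The `argmax` helper and the idea of reducing each row to a single "hard" alignment are assumptions made for illustration, and this is not necessarily how Bergamot decides where to place tags.

```javascript
// Source tokens (columns) and target tokens with their alignment rows,
// copied from the table above. The last source token is the empty string.
const sourceTokens = ["Focus", "the", "<span>Pocke", "t", "</span> service", ".", ""];
const rows = [
  ["Kon", [0.996, 0.000, 0.002, 0.000, 0.000, 0.000, 0.000]],
  ["zen", [0.985, 0.007, 0.000, 0.000, 0.000, 0.000, 0.006]],
  ["tr", [0.990, 0.002, 0.000, 0.000, 0.000, 0.000, 0.006]],
  ["ieren", [0.965, 0.032, 0.000, 0.000, 0.000, 0.000, 0.001]],
  ["Sie", [0.856, 0.081, 0.015, 0.002, 0.018, 0.011, 0.013]],
  ["den", [0.085, 0.880, 0.023, 0.002, 0.005, 0.001, 0.001]],
  ["<span></span>Pocke", [0.008, 0.001, 0.912, 0.012, 0.063, 0.000, 0.001]],
  ["t", [0.000, 0.000, 0.008, 0.989, 0.001, 0.000, 0.000]],
  ["-", [0.007, 0.060, 0.059, 0.211, 0.558, 0.039, 0.063]],
  ["Service", [0.000, 0.000, 0.007, 0.001, 0.986, 0.000, 0.002]],
];

// Hypothetical helper: index of the largest value in a row.
function argmax(values) {
  let best = 0;
  for (let i = 1; i < values.length; i++) {
    if (values[i] > values[best]) best = i;
  }
  return best;
}

// Reduce every soft alignment row to its single most probable source token.
const hardAlignment = rows.map(([target, probs]) => {
  const index = argmax(probs);
  return { target, source: sourceTokens[index], probability: probs[index] };
});

for (const { target, source, probability } of hardAlignment) {
  console.log(`${target} -> ${source} (${probability})`);
}
```

Read this way, every row has a clear winner except `-`, whose best match (`</span> service`) only reaches 0.558, with a further 0.211 on `t`. That is the token sitting right at the boundary where the closing tag belongs.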
Comment 9 • 2 years ago
I'm wondering if this is exacerbated by the remapAlignments function: https://github.com/browsermt/bergamot-translator/blob/534ed37a3d609f867a65c250328c5745b306a3c5/src/translator/response.cpp#L100-L143
Essentially, this function maps the tokens from source -> pivot and then pivot -> translation. In this case the probabilities will probably come out worse, especially as subword units don't map as cleanly. I think a better solution would be to map the pivots according to their word units rather than subword tokens. The tokens are only needed for the Transformer architecture. We can, and probably should, do the markup manipulation at the word level.
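To illustrate both points, here is a rough sketch with made-up numbers. It is not the code in response.cpp. `composeAlignments` chains two soft alignment matrices with a plain matrix product, which is one simple way to model source -> pivot -> translation, and `mergeToWords` collapses subword rows and columns into word units. Both function names, the word-id arrays, and the choice of averaging rows and summing columns are assumptions made for this example.

```javascript
// matrix[i][j]: probability that row token i aligns to column token j.
// targetToPivot: rows are target tokens, columns are pivot tokens.
// pivotToSource: rows are pivot tokens, columns are source tokens.
// Hypothetical composition: out[t][s] = sum over p of
// targetToPivot[t][p] * pivotToSource[p][s].
function composeAlignments(targetToPivot, pivotToSource) {
  const width = pivotToSource[0].length;
  return targetToPivot.map((row) => {
    const out = new Array(width).fill(0);
    row.forEach((weight, p) => {
      pivotToSource[p].forEach((value, s) => {
        out[s] += weight * value;
      });
    });
    return out;
  });
}

// Hypothetical word-level view of a subword alignment matrix.
// rowWordIds[i] / colWordIds[j] give the word index of each subword token.
// Columns of the same word are summed, rows of the same word are averaged,
// so each word row still sums to 1 when the subword rows did.
function mergeToWords(matrix, rowWordIds, colWordIds) {
  const rowCount = Math.max(...rowWordIds) + 1;
  const colCount = Math.max(...colWordIds) + 1;
  const sums = Array.from({ length: rowCount }, () => new Array(colCount).fill(0));
  const rowSizes = new Array(rowCount).fill(0);
  matrix.forEach((row, i) => {
    rowSizes[rowWordIds[i]] += 1;
    row.forEach((value, j) => {
      sums[rowWordIds[i]][colWordIds[j]] += value;
    });
  });
  return sums.map((row, w) => row.map((value) => value / rowSizes[w]));
}

// Made-up two-token example: each hop is fairly confident on its own.
const composed = composeAlignments(
  [[0.9, 0.1], [0.2, 0.8]],
  [[0.9, 0.1], [0.1, 0.9]]
);

// Made-up subword example: tokens 0 and 1 belong to the same word.
const wordLevel = mergeToWords(
  [[0.9, 0.1, 0.0], [0.8, 0.2, 0.0], [0.0, 0.1, 0.9]],
  [0, 0, 1],
  [0, 0, 1]
);

console.log(composed, wordLevel);
```

In the toy composition, the strongest entries drop from 0.9 and 0.8 to roughly 0.82 and 0.74, which is the kind of blurring a pivot hop can introduce. In the word-level example, the two subword rows that disagreed on which piece of the word they matched collapse into a single row with essentially all of its mass on that word.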