Open Bug 1908487 Opened 4 months ago Updated 4 months ago

Alignments fail to match nodes that are out of vocabulary, but are the same text

Categories

(Firefox :: Translations, defect)

People

(Reporter: gregtatum, Unassigned)

References

(Blocks 1 open bug)

Details

From the page: https://es.wikipedia.org/wiki/Pok%C3%A9mon

The page contains this sentence:

logrando ocupar el segundo lugar de las sagas de videojuegos más vendidos de <a href="...">Nintendo</a>

Full-page translation renders it as:

managing to rank second in Nintendo's best-selling video game sagas<a href="...">Nintendo.</a>

In about:translations with HTML capabilities turned on, it is translated as:

managing to rank second in Nintendo's best-selling video game sagas<a href="..."></a>

I suspect that no alignments are found for the word "Nintendo", which is passed through unchanged from the source to the target. The tokenization of "Nintendo" will be the same on the source and target sides, since they share the SentencePiece tokenization.

When the source <a href="...">Nintendo</a> tag is not found in the target, it is appended to the end, which produces a somewhat nonsensical translation.

https://searchfox.org/mozilla-central/rev/8c6edfe25c094e032a27722ef30f69555f556bf8/toolkit/components/translations/content/translations-document.sys.mjs#1583
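
To make the failure mode concrete, here is a minimal sketch of the behavior described above. It is not the Firefox or Bergamot code. The function `reinsert_link`, its parameters, and the `match_identical_text` fallback are all hypothetical, and the model is simplified to a single link placed by character offsets. With no alignment the link is appended to the end, as in the report. With the assumed fallback, it is placed around identical text found in the target.

```python
from typing import Optional, Tuple


def reinsert_link(
    target: str,
    link_text: str,
    href: str,
    aligned_span: Optional[Tuple[int, int]],
    match_identical_text: bool = False,
) -> str:
    """Place a source <a> tag into translated text (hypothetical sketch).

    aligned_span is the (start, end) character range in `target` that the
    alignment step mapped the link text to, or None if nothing aligned.
    """
    # Assumed fallback, not confirmed by the bug: if alignment found nothing,
    # look for the same text passed through verbatim in the target.
    if aligned_span is None and match_identical_text:
        start = target.find(link_text)
        if start != -1:
            aligned_span = (start, start + len(link_text))

    if aligned_span is None:
        # Behavior described in the report: the unmatched tag is appended.
        return f'{target}<a href="{href}">{link_text}</a>'

    start, end = aligned_span
    return f'{target[:start]}<a href="{href}">{target[start:end]}</a>{target[end:]}'


target = "managing to rank second in Nintendo's best-selling video game sagas"

# No alignment: the link ends up after "sagas", similar to the reported output.
appended = reinsert_link(target, "Nintendo", "...", None)

# Hypothetical fallback: the link wraps the identical word in the target.
matched = reinsert_link(target, "Nintendo", "...", None, match_identical_text=True)
```

A plain substring search like this would be ambiguous when the same text appears more than once in the target, so a real fix would likely need to take the surrounding alignments into account as well.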

This is a bug in the Bergamot implementation.

Blocks: 1836125
No longer blocks: 1838721