Alignments fail to match nodes that are out of vocabulary, but are the same text
Categories
(Firefox :: Translations, defect)
People
(Reporter: gregtatum, Unassigned)
References
(Blocks 1 open bug)
Details
On the page https://es.wikipedia.org/wiki/Pok%C3%A9mon there is the sentence:
logrando ocupar el segundo lugar de las sagas de videojuegos más vendidos de <a href="...">Nintendo</a>
Full-page translation renders it as:
managing to rank second in Nintendo's best-selling video game sagas<a href="...">Nintendo.</a>
In about:translations with HTML capabilities turned on, it translates as:
managing to rank second in Nintendo's best-selling video game sagas<a href="..."></a>
I suspect that no alignments are found for the word "Nintendo", which is passed through unchanged from the source to the target. The tokenization of "Nintendo" will be the same on the source and target sides, as they share the SentencePiece tokenization.
When the source <a href="...">Nintendo</a> tag is not found in the target, it is appended to the end, producing a somewhat nonsensical translation.
This is a bug in the Bergamot implementation.
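One possible mitigation, sketched here in Python purely for illustration: when no alignment is found for a tagged source token, fall back to looking for a target token with identical text. This is not Bergamot's code. The function name `find_target_index`, the `alignment[t][s]` score layout, and the `threshold` value are all assumptions made for the example.

```python
from __future__ import annotations

from typing import Optional


def find_target_index(
    source_tokens: list[str],
    target_tokens: list[str],
    alignment: list[list[float]],  # hypothetical layout: alignment[t][s] is a score
    source_index: int,
    threshold: float = 0.2,  # assumed cutoff, not a real Bergamot setting
) -> Optional[int]:
    """Return the target token index that a tagged source token maps to."""
    # Prefer the alignment scores when some target token clears the threshold.
    best_index, best_score = None, threshold
    for t, row in enumerate(alignment):
        if row[source_index] > best_score:
            best_index, best_score = t, row[source_index]
    if best_index is not None:
        return best_index

    # Fallback: the token may have been passed through untranslated, so look
    # for identical text in the target. Accept it only when the match is
    # unique, to avoid attaching the tag to the wrong repeated word.
    wanted = source_tokens[source_index]
    matches = [t for t, token in enumerate(target_tokens) if token == wanted]
    return matches[0] if len(matches) == 1 else None


source = ["más", "vendidos", "de", "Nintendo"]
target = ["Nintendo", "'s", "best-selling"]
# No alignment scores at all for "Nintendo" (source index 3).
scores = [[0.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0], [0.0, 0.9, 0.0, 0.0]]
nintendo_target = find_target_index(source, target, scores, 3)
```

With this fallback, the `<a>` tag around "Nintendo" would be placed on the matching target token instead of being appended to the end of the sentence. If the text appears more than once in the target, the sketch returns `None` and the existing behavior would still apply.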