Open Bug 1820618 Opened 1 year ago Updated 6 months ago

TranslationsDocument should use a word segmenter not a regex to support CJK-like languages

Categories

(Firefox :: Translations, enhancement, P3)

People

(Reporter: gregtatum, Unassigned)

References

(Blocks 2 open bugs)

Details

In toolkit/components/translations/content/translations-document.sys.mjs, TranslationsDocument segments text into words using a whitespace regex. This works reasonably well for Latin-script languages, but not for languages that do not separate words with whitespace, such as the CJK languages. To report words properly, we need to use Intl.Segmenter, which has not landed yet.
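
A minimal sketch of the difference, assuming an engine where Intl.Segmenter is available. The helper names and sample strings are hypothetical and are not taken from translations-document.sys.mjs.

```javascript
// Hypothetical helpers for illustration only.

// Whitespace-based splitting, roughly the approach described in this bug.
function splitOnWhitespace(text) {
  return text.split(/\s+/).filter(part => part.length > 0);
}

// Word segmentation with Intl.Segmenter, keeping only word-like segments
// (punctuation and whitespace segments are skipped).
function segmentWords(text, locale) {
  const segmenter = new Intl.Segmenter(locale, { granularity: "word" });
  const words = [];
  for (const { segment, isWordLike } of segmenter.segment(text)) {
    if (isWordLike) {
      words.push(segment);
    }
  }
  return words;
}

const english = "The quick brown fox";
const japanese = "吾輩は猫である。名前はまだ無い。";

console.log(splitOnWhitespace(english).length); // 4
console.log(splitOnWhitespace(japanese).length); // 1, there is no whitespace to split on
// Several word-like segments; the exact boundaries depend on the engine's
// segmentation data.
console.log(segmentWords(japanese, "ja"));
```

The whitespace regex treats the whole Japanese string as a single "word", which is why any word count built on it would be wrong for CJK text.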

Depends on: 1423593
Blocks: 1837421
Blocks: 1838721
No longer blocks: fx-translation

This only affects reportWordsInViewport, which we may just want to remove.

I suspect we should also use the browser's sentence segmenter, rather than ship one with Bergamot.
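
As a rough illustration of what a browser-provided sentence segmenter could look like, here is a sketch using Intl.Segmenter with sentence granularity. The helper name is hypothetical, and this is not how Bergamot or TranslationsDocument currently split sentences.

```javascript
// Hypothetical helper for illustration only.
// Splits text into sentences using the engine's built-in segmentation data.
function segmentSentences(text, locale) {
  const segmenter = new Intl.Segmenter(locale, { granularity: "sentence" });
  const sentences = [];
  for (const { segment } of segmenter.segment(text)) {
    const trimmed = segment.trim();
    if (trimmed.length > 0) {
      sentences.push(trimmed);
    }
  }
  return sentences;
}

console.log(segmentSentences("Hello world. How are you? Fine.", "en"));
console.log(segmentSentences("吾輩は猫である。名前はまだ無い。", "ja"));
```

The same call handles both scripts because sentence terminators such as "。" are covered by the Unicode segmentation rules the engine implements.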

Hi All,

I want to work on this bug.

Does "CJK languages" mean Chinese/Japanese/Korean?
If so, I see that these languages are not yet supported in the Firefox browser. How do we test it?
I'd also like to check whether this bug is still open, whether the information in this thread is up to date, and whether someone is already working on it.

Cheers,
Meera
Outreachy

Flags: needinfo?(gtatum)

I'm sorry, but this one is not ready to be worked on yet. It is blocked by Bug 1423593.

Flags: needinfo?(gtatum)