TranslationsDocument should use a word segmenter not a regex to support CJK-like languages
Categories: Firefox :: Translations, enhancement, P3
People: Reporter: gregtatum, Unassigned
References: Blocks 2 open bugs
Details
In toolkit/components/translations/content/translations-document.sys.mjs, the TranslationsDocument segments words using a whitespace regex. This works reasonably well for Latin-script languages, but it breaks down for languages that do not separate words with whitespace, such as the CJK languages. To implement word reporting properly, we need to use Intl.Segmenter, which has not landed yet.
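To illustrate the difference, here is a minimal sketch that counts words both ways. The helper names `countWordsByWhitespace` and `countWordsWithSegmenter` are hypothetical and are not taken from translations-document.sys.mjs; the sketch also assumes a runtime where the standard `Intl.Segmenter` API is already available.

```javascript
// Hypothetical helpers for illustration only; this is not the actual
// TranslationsDocument code.

// Whitespace-based counting, similar in spirit to splitting with a regex.
function countWordsByWhitespace(text) {
  const trimmed = text.trim();
  return trimmed === "" ? 0 : trimmed.split(/\s+/).length;
}

// Locale-aware counting with Intl.Segmenter (assumed to be available).
function countWordsWithSegmenter(text, locale) {
  const segmenter = new Intl.Segmenter(locale, { granularity: "word" });
  let count = 0;
  for (const segment of segmenter.segment(text)) {
    // isWordLike is false for whitespace and punctuation segments.
    if (segment.isWordLike) {
      count += 1;
    }
  }
  return count;
}

// Both approaches agree on whitespace-delimited text.
console.log(countWordsByWhitespace("The quick brown fox")); // 4
console.log(countWordsWithSegmenter("The quick brown fox", "en")); // 4

// A Japanese sentence has no spaces, so the whitespace approach sees one "word".
const japanese = "今日は天気がいいです";
console.log(countWordsByWhitespace(japanese)); // 1
// The exact count depends on the engine's segmentation data.
console.log(countWordsWithSegmenter(japanese, "ja"));
```

The whitespace version treats the entire Japanese sentence as a single word, while the segmenter splits it into several word-like segments, which is the behavior word reporting would need.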
Updated (Reporter) • 10 months ago
Comment 1 (Reporter) • 8 months ago
This is only in reportWordsInViewport, which we may just want to remove.
Comment 2 (Reporter) • 7 months ago
I suspect we should also use the browser's sentence segmenter, rather than ship one with Bergamot.
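As a rough sketch of what using the browser's sentence segmenter could look like, the same `Intl.Segmenter` API accepts a `"sentence"` granularity. The helper name `splitIntoSentences` is hypothetical, and nothing here reflects how Bergamot or the translations code actually splits sentences.

```javascript
// Hypothetical helper for illustration only; assumes Intl.Segmenter is available.
function splitIntoSentences(text, locale) {
  const segmenter = new Intl.Segmenter(locale, { granularity: "sentence" });
  const sentences = [];
  for (const { segment } of segmenter.segment(text)) {
    // Each segment keeps its trailing whitespace, so trim it for display.
    const sentence = segment.trim();
    if (sentence !== "") {
      sentences.push(sentence);
    }
  }
  return sentences;
}

console.log(splitIntoSentences("Hello world. How are you? Fine.", "en"));
```

Relying on the built-in segmenter would mean the sentence-breaking rules come from the browser's own locale data rather than from a separately shipped component.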
Comment 3 • 6 months ago
Hi All,
I want to work on this bug.
Does "CJK languages" mean Chinese/Japanese/Korean?
If so, I see that these languages are not yet supported in the Firefox browser, so how would we test this?
I would also like to check whether this bug is still open, whether the information in this thread is up to date, and whether someone is already working on it.
Cheers,
Meera
Outreachy
Comment 4 (Reporter) • 6 months ago
I'm sorry, but this one is not ready to be worked on yet. It is blocked by Bug 1423593.