TranslationsDocument should use a word segmenter not a regex to support CJK-like languages
Categories
(Firefox :: Translations, enhancement, P3)
People
(Reporter: gregtatum, Unassigned)
Details
In toolkit/components/translations/content/translations-document.sys.mjs, the TranslationsDocument segments text into words using a whitespace regex. This works decently well for Latin-script languages, but it doesn't scale to other languages, such as the CJK languages. To properly implement word reporting, we need to use Intl.Segmenter, which hasn't landed yet at this time.
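To illustrate the difference, here is a minimal sketch comparing the two approaches. The helper names `countWordsByWhitespace` and `countWordsBySegmenter` are hypothetical and are not the actual code in translations-document.sys.mjs; the sketch only assumes an environment where the standard `Intl.Segmenter` API is available.

```javascript
// Hypothetical helpers for illustration only; not the TranslationsDocument code.

// Whitespace-based counting: assumes words are separated by whitespace.
function countWordsByWhitespace(text) {
  const trimmed = text.trim();
  return trimmed === "" ? 0 : trimmed.split(/\s+/).length;
}

// Segmenter-based counting: counts only the segments flagged as word-like,
// so punctuation and whitespace segments are skipped.
function countWordsBySegmenter(text, locale) {
  const segmenter = new Intl.Segmenter(locale, { granularity: "word" });
  let count = 0;
  for (const { isWordLike } of segmenter.segment(text)) {
    if (isWordLike) {
      count++;
    }
  }
  return count;
}
```

For a Japanese sentence with no spaces, the whitespace version sees a single "word", while the segmenter version should report several.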
Updated by Reporter • 2 years ago
Comment 1 • 2 years ago (Reporter)
This is only used in reportWordsInViewport, which we may just want to remove.
Comment 2 • 2 years ago (Reporter)
I suspect we should also use the browser's sentence segmenter, rather than ship one with Bergamot.
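As a rough sketch of what browser-side sentence segmentation could look like, the same `Intl.Segmenter` API accepts a `"sentence"` granularity. The `splitSentences` helper below is hypothetical and only for illustration; how this would actually integrate with Bergamot is not covered here.

```javascript
// Hypothetical helper for illustration only.
// Splits text into sentences using the platform segmenter and trims
// the trailing whitespace that each sentence segment may carry.
function splitSentences(text, locale) {
  const segmenter = new Intl.Segmenter(locale, { granularity: "sentence" });
  return Array.from(segmenter.segment(text), ({ segment }) => segment.trim())
    .filter(sentence => sentence !== "");
}
```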
Comment 3 • 2 years ago
Hi All,
I want to work on this bug.
Does "CJK languages" mean Chinese/Japanese/Korean?
If yes, I see that these languages are not yet supported in the Firefox browser. So how do we test it?
Also, I'm checking whether this bug is still open, whether all the information in this thread is up to date, and whether someone is already working on it.
Cheers,
Meera
Outreachy
Comment 4 • 2 years ago (Reporter)
I'm sorry, but this one is not ready to be worked on yet. It is blocked by Bug 1423593.
Comment 5 • 1 year ago (Reporter)
I'm going to mark this as invalid, since this information is only reported to the console as a debug message. Proper segmentation would only matter if we reported this to telemetry. I filed Bug 1904418 for that work.