Closed Bug 1820618 Opened 2 years ago Closed 1 year ago

TranslationsDocument should use a word segmenter not a regex to support CJK-like languages

Categories

(Firefox :: Translations, enhancement, P3)

RESOLVED INVALID

People

(Reporter: gregtatum, Unassigned)

In toolkit/components/translations/content/translations-document.sys.mjs, the TranslationsDocument segments text into words using a whitespace regex. This works reasonably well for Latin-script languages, but it breaks down for languages that do not separate words with spaces, such as the CJK languages. To report words properly, we need to use Intl.Segmenter, which has not landed yet.
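
To illustrate the difference described above, here is a minimal sketch that compares a whitespace split with Intl.Segmenter at word granularity. The function names countWordsByWhitespace and countWordsBySegmenter and the sample strings are hypothetical and are not taken from translations-document.sys.mjs; the sketch assumes a runtime where Intl.Segmenter is available.

```javascript
// Hypothetical helpers for illustration only; this is not the code in
// translations-document.sys.mjs.

/** Count words by splitting on runs of whitespace. */
function countWordsByWhitespace(text) {
  const trimmed = text.trim();
  return trimmed === "" ? 0 : trimmed.split(/\s+/).length;
}

/**
 * Count words with Intl.Segmenter. Segments that are only punctuation or
 * whitespace have isWordLike === false and are skipped.
 */
function countWordsBySegmenter(text, locale) {
  const segmenter = new Intl.Segmenter(locale, { granularity: "word" });
  let count = 0;
  for (const segment of segmenter.segment(text)) {
    if (segment.isWordLike) {
      count++;
    }
  }
  return count;
}

const english = "Hello world, this is a test.";
const japanese = "吾輩は猫である。名前はまだ無い。";

console.log("en whitespace:", countWordsByWhitespace(english));
console.log("en segmenter: ", countWordsBySegmenter(english, "en"));
console.log("ja whitespace:", countWordsByWhitespace(japanese));
console.log("ja segmenter: ", countWordsBySegmenter(japanese, "ja"));
```

Both approaches agree on the English sentence, but the whitespace split treats the whole Japanese string as a single word because it contains no spaces, while the segmenter reports several word-like segments. The exact Japanese count depends on the segmentation data the runtime ships.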

Depends on: 1423593
Blocks: 1837421
No longer blocks: fx-translation

This only affects reportWordsInViewport, which we may just want to remove.

I suspect we should also use the browser's sentence segmenter, rather than shipping one with Bergamot.
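
As a rough sketch of what a browser-provided sentence segmenter could look like, the snippet below uses Intl.Segmenter with sentence granularity. The helper name splitSentences and the sample strings are hypothetical, and this says nothing about how Bergamot's own sentence splitter behaves or how the two would be swapped.

```javascript
// Hypothetical helper for illustration only; not part of the Translations code.

/** Split text into sentences using the runtime's Intl.Segmenter. */
function splitSentences(text, locale) {
  const segmenter = new Intl.Segmenter(locale, { granularity: "sentence" });
  const sentences = [];
  for (const { segment } of segmenter.segment(text)) {
    // Each segment keeps its trailing whitespace, so trim it for display.
    const trimmed = segment.trim();
    if (trimmed !== "") {
      sentences.push(trimmed);
    }
  }
  return sentences;
}

console.log(splitSentences("Hello world. How are you? Fine.", "en"));
console.log(splitSentences("吾輩は猫である。名前はまだ無い。", "ja"));
```

The Japanese example has no spaces between sentences, so this relies on the segmenter recognizing the ideographic full stop as a sentence terminator.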

Hi all,

I want to work on this bug.

Does CJK mean Chinese/Japanese/Korean?
If so, I see that these languages are not yet supported in Firefox. How would we test this?
I'm also checking whether this bug is still open, whether all the information in this thread is up to date, and whether someone is already working on it.

Cheers,
Meera
Outreachy

Flags: needinfo?(gtatum)

I'm sorry, but this one is not ready to be worked on yet. It is blocked by Bug 1423593.

Flags: needinfo?(gtatum)

I'm going to mark this as invalid, since this information is only reported to the console as a debug message. Proper segmentation would only matter if we reported this to telemetry, and I filed Bug 1904418 for that work.

Status: NEW → RESOLVED
Closed: 1 year ago
Resolution: --- → INVALID
See Also: → 1904418
No longer blocks: 1838721
No longer blocks: 1904415