Open Bug 1869732 Opened 2 years ago Updated 1 year ago

Get rid of the legacy line/word segmenter

Categories

(Core :: Internationalization, task, P3)

People

(Reporter: m_kato, Unassigned)

References

(Blocks 1 open bug)

Details

Once ICU4X's segmenter is the default on the release channel, we can remove the legacy segmenter along with its prefs.

Also, we need to adjust or remove old segmenter tests such as the following.

layout/reftests/line-breaking/reftest.list
    pref(intl.icu4x.segmenter.enabled,false) == chemical-1.html chemical-1-ref.html
    pref(intl.icu4x.segmenter.enabled,false) == conservative-range-1.html conservative-range-1-ref.html
    pref(intl.icu4x.segmenter.enabled,false) == conservative-range-2.html conservative-range-2-ref.html
    pref(intl.icu4x.segmenter.enabled,false) == currency-1.html currency-1-ref.html
    pref(intl.icu4x.segmenter.enabled,false) == datetime-1.html datetime-1-ref.html
    pref(gfx.font_rendering.fallback.async,false) pref(intl.icu4x.segmenter.enabled,false) == emoji-2.html emoji-2-ref.html
    pref(intl.icu4x.segmenter.enabled,false) == hyphens-1.html hyphens-1-ref.html
    pref(intl.icu4x.segmenter.enabled,false) == hyphens-2.html hyphens-2-ref.html
    pref(intl.icu4x.segmenter.enabled,false) == leaders-1.html leaders-1-ref.html
    pref(intl.icu4x.segmenter.enabled,false) == markup-src-1.html markup-src-1-ref.html
    pref(intl.icu4x.segmenter.enabled,false) == parentheses-1.html parentheses-1-ref.html
    pref(intl.icu4x.segmenter.enabled,false) == quotationmarks-1.html quotationmarks-1-ref.html
    pref(intl.icu4x.segmenter.enabled,false) skip-if(gtkWidget) == quotationmarks-cjk-1.html quotationmarks-cjk-1-ref.html
    pref(intl.icu4x.segmenter.enabled,false) == smileys-1.html smileys-1-ref.html
    pref(intl.icu4x.segmenter.enabled,false) == smileys-2.html smileys-2-ref.html
    pref(intl.icu4x.segmenter.enabled,false) == surrogates-2.html surrogates-2-ref.html
    pref(intl.icu4x.segmenter.enabled,false) == surrogates-4.html surrogates-4-ref.html
    pref(intl.icu4x.segmenter.enabled,false) == url-1.html url-1-ref.html
    pref(intl.icu4x.segmenter.enabled,false) == url-2.html url-2-ref.html
    pref(intl.icu4x.segmenter.enabled,false) == url-3.html url-3-ref.html
    pref(intl.icu4x.segmenter.enabled,false) == winpath-1.html winpath-1-ref.html

layout/reftests/text/reftest.list
    pref(intl.icu4x.segmenter.enabled,false) == wordbreak-1.html wordbreak-1-ref.html
    pref(intl.icu4x.segmenter.enabled,false) == 1507661-spurious-hyphenation-after-explicit.html 1507661-spurious-hyphenation-after-explicit-ref.html
    pref(intl.icu4x.segmenter.enabled,false) == ethiopic-wordspace.html ethiopic-wordspace-ref.html

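
If the tests are adjusted rather than removed, one option is to drop the pref annotation from each manifest line. The following is only a rough sketch of that mechanical step, not part of any patch: the helper names `strip_legacy_pref` and `rewrite_manifest` are hypothetical, and it assumes the annotation always appears exactly as in the entries listed in this comment. It does not tell us whether a test still passes with the ICU4X segmenter; that has to be checked separately, and tests that only make sense for the legacy segmenter would need to be removed instead.

```python
import re

# Assumption: the annotation is spelled exactly as in the manifest entries
# quoted in this bug. Trailing spaces/tabs are consumed so no double space
# is left behind.
LEGACY_PREF = re.compile(r"pref\(intl\.icu4x\.segmenter\.enabled,false\)[ \t]*")


def strip_legacy_pref(line: str) -> str:
    """Remove the legacy-segmenter pref annotation from one manifest line."""
    return LEGACY_PREF.sub("", line, count=1)


def rewrite_manifest(text: str) -> tuple[str, int]:
    """Return the rewritten manifest text and the number of lines changed."""
    changed = 0
    out = []
    for line in text.splitlines(keepends=True):
        new_line = strip_legacy_pref(line)
        if new_line != line:
            changed += 1
        out.append(new_line)
    return "".join(out), changed
```

Other annotations on the same line, such as pref(gfx.font_rendering.fallback.async,false) or skip-if(gtkWidget), are left untouched.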
Depends on: 1871754
No longer depends on: 1871754

Given that we are receiving complaints about Japanese word selection with the ICU4X segmenter, such as bug 1871754 and https://support.mozilla.org/en-US/forums/contributors/716759?last=86984#post-86974, we might want to keep the pref for a while so that people can switch back to the old behavior.

Blocks: segmenter

(In reply to Ting-Yu Lin [:TYLin] (UTC-8) (Away Feb 15 - Mar 2) from comment #2)

Given that we are receiving complaints about Japanese word selection with the ICU4X segmenter, such as bug 1871754 and https://support.mozilla.org/en-US/forums/contributors/716759?last=86984#post-86974, we might want to keep the pref for a while so that people can switch back to the old behavior.

I guess that the legacy line segmenter can be removed, since it is not affected by the bug 1871754 issue. Also, the word segmenter depends on the legacy complex breaker for East Asian languages, so we can replace it with ICU4X's LSTM segmenter. On Windows, the legacy complex breaker runs only in the parent process due to win32k lockdown.

Should I file a new bug to remove only the legacy line segmenter?

Re comment 3:

Also, the word segmenter depends on the legacy complex breaker for East Asian languages, so we can replace it with ICU4X's LSTM segmenter. On Windows, the legacy complex breaker runs only in the parent process due to win32k lockdown.

For the word breaker, if some Japanese users want the legacy word-breaking behavior, would it help if we mimicked the legacy behavior using the ICU4X segmenter with no dictionaries?

Re comment 4:

Should I file a new bug to remove only the legacy line segmenter?

It is OK to remove the legacy line segmenter and the legacy word segmenter separately. However, we have some line-breaking compat issues, such as bug 1848049 and bug 1876874. It is not clear to me whether people are already setting intl.icu4x.segmenter.enabled=false to opt in to the legacy behavior. If so, we might want to fix these bugs before removing the legacy line segmenter.

Depends on: 1899444