[ja] New Japanese word segmentation is annoying for me
Categories
(Core :: Internationalization, defect, P5)
Tracking
| Release | Tracking | Status |
|---|---|---|
| firefox-esr115 | --- | unaffected |
| firefox121 | --- | unaffected |
| firefox122 | --- | wontfix |
| firefox123 | --- | wontfix |
People
(Reporter: alice0775, Unassigned)
References
(Depends on 1 open bug)
Details
(Keywords: jp-critical, nightly-community, regression)
New Japanese word segmentation (double-click selection and web search) is annoying.
Example:
文芸評論(ぶんげいひょうろん、英語: literary criticism)とは、文学を評論すること。
文芸批評、または文学研究とも言うが、
小説家や作品に限らず文学とその周辺全般が扱われ、学際的な性格を持つ。
研究対象の性格によっては
夏目漱石、山田太郎
NEW word Segmentation:
/文芸/評論/(/ぶん/げ/い/ひょう/ろ/ん/、/英語/: /literary /criticism/)/と/は/、/
/文芸/批評/、/または/文学/研究/とも/言う/が/、/
/小説/家/や/作品/に/限/ら/ず文/学/とそ/の/周辺/全般/が/扱/われ/、/学際/的/な/性格/を/持つ/。/
/研究/対象/の/性格/によって/は/
/夏目/漱石/、/山田/太郎/
OLD word Segmentation:
/文芸評論/(/ぶんげいひょうろん/、/英語/: /literary /criticism/)/とは/、
/文芸批評/、/または/文学研究/とも/言/うが/、/
/小説家/や/作品/に/限/らず/文学/とその/周辺全般/が/扱われ/、/学際的/な/性格/を/持/つ/。/
/研究対象/の/性格/によっては/
/夏目漱石/、/山田太郎/
Expected:
文芸評論, 文芸批評, 文学研究, 小説家, 文学, 研究対象, 夏目漱石, 山田太郎 should be one word.
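The comparison above can be checked mechanically. The following is a minimal Python sketch, not part of the original report: it only inspects the "/"-delimited strings quoted above and does not call any browser or ICU segmenter. The helper names (`parse_segments`, `broken_words`, `broken_in`) are hypothetical and chosen for illustration. A word counts as broken when it is not exactly one segment; where two expected words overlap (文学 inside 文学研究), the longer one is assumed to be the intended unit.

```python
# Expected single words, as listed by the reporter.
EXPECTED = ["文芸評論", "文芸批評", "文学研究", "小説家", "文学", "研究対象", "夏目漱石", "山田太郎"]

# Segmentation lines copied from the report; "/" marks a segment boundary.
NEW = [
    "/文芸/評論/(/ぶん/げ/い/ひょう/ろ/ん/、/英語/: /literary /criticism/)/と/は/、/",
    "/文芸/批評/、/または/文学/研究/とも/言う/が/、/",
    "/小説/家/や/作品/に/限/ら/ず文/学/とそ/の/周辺/全般/が/扱/われ/、/学際/的/な/性格/を/持つ/。/",
    "/研究/対象/の/性格/によって/は/",
    "/夏目/漱石/、/山田/太郎/",
]
OLD = [
    "/文芸評論/(/ぶんげいひょうろん/、/英語/: /literary /criticism/)/とは/、",
    "/文芸批評/、/または/文学研究/とも/言/うが/、/",
    "/小説家/や/作品/に/限/らず/文学/とその/周辺全般/が/扱われ/、/学際的/な/性格/を/持/つ/。/",
    "/研究対象/の/性格/によっては/",
    "/夏目漱石/、/山田太郎/",
]


def parse_segments(marked: str) -> list[str]:
    """Split one '/'-delimited line into its non-empty segments."""
    return [seg for seg in marked.split("/") if seg]


def broken_words(marked: str, expected: list[str]) -> list[str]:
    """Return expected words in this line that are not exactly one segment."""
    segments = parse_segments(marked)
    text = "".join(segments)
    # Character offsets where a segment starts or ends.
    cuts, pos = {0}, 0
    for seg in segments:
        pos += len(seg)
        cuts.add(pos)
    broken, pos = [], 0
    while pos < len(text):
        # Longest expected word starting at this offset, if any.
        word = max((w for w in expected if text.startswith(w, pos)), key=len, default=None)
        if word is None:
            pos += 1
            continue
        end = pos + len(word)
        whole = pos in cuts and end in cuts and not any(pos < c < end for c in cuts)
        if not whole:
            broken.append(word)
        pos = end
    return broken


def broken_in(lines: list[str], expected: list[str] = EXPECTED) -> list[str]:
    """Collect broken expected words over several segmentation lines."""
    return [w for line in lines for w in broken_words(line, expected)]


print("NEW breaks:", broken_in(NEW))  # all eight expected words
print("OLD breaks:", broken_in(OLD))  # []
```

On the strings quoted in this report, the new segmentation breaks every expected word, while the old one keeps all of them whole.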
Comment 1•2 years ago
:m_kato, since you are the author of the regressor, bug 1854032, could you take a look? Also, could you set the severity field?
For more information, please visit BugBot documentation.
Comment 2•2 years ago
In the long term, we will use a machine-learning-based segmenter for CJ. For now, the behavior depends on ICU's dictionary for CJ, so there is no way to fix it at present.
Comment 3•2 years ago
FWIW, the new Firefox word Segmentation behavior is the same as Google Chrome or Safari.
Comment 4•1 year ago
I discussed this issue with Ting-Yu. We may be able to add an option to ICU4X to ignore the dictionary, or give WordBreakIteratorUtf16 an option to ignore it for Han script. I vote for a new option on WordBreakIteratorUtf16, since I don't want more dependencies in the ICU4X segmenter.
Then we can change the behavior with a pref in intl.properties. Localizers can choose it.
Comment 5•1 year ago
> FWIW, the new Firefox word Segmentation behavior is the same as Google Chrome or Safari.
> ...
> Localizer can choose it.
How would this interact with the Web-exposed Intl.Segmenter API and cross-browser consistency?
Comment 6 (Reporter)•1 year ago
OK. I think wontfix is appropriate, because the following method seems to suffice:
Double-click and hold the mouse button down, then drag to the right to extend the selection to the next text segment.