Closed Bug 1267120 Opened 8 years ago Closed 10 months ago

i18n: word/phrase detection for CJK?

Categories

(Core :: Internationalization, defect)

45 Branch
defect

RESOLVED DUPLICATE of bug 1719535

People

(Reporter: jeffbai, Unassigned)

References

(Depends on 1 open bug)

Details

User Agent: Mozilla/5.0 (X11; Linux x86_64; rv:45.0) Gecko/20100101 Firefox/45.0
Build ID: 20160412182503

Steps to reproduce:

1. Start Firefox;
2. Open up http://sports.sina.com.cn/china/j/2016-04-24/doc-ifxrpvcy4408748.shtml, a sports news report page in Chinese;
3. Double-click anywhere in the main text (if you read Chinese, double-click on individual phrases).


Actual results:

Double-clicking on CJK text should highlight a single phrase rather than a whole sentence. This works in both Chromium and Chrome, but not in Firefox.

In Firefox, double-clicking on Chinese text on a webpage highlights one of the following:

- A whole sentence;
- A segment of a sentence (e.g. from a comma to a period);
- A segment of a sentence from its first character up to the first number or non-CJK character.
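
To make the three cases above concrete, here is a rough Python sketch that reproduces the observed selection by expanding to the whole run of Han characters around the clicked position. This is only a model of the symptom, not Gecko's actual code; the function name `select_run`, the sample sentence, and the restriction to the basic CJK Unified Ideographs block are all assumptions made for illustration.

```python
import re

# Assumption: a "run" is a maximal sequence of characters from the basic
# CJK Unified Ideographs block (U+4E00..U+9FFF). Punctuation, digits and
# Latin letters all end the run.
HAN_RUN = re.compile(r"[\u4e00-\u9fff]+")


def select_run(text: str, index: int) -> str:
    """Return the run of Han characters covering text[index].

    If the character at index is not a Han character, return it alone.
    """
    for match in HAN_RUN.finditer(text):
        if match.start() <= index < match.end():
            return match.group()
    return text[index]


# Hypothetical sample sentence, not taken from the page in the report.
sample = "中国队2比0获胜，球迷很高兴。"
print(select_run(sample, 0))  # 中国队 (stops at the first number)
print(select_run(sample, 9))  # 球迷很高兴 (from the comma to the period)
```

Both results span several words, which is the behaviour described in this report.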


Expected results:

Double-clicking on CJK text should highlight a phrase instead of a whole sentence.

Just an update on that line-breaking statement in #1267118:

> A possible handling is use a word list much like what is in use
> for double-click word selection (unfortunately, in Chrome, not in 
> FF), and *prefer* these parts for breaking. 

After looking at my Twitter timeline, it appears that Chrome actually uses this word-boundary-first strategy for line breaking. Problems similar to the "tag page" case still arise with other compound words such as "基金会" ("fund organization" -> "foundation"), but I guess the results are good enough unless you want to use a dedicated Chinese word segmenter like jieba.
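
As an illustration of the word-list idea mentioned above, here is a minimal Python sketch of forward maximum matching, one simple way to segment text against a word list. It is not Chrome's or Firefox's implementation; the word list, the sample string, and the names `segment` and `word_at` are hypothetical.

```python
# Hypothetical word list for illustration only; a real list would hold
# many thousands of entries.
WORDS = {"基金", "基金会", "成立", "了"}
MAX_LEN = max(len(w) for w in WORDS)


def segment(text: str) -> list[str]:
    """Forward maximum matching.

    At each position take the longest listed word; if none matches,
    fall back to a single character.
    """
    out = []
    i = 0
    while i < len(text):
        for size in range(min(MAX_LEN, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in WORDS:
                out.append(piece)
                i += size
                break
    return out


def word_at(text: str, index: int) -> str:
    """Return the segment covering text[index], as a double-click would select it."""
    pos = 0
    for piece in segment(text):
        if pos <= index < pos + len(piece):
            return piece
        pos += len(piece)
    raise IndexError(index)


print(segment("基金会成立了"))     # ['基金会', '成立', '了']
print(word_at("基金会成立了", 4))  # 成立
```

If "基金会" were missing from the list, the same text would split into "基金" and "会", which is the kind of compound-word problem described above.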
We don't have a context-based word breaker...
Component: Untriaged → Internationalization
Product: Firefox → Core
Severity: normal → S3

This is fixed by bug 1719535, which supports proper word boundaries for Chinese and Japanese.

Status: UNCONFIRMED → RESOLVED
Closed: 10 months ago
Duplicate of bug: 1719535
Resolution: --- → DUPLICATE