Open Bug 1302492 Opened 8 years ago Updated 2 years ago

The new Find Whole Word / Find Exact String option does not find Chinese words

Categories

(Toolkit :: Find Toolbar, defect)

People

(Reporter: jonathan_walden, Unassigned)

Details

(Keywords: intl)

User Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:45.0) Gecko/20100101 Firefox/45.0
Build ID: 20160725105554

Steps to reproduce:

In the nightly beta, use Find in Page with the new whole word option to search for the single-character word 在 on the page https://zh.wikipedia.org/wiki/Wikipedia:%E9%A6%96%E9%A1%B5


Actual results:

Only one instance is found, " 在1960", the one in which the word is surrounded by non-CJK characters.


Expected results:

Many matches should have shown up, basically the same set as when whole word matching is not used.
Other browsers that support a whole word option do find Chinese words on a similar search, even with that option selected.

The current code appears to rely on a word breaker that places a break wherever the character class changes. I suspect it does not work well for any language that does not separate words with spaces, such as Thai, Chinese, or Japanese. There is more information about languages that do not use spaces here: https://r12a.github.io/scripts/tutorial/part5
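
To illustrate the suspected behavior, here is a minimal Python sketch of a whole-word search that treats a change in character class as the only kind of word break. It is not the Firefox implementation; the function names (`char_class`, `is_break`, `find_whole_word`) and the coarse class list are assumptions made up for this example.

```python
import unicodedata


def char_class(ch):
    """Coarse character class (hypothetical; stands in for a class-based breaker)."""
    if "\u4e00" <= ch <= "\u9fff":
        return "cjk"  # basic CJK Unified Ideographs block only
    category = unicodedata.category(ch)
    if category.startswith("L"):
        return "letter"
    if category.startswith("N"):
        return "digit"
    if ch.isspace():
        return "space"
    return "other"


def is_break(text, i):
    """True if a break falls between text[i - 1] and text[i]."""
    if i <= 0 or i >= len(text):
        return True  # start and end of text always count as breaks
    return char_class(text[i - 1]) != char_class(text[i])


def find_whole_word(text, word):
    """Return start offsets where `word` has a class-change break on both sides."""
    hits = []
    start = text.find(word)
    while start != -1:
        end = start + len(word)
        if is_break(text, start) and is_break(text, end):
            hits.append(start)
        start = text.find(word, start + 1)
    return hits


sample = "他在北京 在1960年"
print(find_whole_word(sample, "在"))  # [5]
```

In the sample string, 在 occurs twice, but only the occurrence at offset 5 (preceded by a space and followed by a digit) is reported. The occurrence inside 他在北京 has CJK characters on both sides, so no class change is seen and it is skipped, which matches the single " 在1960" hit described above.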
Blocks: 269422
Component: Untriaged → Find Toolbar
Keywords: intl
OS: Unspecified → Linux
Product: Firefox → Toolkit
Hardware: Unspecified → All
Version: 45 Branch → unspecified
From bug 1282759: "which only matches strings surrounded by word-breaking characters, like spaces or punctuation marks in latin-derived languages."
Blocks: 269442
No longer blocks: 269422
Status: UNCONFIRMED → NEW
Ever confirmed: true
OS: Linux → All
Severity: normal → S3