Search by words rather than characters for Chinese Traditional in "Search Language" box

VERIFIED FIXED

8 years ago
6 years ago

People

(Reporter: petercpg, Unassigned)


Attachments

(1 attachment)

User-Agent:       Mozilla/5.0 (Windows NT 6.1; WOW64; rv:2.0b8) Gecko/20100101 Firefox/4.0b8
Build Identifier: 

When users search the "Search Language" box for a language written in an alphabetic script, typing the name letter by letter finds it as expected.

In Chinese, both Traditional and Simplified are called "中文" for short. A user looking for Chinese (Traditional) may therefore type just "中文", because "正體" (Traditional) is only a modifier and is unlikely to come to mind as a keyword.

This may also affect other non-alphabetic languages.

Reproducible: Always

Steps to Reproduce:
1. Open http://www.mozilla.com/en-US/firefox/all.html
2. Type "中文" in the "Search Language" box

Actual Results:
Returns only Chinese (Simplified)/中文 (简体)

Expected Results:  
Returns both Chinese (Simplified)/中文 (简体) and Chinese (Traditional)/正體中文 (繁體)

The problem lies at line 106 of
https://www.mozilla.com/js/mozilla-language-search.js
in how the regular expression object is constructed.

new RegExp('(?:^' + keywords + '|[ (]' + keywords + ')', 'i');

This only matches keywords that are preceded by a space or a left parenthesis, or that sit at the beginning of the string. While this works for scripts written in alphabets, it fails for scripts without word delimiters (e.g. East Asian scripts).
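To make the failure concrete, here is a minimal sketch that runs the expression quoted above against the two native-language labels from the expected results. The helper name `buildOriginalSearch` and the choice of sample strings are assumptions for illustration; the real script may match against different text.

```javascript
// Hypothetical helper wrapping the expression quoted from line 106.
function buildOriginalSearch(keywords) {
  return new RegExp('(?:^' + keywords + '|[ (]' + keywords + ')', 'i');
}

// Assumed sample labels, taken from the expected results in this report.
const simplified = '中文 (简体)';
const traditional = '正體中文 (繁體)';

const search = buildOriginalSearch('中文');

// The keyword sits at the start of the string, so the '^' branch matches.
const simplifiedMatches = search.test(simplified);

// The keyword is preceded by '體', not by ' ' or '(', so neither branch matches.
const traditionalMatches = search.test(traditional);

console.log(simplifiedMatches, traditionalMatches); // true false
```

The same expression still behaves as intended for English names, for example `buildOriginalSearch('trad')` matches "Chinese (Traditional)" because "Trad" follows a left parenthesis.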

I also have no idea how to solve the problem other than rewriting the search function. Should we do it?
Status: UNCONFIRMED → NEW
Ever confirmed: true
Created attachment 503754 [details] [diff] [review]
Patch to exclude word determiner when search with CJK characters

I made a patch. It switches to the applicable search regexp by probing the characters in the keywords.
Attachment #503754 - Flags: review?
Comment on attachment 503754 [details] [diff] [review]
Patch to exclude word determiner when search with CJK characters

I'm going through unassigned review requests in bugzilla, looks like this one has slipped through the cracks and has been accidentally ignored for the past... uhhh... year.

morgamic, can you please review or reassign as appropriate? Thanks.
Attachment #503754 - Flags: review? → review?(morgamic)
(Assignee)

Updated

7 years ago
Component: www.mozilla.org/firefox → www.mozilla.org
Product: Websites → Websites
Comment on attachment 503754 [details] [diff] [review]
Patch to exclude word determiner when search with CJK characters

Anthony - can you review this?
Attachment #503754 - Flags: review?(morgamic) → review?(anthony)
Comment on attachment 503754 [details] [diff] [review]
Patch to exclude word determiner when search with CJK characters

Review of attachment 503754 [details] [diff] [review]:
-----------------------------------------------------------------

Thanks for this.

Sorry for only getting to it now, I missed the first email. Bugzilla reminders are useful :)
Attachment #503754 - Flags: review?(anthony) → review+
Fixed with r105333.
Pushed with r105334.
Status: NEW → RESOLVED
Last Resolved: 7 years ago
Resolution: --- → FIXED
Verified fixed. Searching with 中文 on https://www.mozilla.org/en-US/firefox/all.html returns both Chinese (Simplified)/中文 (简体) and Chinese (Traditional)/正體中文 (繁體).
Status: RESOLVED → VERIFIED
Thanks. Looking back at that one-year-old patch, though, I would say this might be fixed as simply as |var search = new RegExp('\b' + keywords + '\b', 'i');| ...:)
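A quick way to check how that one-liner would behave, offered as a sketch rather than a verdict on the patch. Two points of standard JavaScript behaviour apply: inside a string literal `'\b'` is a backspace character, so the word-boundary token has to be written `'\\b'`; and `\b` is defined in terms of ASCII word characters (`[A-Za-z0-9_]`), so it does not fire next to CJK characters. The helper name `buildBoundarySearch` and the sample labels are assumptions.

```javascript
// Hypothetical helper: the one-liner with the backslashes doubled so that
// the RegExp constructor receives the \b word-boundary token.
function buildBoundarySearch(keywords) {
  return new RegExp('\\b' + keywords + '\\b', 'i');
}

// As literally written, '\b' is U+0008 (backspace), not a word boundary.
const backspacePattern = new RegExp('\b' + 'chinese' + '\b', 'i');
const backspaceResult = backspacePattern.test('Chinese (Traditional)');

// Whole-word Latin search works.
const latinWhole = buildBoundarySearch('chinese').test('Chinese (Traditional)');

// Prefix search fails because the trailing \b needs a word end after "chin".
const latinPrefix = buildBoundarySearch('chin').test('Chinese (Traditional)');

// CJK characters are not ASCII word characters, so \b never matches around them.
const cjk = buildBoundarySearch('中文').test('中文 (简体)');

console.log(backspaceResult, latinWhole, latinPrefix, cjk); // false true false false
```

Under these assumptions the one-liner would need at least the trailing `\b` removed to keep type-as-you-go matching, and a separate branch for CJK input.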
(Assignee)

Updated

6 years ago
Component: www.mozilla.org → General
Product: Websites → www.mozilla.org