Closed Bug 969980 Opened 10 years ago Closed 5 years ago

Implement case-insensitive search correctly using case-folding

Categories

(Core :: Find Backend, defect)

26 Branch
defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla72
Tracking Status
firefox72 --- fixed

People

(Reporter: redneb, Assigned: alexhenrie24)

Details

(Whiteboard: [bugday-20140212])

Attachments

(1 file)

User Agent: Mozilla/5.0 (X11; Linux x86_64; rv:26.0) Gecko/20100101 Firefox/26.0 (Beta/Release)
Build ID: 20140106153735

Steps to reproduce:

Firefox implements case-insensitive search, but the implementation is wrong. It appears that Firefox converts all text to lower case and then performs an ordinary search. This works in most cases but fails in others. The correct approach is to use case folding, which was designed for exactly this purpose. In other words, to determine whether two strings s1 and s2 are equal ignoring case, it is wrong to test toLower(s1)==toLower(s2). The correct test is caseFold(s1)==caseFold(s2).



Actual results:

In a page that contains the letter "σ" (U+03C3 GREEK SMALL LETTER SIGMA), I hit Ctrl+F and type the letter "ς" (U+03C2 GREEK SMALL LETTER FINAL SIGMA). Firefox fails to find the "σ" in the page. This is because toLower("σ")!=toLower("ς").


Expected results:

Searching for "ς" should also find all σ's. Google Chrome does that. Note that
caseFold("σ")==caseFold("ς").
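The difference between the two comparisons can be sketched outside of Firefox. The snippet below is only an illustration in Python, whose built-in str.lower() and str.casefold() stand in for the toLower and caseFold used above; the helper names are made up for this example and have nothing to do with the nsFind code.

```python
# Illustration only: Python's str.lower() and str.casefold() stand in for
# the toLower/caseFold notation used in the report.

SMALL_SIGMA = "\u03c3"  # σ GREEK SMALL LETTER SIGMA
FINAL_SIGMA = "\u03c2"  # ς GREEK SMALL LETTER FINAL SIGMA


def equal_by_lowercasing(s1: str, s2: str) -> bool:
    """Compare by lowercasing both sides (the approach the report calls wrong)."""
    return s1.lower() == s2.lower()


def equal_by_case_folding(s1: str, s2: str) -> bool:
    """Compare by case-folding both sides (the approach the report asks for)."""
    return s1.casefold() == s2.casefold()


if __name__ == "__main__":
    # Both letters are already lower case, so lowercasing leaves them distinct.
    print(equal_by_lowercasing(SMALL_SIGMA, FINAL_SIGMA))
    # Case folding maps the final sigma to the ordinary small sigma.
    print(equal_by_case_folding(SMALL_SIGMA, FINAL_SIGMA))
```

For plain ASCII input such as "Firefox" and "FIREFOX" the two helpers agree, which is why the lowercasing approach appears to work most of the time.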
OS: Linux → All
Hardware: x86_64 → All
bug 812837 or bug 202251?
Component: Untriaged → Find Backend
Product: Firefox → Core
Whiteboard: [bugday-20140212]
This is definitely not related to bug 812837. The problem with Turkic i's is a separate issue.

I think that this is also not related to bug 202251; in that bug, the discussion centers around accents and diacritics. What I am talking about here is good old case-insensitive search. Unlike the removal of accents, this should not be a contentious issue: if we want case-insensitive search, we have to do it right, i.e. by using case folding instead of lowercasing.

The patch I just uploaded adds a ToFoldedCase function and uses it instead of ToLowerCase, which resolves the problem with the letter sigma. Now, I realize that ToFoldedCase would have to return a string instead of a single char in order to support characters that become two characters when uppercased (for example, spaß = SPASS in German). However, I decided to not bother with that because I don't think we currently have a way to add that kind of complexity to nsFind without killing performance. I did put ToFoldedCase in the "internal API" section so that it can be changed in the future to return a string.
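To make the trade-off concrete, here is a rough Python sketch of a one-character-in, one-character-out fold next to full string folding. It is not the ToFoldedCase from the patch: fold_char and find_folded are hypothetical names, and fold_char only approximates simple case folding by leaving a character unchanged whenever Python's full fold would expand it.

```python
# Hypothetical sketch, not the actual patch. fold_char approximates a
# per-character fold: characters whose full fold would expand to several
# characters (such as ß) are left unchanged.


def fold_char(ch: str) -> str:
    """Fold one character, but never return more than one character."""
    folded = ch.casefold()
    return folded if len(folded) == 1 else ch


def find_folded(haystack: str, needle: str) -> int:
    """Return the index of the first per-character folded match, or -1."""
    h = [fold_char(c) for c in haystack]
    n = [fold_char(c) for c in needle]
    for i in range(len(h) - len(n) + 1):
        if h[i:i + len(n)] == n:
            return i
    return -1


if __name__ == "__main__":
    # The sigma case is handled by the per-character fold.
    print(find_folded("abc \u03c3", "\u03c2"))
    # The ß case is not: the per-character fold cannot turn ß into "ss".
    print(find_folded("SPASS", "spa\u00df"))
    # Full string folding does treat the two spellings as equal.
    print("spa\u00df".casefold() == "SPASS".casefold())
```

The per-character version keeps the haystack and the folded text the same length, so match offsets map straight back to the original text; that is the simplicity a string-returning fold would give up.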

Pushed by btara@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/cfcbab38d04c
Use case folding for case-insensitive searches. r=Ehsan
Status: UNCONFIRMED → RESOLVED
Closed: 5 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla72
Assignee: nobody → alexhenrie24