Closed Bug 969984 Opened 10 years ago Closed 10 years ago

Improve search for international users by ignoring accents

Categories

(Firefox :: Untriaged, defect)

Version: 26 Branch
Hardware: x86_64
OS: All
Type: defect
Priority: Not set
Severity: minor

RESOLVED DUPLICATE of bug 202251

People

(Reporter: redneb, Unassigned)

Details

(Whiteboard: [bugday-20140212])

Google Chrome implements a feature that makes searching for text in a page significantly easier for some international users like myself. Suppose that you would like to search for the string "née" but you don't have a keyboard layout that supports French accents. In Google Chrome, searching for "nee" will match "née".

There is no standard way to search while ignoring accents, but one widely used algorithm works quite well (at least with respect to Western languages). The algorithm is based on a function called unaccent that takes a string s as its input and removes all accents as follows:
1. s:=NFD(s)
2. Remove from s all NonspacingMark characters
3. return NFC(s)
where NFD and NFC are the standard Unicode normalization forms and NonspacingMark is a Unicode general category (Mn). In the first step we separate accents from their base characters, then we remove the accents, and then we put what's left back together. For example, unaccent("ȁȂᾞçĢžᾧ") returns "aAΗcGzω".
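
The three steps above can be sketched in Python as follows. This is only an illustration of the described algorithm using the standard unicodedata module, not code from Firefox or Chrome; the function name unaccent is taken from the description above.

```python
import unicodedata


def unaccent(s: str) -> str:
    """Strip accents from s using the three steps described above."""
    # Step 1: NFD splits each accented character into its base character
    # followed by one or more combining marks.
    decomposed = unicodedata.normalize("NFD", s)
    # Step 2: drop every character in the NonspacingMark category ("Mn").
    stripped = "".join(
        ch for ch in decomposed if unicodedata.category(ch) != "Mn"
    )
    # Step 3: NFC recombines whatever is left into composed form.
    return unicodedata.normalize("NFC", stripped)


print(unaccent("née"))  # prints "nee"
```

Characters that have no canonical decomposition (for example "ø" or "ł") pass through unchanged, which is one reason the approach works best for accents that Unicode models as combining marks.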

To compare two strings while ignoring accents, we just remove the accents from both and compare the results (see the overview section in [1]).
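
A minimal sketch of such a comparison, again in Python and again only an illustration: the helper names equal_ignoring_accents and contains_ignoring_accents are hypothetical, and a real find-in-page implementation would also need to map match positions back to the original text, which is not shown here.

```python
import unicodedata


def unaccent(s: str) -> str:
    # NFD, drop NonspacingMark ("Mn") characters, then NFC.
    decomposed = unicodedata.normalize("NFD", s)
    stripped = "".join(
        ch for ch in decomposed if unicodedata.category(ch) != "Mn"
    )
    return unicodedata.normalize("NFC", stripped)


def equal_ignoring_accents(a: str, b: str) -> bool:
    # Hypothetical helper: both sides are unaccented before comparing.
    return unaccent(a) == unaccent(b)


def contains_ignoring_accents(text: str, query: str) -> bool:
    # Hypothetical helper: substring search on the unaccented forms.
    return unaccent(query) in unaccent(text)


print(equal_ignoring_accents("née", "nee"))              # prints True
print(contains_ignoring_accents("Mme Dupont, née Martin", "nee"))  # prints True
```

Note that this sketch makes the match symmetric: a query typed with accents also matches unaccented text. Whether that is desirable is a design choice the description above leaves open.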

[1] http://userguide.icu-project.org/transforms/general
Status: UNCONFIRMED → RESOLVED
Closed: 10 years ago
Resolution: --- → DUPLICATE