Closed Bug 779068 Opened 12 years ago Closed 10 years ago

String normalisation (accents, special chars) for searching contacts

Categories

(Firefox OS Graveyard :: Gaia::Contacts, defect, P3)

Tracking

(Not tracked)

RESOLVED WORKSFORME

People

(Reporter: arcturus, Unassigned)

Details

(Whiteboard: Interaction design)

User Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_7_4) AppleWebKit/536.11 (KHTML, like Gecko) Chrome/20.0.1132.57 Safari/536.11

Steps to reproduce:

Searching for a contact shouldn't force the user to enter special characters or accents.

For example, a contact named:

Manuel Español

should be found by either of these queries:

espanol
ñol

Actual results:

Right now we solved this using a custom solution:

https://github.com/arcturus/gaia/blob/69862ab5fd016a319ef023df26b488abc9cf4b05/apps/contacts/js/utilities/normalizer.js

It normalises the string to make it searchable.
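
As a rough illustration of what a custom workaround of this kind can look like, here is a minimal sketch based on a lookup table. It is not the contents of the linked normalizer.js; the names FOLD_TABLE, toSearchKey and contactMatches are hypothetical, and the table covers only a handful of characters.

```javascript
// Hypothetical fold table: accented character -> base letter.
// Illustrative subset only; it assumes precomposed input, so a decomposed
// "n" + combining tilde would not be folded by this table.
const FOLD_TABLE = {
  '\u00E1': 'a', // á
  '\u00E9': 'e', // é
  '\u00ED': 'i', // í
  '\u00F3': 'o', // ó
  '\u00FA': 'u', // ú
  '\u00FC': 'u', // ü
  '\u00F1': 'n'  // ñ
};

// Lower-case the text and replace every character found in the table.
function toSearchKey(text) {
  let key = '';
  for (const ch of text.toLowerCase()) {
    key += Object.prototype.hasOwnProperty.call(FOLD_TABLE, ch)
      ? FOLD_TABLE[ch]
      : ch;
  }
  return key;
}

// Substring match on the folded forms of both the name and the query.
function contactMatches(contactName, query) {
  return toSearchKey(contactName).includes(toSearchKey(query));
}
```

Because the query is folded the same way as the contact name, the contact from the steps above would be found by both the unaccented query and the accented fragment.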


Expected results:

The platform should provide a way of normalising strings, such as one of the Unicode normalization forms:

http://unicode.org/reports/tr15/
Component: General → Gaia::Contacts
Priority: -- → P3
Whiteboard: Interaction design
Agreed. Normalisation of the string would deliver a more comfortable and pragmatic UX.
I don't think we have anything in the JS i18n spec to help here. Norbert?
The current ECMAScript Internationalization API spec handles only one special case: A Collator with usage="search" can be used to detect that two strings are similar, and in that case accents can be ignored. However, that only works for two complete strings, not for substring matching.
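
A minimal sketch of that special case, assuming an environment where Intl.Collator is available. The 'en' locale, the sensitivity: 'base' option and the helper name looksSame are choices made for this example, not something stated above; results depend on the collation rules of the locale chosen.

```javascript
// Collator set up for searching. With "base" sensitivity, strings that differ
// only in accents or case compare as equal under this locale's rules.
const collator = new Intl.Collator('en', { usage: 'search', sensitivity: 'base' });

// Hypothetical helper: true when the collator treats both strings as equal.
function looksSame(a, b) {
  return collator.compare(a, b) === 0;
}

// Two complete strings: the accent and case differences are ignored.
const wholeMatch = looksSame('Jos\u00E9', 'jose');

// A query against a longer string: compare() has no notion of substrings,
// so this does not count as a match.
const partialMatch = looksSame('Jos\u00E9 Mar\u00EDa', 'jose');
```

Here wholeMatch is true and partialMatch is false, which is the limitation described: the comparison works on whole strings only.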

There are actually two separate issues: Unicode normalization and language-specific matching of "similar" strings.

Unicode normalization, as specified in UTR 15, will be added to the ECMAScript Language Specification, edition 6:
http://wiki.ecmascript.org/doku.php?id=strawman:unicode_normalization

But Unicode normalization only erases differences that are linguistically irrelevant; it doesn't remove diacritics or change case, both of which carry meaning in many languages. In Spanish, for example, pena and peña are different words with different meanings. In German, fliegen and Fliegen are different words.
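
To make the distinction concrete, here is a small sketch using String.prototype.normalize as it was eventually standardised in ES6. The variable names are made up for this example.

```javascript
// The same word encoded two ways.
const composed = 'pe\u00F1a';     // "ñ" as a single code point
const decomposed = 'pen\u0303a';  // "n" followed by a combining tilde

// The raw strings differ code point by code point...
const rawEqual = composed === decomposed;

// ...but normalization erases that encoding difference.
const sameAfterNfc = composed.normalize('NFC') === decomposed.normalize('NFC');

// The diacritic itself survives normalization: "peña" is still not "pena".
const diacriticKept = composed.normalize('NFC') !== 'pena';

// Case survives too: "Fliegen" and "fliegen" stay distinct.
const caseKept = 'Fliegen'.normalize('NFC') !== 'fliegen'.normalize('NFC');
```

Here rawEqual is false while sameAfterNfc, diacriticKept and caseKept are all true.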

What you then want is another layer to find "similar" strings, where similar depends on the language, the user's understanding of the language, the availability of input mechanisms for the language, and other circumstances. That kind of API isn't on the TC 39 agenda yet.
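
For illustration only, one common language-blind approximation of such a layer is to decompose the text, drop combining marks, and lower-case the result. This is a sketch under those assumptions, not a proposed API; foldForSearch and findSimilar are hypothetical names, and the approach deliberately ignores the language-specific concerns raised above.

```javascript
// Decompose (NFD), strip combining marks in U+0300..U+036F, then lower-case.
// Language-blind: it treats "peña" and "pena" as the same, which is wrong
// for a Spanish speaker but convenient when accents are hard to type.
function foldForSearch(text) {
  return text
    .normalize('NFD')
    .replace(/[\u0300-\u036f]/g, '')
    .toLowerCase();
}

// Return the names whose folded form contains the folded query.
function findSimilar(names, query) {
  const key = foldForSearch(query);
  return names.filter((name) => foldForSearch(name).includes(key));
}

const hits = findSimilar(['Manuel Espa\u00F1ol', 'Ana Pe\u00F1a'], '\u00F1ol');
```

With these inputs, hits contains only the first name. The same folding also makes foldForSearch('peña') equal to foldForSearch('pena'), which is exactly the kind of conflation a language-aware layer would need to decide on per language.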
Status: UNCONFIRMED → RESOLVED
Closed: 10 years ago
Resolution: --- → WORKSFORME