Open Bug 724529 Opened 8 years ago Updated 3 years ago

[tracking] use ICU to implement upcoming ECMAScript globalization APIs, and to replace existing Unicode/i18n utilities where feasible

Categories: Core :: Internationalization, defect, Not set

People: Reporter: jfkthame; Unassigned

References: Depends on 4 open bugs, Blocks 1 open bug

Details: Keywords: meta

Following some initial email discussions (snippets below), it looks like we want to consider using ICU within Gecko.

MSFT and GOOG are pushing the ECMAScript Globalization API hard and fast [....] What are we going to do? Shortest path may be port Google's impl and use ICU.

   http://wiki.ecmascript.org/doku.php?id=globalization:specification_drafts
   (Current Working Drafts links near top).

   http://blogs.msdn.com/b/ie/archive/2011/11/22/evolving-ecmascript.aspx
   http://html5labs.interoperabilitybridges.com/tc39_demos/JsGlobalization/
 
The Google implementation appears to be in V8, as an experimental feature:

   http://code.google.com/p/v8/source/search?q=DateTimeFormat&origq=DateTimeFormat&btnG=Search+Trunk


If we were to pull in ICU (which is probably the easiest route to implementing the proposed JS i18n APIs), then I suspect we could use it to replace quite a lot of the code currently in intl/, such as charset conversion, case mapping, normalization and character-property utilities; we've also got some code in gfx (e.g. script-run itemization) that could be replaced.

We'd probably want to tackle this on a project branch, so that we could take time to (a) import ICU, and figure out how to integrate it into the build; (b) transition to using ICU instead of our existing intl code wherever possible; (c) test that we're not regressing anything in the process; and (d) minimize the footprint by customizing the ICU build to omit features and data that we don't need. I doubt we'd be happy doing this work directly on m-c, as it seems likely to take a significant period, and we'd have a pretty bloated product during the process.

I'll file bugs for some specific steps and make them block this one; no doubt there will be additional ones to come, as we pin down the work items needed.
Depends on: 724531
On a side note, being able to use ICU would be great also to use its tokenizers for SQLite FTS. So far we were unable to have enough traction on that due to ICU data size.
Depends on: 724533
Depends on: 724534
Depends on: 724535
Depends on: 724538
Depends on: 724540
Depends on: 769871
Depends on: 769872
(In reply to Marco Bonardo [:mak] from comment #1)
> On a side note, being able to use ICU would be great also to use its
> tokenizers for SQLite FTS. So far we were unable to have enough traction on
> that due to ICU data size.

Yes. That's even more true now that ICU ToT (soon to be ICU 50) has full word-break iterator support for Chinese, Japanese and Korean, as well as Thai and Khmer. CJK word-break iterator support has been in Chrome's copy of ICU for ages (since 2008), but it was not upstreamed until this summer. Chrome has been using it for SQLite tokenization, and it is also used to implement Range.expand('word') in WebKit. Being able to segment CJK text would also help dictionary extensions support Chinese and Japanese.
Keywords: meta
Depends on: es-intl
No longer depends on: 769872
Depends on: 851992
Blocks: 856115
Depends on: 924851
Depends on: 202251
Depends on: 556237
Depends on: 1225696
Depends on: 864753
Depends on: 1305700
Depends on: 728180
Depends on: 1308359
Depends on: 1301655