Open Bug 724529 Opened 9 years ago Updated 4 years ago
[tracking] use ICU to implement upcoming ECMAscript globalization APIs, and to replace existing Unicode/i18n utilities where feasible
Following some initial email discussions (snippets below), it looks like we want to consider using ICU within Gecko.

MSFT and GOOG are pushing the ECMAScript Globalization API hard and fast [....] What are we going to do? Shortest path may be port Google's impl and use ICU.

http://wiki.ecmascript.org/doku.php?id=globalization:specification_drafts (Current Working Drafts links near top).
http://blogs.msdn.com/b/ie/archive/2011/11/22/evolving-ecmascript.aspx
http://html5labs.interoperabilitybridges.com/tc39_demos/JsGlobalization/

The Google implementation seems to be in v8, experimental:
http://code.google.com/p/v8/source/search?q=DateTimeFormat&origq=DateTimeFormat&btnG=Search+Trunk

If we were to pull in ICU - which is probably the easiest route to implementation of the proposed JS-i18n APIs - then I suspect we could use it to replace quite a lot of the code currently in intl/ (things like charset conversion, case mapping, normalization, character-property utilities, etc); we've also got some code in gfx (e.g. script-run itemization) that could be replaced.

We'd probably want to tackle this on a project branch, so that we could take time to
(a) import ICU, and figure out how to integrate it into the build;
(b) transition to using ICU instead of our existing intl code wherever possible;
(c) test that we're not regressing anything in the process; and
(d) minimize the footprint by customizing the ICU build to omit features and data that we don't need.

I doubt we'd be happy doing this work directly on m-c, as it seems likely to take a significant period, and we'd have a pretty bloated product during the process.

I'll file bugs for some specific steps and make them block this one; no doubt there will be additional ones to come, as we pin down the work items needed.
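For a sense of what the JS-i18n APIs would offer script authors, here is a minimal sketch written against the `Intl` object as it was later standardized in ECMA-402. The working drafts linked above may have used different names and option shapes, so treat this as an illustration rather than the draft API. The helper names `formatLongDate` and `sortForLocale` are hypothetical and exist only for this example.

```typescript
// Sketch only: uses the `Intl` object as later standardized in ECMA-402.
// `formatLongDate` and `sortForLocale` are hypothetical helper names.

// Format a date in the conventions of the given locale.
// The time zone is pinned to UTC so the result does not depend on the host.
function formatLongDate(date: Date, locale: string): string {
  const formatter = new Intl.DateTimeFormat(locale, {
    year: "numeric",
    month: "long",
    day: "numeric",
    timeZone: "UTC",
  });
  return formatter.format(date);
}

// Return a copy of `words` sorted by the collation rules of the given locale.
function sortForLocale(words: string[], locale: string): string[] {
  const collator = new Intl.Collator(locale);
  // `compare` is a bound function, so it can be passed to sort() directly.
  return words.slice().sort(collator.compare);
}
```

Calling `formatLongDate(new Date(Date.UTC(2012, 0, 15)), "en-US")` yields a long-form English date, and `sortForLocale(["z", "ä", "a"], "de")` places "ä" between "a" and "z". An engine typically delegates both operations to a library such as ICU, which is why the locale data footprint matters for the plan above.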
On a side note, being able to use ICU would be great also to use its tokenizers for SQLite FTS. So far we were unable to have enough traction on that due to ICU data size.
(In reply to Marco Bonardo [:mak] from comment #1)
> On a side note, being able to use ICU would be great also to use its
> tokenizers for SQLite FTS. So far we were unable to have enough traction on
> that due to ICU data size.

Yes. It's even more so considering that ICU ToT (soon to be ICU 50) now has full support for word-break iteration in Chinese, Japanese, and Korean as well as Thai and Khmer. CJK word-break iterator support has been in Chrome's copy of ICU for ages (since 2008), but it was not upstreamed until this summer. Chrome has been using it for SQLite's tokenization. It's also used to implement "Range.expand('word')" in WebKit. Being able to segment CJK text helps a dictionary extension support Chinese and Japanese.