Open Bug 726888 Opened 13 years ago Updated 2 years ago

Move JS's Unicode character data tables and stuff into mfbt

Categories

(Core :: MFBT, defect)

People

(Reporter: Waldo, Unassigned)

The JS engine uses various Unicode tables for stuff like uppercasing and lowercasing. This is stuff the browser could and should be using as well, and not reimplementing itself as I believe it does now. As a first step, the scripts, character data, headers, and so on should move into mfbt. To do this, we need to resolve a couple little issues.

First, we need a library to stick the extern tables and stuff into. Bug 717540 should fix this very shortly. \o/

Second, we need to decide on a type to use for 16-bit characters/Unicode code units. This stuff currently uses jschar, which is fine for the JS engine but uncool for everyone else. char16_t from C11 and C++11 is the logical choice. But nothing's ever that easy. This will require some trickery to get working, seeing as in C++11 char16_t is a keyword, in C11 char16_t is a type defined by <uchar.h>, and all sorts of downrev compilers will support neither of these. (Although, <uchar.h> in C++11 is specified to not define char16_t, so it's possible that header will work in C++11 and C11 compilers both.) Funtimes.

Anyway. Something to do sometime soon, although given everything got updated to Unicode 6.1 just now, it'll probably be a little while before this move (and subsequent consolidation of reimplemented details) is too helpful.
Of course, we could just use uint16_t for now, too, and assume that we can rewrite our way to char16_t when it becomes feasible to do so.
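For illustration, the char16_t-with-uint16_t-fallback idea might look roughly like the sketch below. This is only a sketch under assumptions: ExampleChar16, EXAMPLE_HAVE_CHAR16_T, and ExampleLength are hypothetical names, not anything in mfbt, and the __cplusplus check is a simplification, since some compilers report an older value even when they support char16_t, so a real header would likely need per-compiler checks.

```cpp
#include <stddef.h>
#include <stdint.h>

// Hypothetical names throughout; nothing here is an existing mfbt API.
// Simplified detection: use the C++11 keyword when the compiler claims
// C++11 support, and fall back to uint16_t otherwise.
#if defined(__cplusplus) && __cplusplus >= 201103L
#  define EXAMPLE_HAVE_CHAR16_T 1
typedef char16_t ExampleChar16;
#else
#  define EXAMPLE_HAVE_CHAR16_T 0
typedef uint16_t ExampleChar16;
#endif

// Counts code units up to (not including) the terminating zero unit.
// Written only against ExampleChar16, so it compiles with either typedef.
inline size_t ExampleLength(const ExampleChar16* s) {
  size_t n = 0;
  while (s[n] != 0) {
    ++n;
  }
  return n;
}
```

Code written purely against the typedef would not need to change if the fallback were later dropped in favor of char16_t everywhere, which is the "rewrite our way to char16_t" path in its cheapest form.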
Depends on: 796948
I am not sure how important that is, but at the moment the engine does not yet support special casing (Bug 672042). Also, JavaScript's Unicode handling is kind of strange; I am not sure it really matches what the browser needs.
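To make the special-casing gap concrete, here is a rough sketch, assuming "special casing" means the one-to-many mappings in Unicode's SpecialCasing.txt. The function names are hypothetical and the tables are reduced to ASCII plus a single entry; this is not how the engine's tables are laid out.

```cpp
#include <stddef.h>
#include <stdint.h>

// Hypothetical helper: simple (one-to-one) uppercasing, ASCII only here.
// A per-code-unit table of this shape always maps one unit to one unit.
inline uint16_t ExampleToUpperSimple(uint16_t ch) {
  if (ch >= 'a' && ch <= 'z') {
    return static_cast<uint16_t>(ch - ('a' - 'A'));
  }
  return ch;
}

// Hypothetical helper: full uppercasing with one special-casing entry.
// U+00DF (sharp s) uppercases to the two units "SS", which a one-to-one
// table cannot express. Returns the number of units written to out.
inline size_t ExampleToUpperFull(uint16_t ch, uint16_t out[2]) {
  if (ch == 0x00DF) {
    out[0] = 'S';
    out[1] = 'S';
    return 2;
  }
  out[0] = ExampleToUpperSimple(ch);
  return 1;
}
```

With only the simple mapping, U+00DF comes back unchanged; a shared table in mfbt would need some second mechanism like the one sketched above if the browser wants full case conversion.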
Severity: normal → S3