Closed Bug 503986 Opened 15 years ago Closed 15 years ago

Errors in Unicode upper/lower case conversion

Categories

(Core :: JavaScript Engine, defect)

defect
Not set
normal

RESOLVED DUPLICATE of bug 394604

People

(Reporter: dmandelin, Unassigned)

Details

Attachments

(1 file)

I just compared the results of JS toUpperCase and toLowerCase against Unicode 5.1.0. The attached file shows the 638 code points for which JS does something wrong.
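A comparison of this kind can be sketched roughly as follows. This is not the script that produced the attachment; the function name `findCaseMismatches` and the three sample rows are illustrative assumptions. The rows follow the UnicodeData.txt layout, where field 12 is the simple upper-case mapping and field 13 the simple lower-case mapping. Note that engines may also apply the full mappings from SpecialCasing.txt, which this sketch ignores.

```javascript
// Sample rows in UnicodeData.txt layout: field 0 = code point,
// field 12 = simple upper-case mapping, field 13 = simple lower-case mapping.
const sampleRows = [
  "0041;LATIN CAPITAL LETTER A;Lu;0;L;;;;;N;;;;0061;",
  "0061;LATIN SMALL LETTER A;Ll;0;L;;;;;N;;;0041;;0041",
  "01C5;LATIN CAPITAL LETTER D WITH SMALL LETTER Z WITH CARON;Lt;0;L;<compat> 0044 017E;;;;N;LATIN LETTER CAPITAL D SMALL Z HACEK;;01C4;01C6;01C5",
].join("\n");

// Hypothetical helper: returns the code points (as hex strings) whose
// toUpperCase/toLowerCase result differs from the simple mapping in the data.
function findCaseMismatches(unicodeDataText) {
  const mismatches = [];
  for (const line of unicodeDataText.split("\n")) {
    if (line.trim() === "") continue;
    const fields = line.split(";");
    const ch = String.fromCodePoint(parseInt(fields[0], 16));
    // An empty mapping field means the character maps to itself.
    const expectedUpper = fields[12]
      ? String.fromCodePoint(parseInt(fields[12], 16))
      : ch;
    const expectedLower = fields[13]
      ? String.fromCodePoint(parseInt(fields[13], 16))
      : ch;
    if (ch.toUpperCase() !== expectedUpper || ch.toLowerCase() !== expectedLower) {
      mismatches.push(fields[0]);
    }
  }
  return mismatches;
}

console.log(findCaseMismatches(sampleRows));
```

On an engine with up-to-date case tables the sample rows should produce no mismatches; on the engine described in this bug, U+01C5 would be expected to show up.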
Another thing I realized is that the compressed char-code table format in jsstr.cpp cannot support correct Unicode case conversions. That format allows a character to have either an upper-case form or a lower-case form, but not both, yet some characters (e.g., U+01C5 [1]) have both. I haven't analyzed those characters in detail, but I think it might work to allow both bits to be set, with a special interpretation of the offset area. I also notice 2 unused bits in the encoding that could be used to extend the offset area if needed.
[1] http://www.fileformat.info/info/unicode/char/01c5/index.htm
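To make the "both bits set" idea concrete, here is a minimal sketch. The entry layout, the flag names, and the helper functions are all hypothetical and are not the actual jsstr.cpp format. It assumes that when both flags are set, the same offset is subtracted to get the upper-case form and added to get the lower-case form. That holds for U+01C5 (upper U+01C4, lower U+01C6); whether it covers every affected character has not been checked here.

```javascript
// Hypothetical entry layout (NOT the real jsstr.cpp format):
//   bit 0: character has an upper-case form
//   bit 1: character has a lower-case form
//   bits 2 and up: unsigned offset to the other case
const HAS_UPPER = 1 << 0;
const HAS_LOWER = 1 << 1;

function makeEntry(flags, offset) {
  return (offset << 2) | flags;
}

// Assumption of this sketch: the upper-case form has the smaller
// code point, so the offset is subtracted.
function toUpperCodePoint(cp, entry) {
  return (entry & HAS_UPPER) ? cp - (entry >> 2) : cp;
}

// Assumption of this sketch: the lower-case form has the larger
// code point, so the offset is added.
function toLowerCodePoint(cp, entry) {
  return (entry & HAS_LOWER) ? cp + (entry >> 2) : cp;
}

// Illustrative table with one-directional entries and one two-directional entry.
const entries = new Map([
  [0x0041, makeEntry(HAS_LOWER, 32)],            // 'A' -> 'a'
  [0x0061, makeEntry(HAS_UPPER, 32)],            // 'a' -> 'A'
  [0x01c5, makeEntry(HAS_UPPER | HAS_LOWER, 1)], // both directions
]);

const dz = entries.get(0x01c5);
console.log(toUpperCodePoint(0x01c5, dz).toString(16)); // 1c4
console.log(toLowerCodePoint(0x01c5, dz).toString(16)); // 1c6
```

Characters whose upper-case form has a larger code point than the character itself would need a signed offset or a further flag, which is one possible use for the spare bits mentioned above.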
The jsstr.cpp table comes from Java 1 in 1995 -- Unicode 2, IIRC. There is a bug on upgrading it: bug 394604 I think. /be
Status: NEW → RESOLVED
Closed: 15 years ago
Resolution: --- → DUPLICATE