Closed
Bug 503986
Opened 15 years ago
Closed 15 years ago
Errors in Unicode upper/lower case conversion
Categories
(Core :: JavaScript Engine, defect)
RESOLVED DUPLICATE of bug 394604
People
(Reporter: dmandelin, Unassigned)
Attachments
(1 file) 33.14 KB, text/plain
I just compared the results of JS toUpperCase and toLowerCase against Unicode 5.1.0. The attached file shows the 638 code points for which JS does something wrong.
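A comparison of this kind could be sketched roughly as below. The bug does not say how the check was actually run, so this is only an illustration: `SAMPLE_ROWS`, `parseRow`, and `findMismatches` are hypothetical names, the three rows stand in for the full UnicodeData.txt file, and only the simple case mappings (fields 12 and 13) are consulted.

```javascript
// Sample rows in the UnicodeData.txt layout (semicolon-separated fields).
// Field 0 is the code point, field 12 the simple upper-case mapping,
// field 13 the simple lower-case mapping. A real run would read the whole file.
const SAMPLE_ROWS = [
  "0041;LATIN CAPITAL LETTER A;Lu;0;L;;;;;N;;;;0061;",
  "0061;LATIN SMALL LETTER A;Ll;0;L;;;;;N;;;0041;;0041",
  "01C5;LATIN CAPITAL LETTER D WITH SMALL LETTER Z WITH CARON;Lt;0;L;<compat> 0044 017E;;;;N;LATIN LETTER CAPITAL D SMALL Z HACEK;;01C4;01C6;01C5",
];

// Hypothetical helper: extract the code point and its simple case mappings.
// An empty mapping field means the character maps to itself.
function parseRow(row) {
  const fields = row.split(";");
  const code = parseInt(fields[0], 16);
  return {
    code,
    upper: fields[12] ? parseInt(fields[12], 16) : code,
    lower: fields[13] ? parseInt(fields[13], 16) : code,
  };
}

// Hypothetical helper: list every code point where the engine's result
// differs from the simple mapping. BMP code points only.
function findMismatches(rows) {
  const mismatches = [];
  for (const row of rows) {
    const { code, upper, lower } = parseRow(row);
    const ch = String.fromCharCode(code);
    if (ch.toUpperCase() !== String.fromCharCode(upper)) {
      mismatches.push({ code, method: "toUpperCase" });
    }
    if (ch.toLowerCase() !== String.fromCharCode(lower)) {
      mismatches.push({ code, method: "toLowerCase" });
    }
  }
  return mismatches;
}

console.log(findMismatches(SAMPLE_ROWS));
```

A check based only on the simple mappings can over-report: characters whose full case mapping expands to several characters would be flagged even in a correct engine, so a careful comparison would probably also need SpecialCasing.txt.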
Comment 1 (Reporter) • 15 years ago
Another thing I realized is that the compressed char-code table format in jsstr.cpp cannot support correct Unicode case conversions. That format only allows a character to have an upper-case form or a lower-case form, but not both. But some characters (e.g., U+01C5 [1]) have both. I haven't analyzed those characters in detail, but I think it might work to allow both bits to be set, with a special interpretation of the offset area. I also notice 2 unused bits in the encoding that might be used to extend the offset area if needed.
[1] http://www.fileformat.info/info/unicode/char/01c5/index.htm
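To make the limitation concrete, here is a minimal sketch of a flag-plus-offset entry. The bit layout is made up for illustration and is NOT the actual jsstr.cpp encoding; `HAS_UPPER`, `HAS_LOWER`, `OFFSET_SHIFT`, `makeEntry`, `toUpperCode`, and `toLowerCode` are all hypothetical. It shows one possible "special interpretation" when both bits are set: the same offset is subtracted to get the upper-case form and added to get the lower-case form.

```javascript
// Hypothetical entry layout (not the real jsstr.cpp format):
//   bit 0   HAS_UPPER: the character has an upper-case form
//   bit 1   HAS_LOWER: the character has a lower-case form
//   bits 2+ unsigned offset between the character and its case partner(s)
const HAS_UPPER = 1 << 0;
const HAS_LOWER = 1 << 1;
const OFFSET_SHIFT = 2;

function makeEntry(flags, offset) {
  return (offset << OFFSET_SHIFT) | flags;
}

// Upper-case form: subtract the offset when HAS_UPPER is set.
function toUpperCode(code, entry) {
  const offset = entry >>> OFFSET_SHIFT;
  return (entry & HAS_UPPER) ? code - offset : code;
}

// Lower-case form: add the offset when HAS_LOWER is set.
function toLowerCode(code, entry) {
  const offset = entry >>> OFFSET_SHIFT;
  return (entry & HAS_LOWER) ? code + offset : code;
}

// With only one bit set, an entry describes one direction:
const smallA = makeEntry(HAS_UPPER, 32); // 'a' (U+0061) -> 'A' (U+0041)
const capitalA = makeEntry(HAS_LOWER, 32); // 'A' (U+0041) -> 'a' (U+0061)

// U+01C5 needs both directions: upper is U+01C4, lower is U+01C6.
// Under the assumed interpretation, both bits share the offset 1.
const dzTitle = makeEntry(HAS_UPPER | HAS_LOWER, 1);

console.log(toUpperCode(0x01c5, dzTitle).toString(16)); // 1c4
console.log(toLowerCode(0x01c5, dzTitle).toString(16)); // 1c6
```

This symmetric reading happens to fit U+01C5, whose upper and lower forms sit one code point on either side of it. Whether it fits every character that has both forms has not been checked here, which matches the caveat in the comment above.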
Comment 2 • 15 years ago
The jsstr.cpp table comes from Java 1 in 1995 -- Unicode 2, IIRC. There is a bug on upgrading it: bug 394604 I think.
/be
Updated by Reporter • 15 years ago
Status: NEW → RESOLVED
Closed: 15 years ago
Resolution: --- → DUPLICATE