Closed Bug 503986 Opened 15 years ago Closed 15 years ago

Errors in Unicode upper/lower case conversion

Status:

RESOLVED DUPLICATE of bug 394604

People

(Reporter: dmandelin, Unassigned)

Attachments

(1 file)

Incorrect case conversions (text/plain, 33.14 KB), attached 15 years ago by David Mandelin [:dmandelin]

David Mandelin [:dmandelin]

Reporter

Description

15 years ago

Attached file: Incorrect case conversions

I just compared the results of JS toUpperCase and toLowerCase against Unicode 5.1.0. The attached file lists the 638 code points for which JS returns an incorrect result.
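A minimal sketch of how such a comparison could be run, not the script actually used for the attachment. It assumes input lines in the UnicodeData.txt format, where field 0 is the code point and fields 12 and 13 hold the simple uppercase and lowercase mappings. The names `SAMPLE_LINES`, `parseLine`, and `findMismatches` are hypothetical. Only simple one-to-one mappings are checked, so a character whose full mapping expands to several characters (for example U+00DF, which JS upper-cases to "SS") would be reported as a mismatch by this check.

```javascript
// Sample lines in UnicodeData.txt format (semicolon-separated fields).
// Field 0 is the code point; fields 12 and 13 are the simple uppercase and
// simple lowercase mappings (empty when the character maps to itself).
const SAMPLE_LINES = [
  "0041;LATIN CAPITAL LETTER A;Lu;0;L;;;;;N;;;;0061;",
  "0061;LATIN SMALL LETTER A;Ll;0;L;;;;;N;;;0041;;0041",
  "01C5;LATIN CAPITAL LETTER D WITH SMALL LETTER Z WITH CARON;Lt;0;L;<compat> 0044 017E;;;;N;LATIN LETTER CAPITAL D SMALL Z HACEK;;01C4;01C6;01C5",
];

// Parse one line into the code point and its expected simple mappings.
function parseLine(line) {
  const fields = line.split(";");
  const codePoint = parseInt(fields[0], 16);
  // An empty mapping field means the character maps to itself.
  const mapped = (field) => (field ? parseInt(field, 16) : codePoint);
  return { codePoint, upper: mapped(fields[12]), lower: mapped(fields[13]) };
}

// Return every (code point, operation) pair where the engine's result
// differs from the simple mapping given in the data.
function findMismatches(lines) {
  const mismatches = [];
  for (const line of lines) {
    const { codePoint, upper, lower } = parseLine(line);
    const ch = String.fromCodePoint(codePoint);
    if (ch.toUpperCase() !== String.fromCodePoint(upper)) {
      mismatches.push({ codePoint, op: "toUpperCase" });
    }
    if (ch.toLowerCase() !== String.fromCodePoint(lower)) {
      mismatches.push({ codePoint, op: "toLowerCase" });
    }
  }
  return mismatches;
}

for (const m of findMismatches(SAMPLE_LINES)) {
  console.log("U+" + m.codePoint.toString(16).toUpperCase().padStart(4, "0"), m.op);
}
```

Running the same function over the full data file for a given Unicode version would give a list comparable in shape to the attachment, one entry per wrong conversion.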

David Mandelin [:dmandelin]

Reporter

Comment 1

15 years ago

Another thing I realized is that the compressed character-code table format in jsstr.cpp cannot support correct Unicode case conversions. That format only allows a character to have an upper-case form or a lower-case form, but not both. But some characters (e.g., U+01C5 [1]) have both. I haven't analyzed those characters in detail, but I think it might work to allow both bits to be set, with a special interpretation of the offset area in that case. I also noticed 2 unused bits in the encoding that might be used to extend the offset area if needed.

[1] http://www.fileformat.info/info/unicode/char/01c5/index.htm
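To illustrate the "both bits set" idea, here is a sketch under stated assumptions. The bit layout, the flag names `HAS_UPPER` and `HAS_LOWER`, and the helper functions are all hypothetical and do not reflect the actual jsstr.cpp encoding. The assumed special interpretation is that when both flags are set, the offset is a distance applied in both directions: subtracted for the upper-case form and added for the lower-case form. That happens to fit U+01C5, whose upper-case form is U+01C4 and whose lower-case form is U+01C6.

```javascript
// Hypothetical entry layout (NOT the real jsstr.cpp format):
//   bit 0      -> character has an upper-case form
//   bit 1      -> character has a lower-case form
//   bits 2..31 -> signed offset
const HAS_UPPER = 1 << 0;
const HAS_LOWER = 1 << 1;
const FLAG_MASK = HAS_UPPER | HAS_LOWER;
const OFFSET_SHIFT = 2;

// Pack flags and a signed offset into one 32-bit integer.
function makeEntry(flags, offset) {
  return (offset << OFFSET_SHIFT) | flags;
}

// Unpack an entry; >> is an arithmetic shift, so the offset keeps its sign.
function decodeEntry(entry) {
  return { flags: entry & FLAG_MASK, offset: entry >> OFFSET_SHIFT };
}

function toUpperCode(codePoint, entry) {
  const { flags, offset } = decodeEntry(entry);
  if (!(flags & HAS_UPPER)) return codePoint;
  // Assumed special case: with both flags set, the upper-case form lies
  // `offset` below the character.
  if (flags & HAS_LOWER) return codePoint - offset;
  return codePoint + offset;
}

function toLowerCode(codePoint, entry) {
  const { flags, offset } = decodeEntry(entry);
  if (!(flags & HAS_LOWER)) return codePoint;
  // With one flag or both, the lower-case form is at codePoint + offset.
  return codePoint + offset;
}

// Example entries: 'a' has only an upper-case form, U+01C5 has both.
const entryLowerA = makeEntry(HAS_UPPER, -32);
const entryDz = makeEntry(HAS_UPPER | HAS_LOWER, 1);
console.log(toUpperCode(0x61, entryLowerA).toString(16));
console.log(toUpperCode(0x1c5, entryDz).toString(16), toLowerCode(0x1c5, entryDz).toString(16));
```

Whether a symmetric offset covers every character that has both forms would need to be checked against the Unicode data; the sketch only shows that the scheme is expressible without adding a second offset field.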

Brendan Eich [:brendan]

Comment 2

15 years ago

The jsstr.cpp table comes from Java 1 in 1995 -- Unicode 2, IIRC. There is a bug on upgrading it: bug 394604 I think. /be

David Mandelin [:dmandelin]

Reporter

Updated

15 years ago

Status: NEW → RESOLVED

Closed: 15 years ago

Resolution: --- → DUPLICATE

Categories

(Core :: JavaScript Engine, defect)