Closed Bug 41425 Opened 24 years ago Closed 22 years ago

UTF-8 converter gets surrogates wrong

Categories

(Core :: Internationalization, defect, P3)

defect

Tracking

()

RESOLVED FIXED

People

(Reporter: erik, Assigned: shanjian)

References

Details

(Keywords: intl, topembed)

Attachments

(1 file)

UTF-8 codes for UCS-4 characters outside UTF-16 are mistakenly converted to
surrogates in UTF-16. The test is currently:

  if(mUcs4 >= 0x001F0000) {

It should be:

  if(mUcs4 >= 0x00110000) {

Also, the surrogates are computed wrongly. Instead of ORing (|), it should be
ADDing (+). The current code is:

  *out++ = 0xD800 | (0x000003FF & (mUcs4 >> 10));
  *out++ = 0xDC00 | (0x000003FF & mUcs4);

It should be:

  *out++ = 0xD800 + (mUcs4 >> 10);
  *out++ = 0xDC00 + (0x000003FF & mUcs4);

The "out" pointer must not be incremented twice (as above) without checking
"outend". It should return NS_OK_UDEC_MOREOUTPUT and save state if we don't have
enough space in the output buffer.

For 4-byte UTF-8 chars, the line should be:

  mUcs4 = (mUcs4 << 18) & 0x001C0000L;

Near the end of the method, res is set to NS_OK_UDEC_MOREINPUT even if it had
been set to NS_ERROR_UNEXPECTED earlier in the code. It should not overwrite
res in this way.

"res" is initialized to "nsnull". This should be NS_OK.
Please ignore the part about ORing and ADDing. ORing is fine (and faster).
The "if(mUcs4 >= 0x001F0000)" bug occurs in several files in Mozilla. Maybe all
of these should be fixed as part of this bug.
Status: NEW → ASSIGNED
Target Milestone: --- → M17
This probably mean 3 places
/intl/uconv/ucvlatin/nsUTF8ToUnicode.cpp
/xpcom/ds/nsString2.cpp
/xpcom/ds/nsTextFormatter.cpp

The comment of changing 0x001F0000 to 0x00110000 is correct.
The comment of out++ twice should only apply to nsUTF8ToUnicode.cpp since we 
guarantee the buffer size in nsString2.cpp and nsTextFormatter.cpp
erik- can you help to fix it? Thanks Mark it as M23
Assignee: cata → erik
Status: ASSIGNED → NEW
Target Milestone: M17 → M23
Keywords: intl
Take this bug back. Mark it as future
Assignee: erik → ftang
Target Milestone: --- → Future
accept
Status: NEW → ASSIGNED
shanjian- can you take a look at this old bug. Is this still valid? if not ,
please close it. if yes, please file individual bugs for each cases. Thanks
Assignee: ftang → shanjian
Status: ASSIGNED → NEW
Target Milestone: Future → ---
Status: NEW → ASSIGNED
/intl/uconv/ucvlatin/nsUTF8ToUnicode.cpp has been taken care of. 

/xpcom/ds/nsString2.cpp (It should be nsString2.h), will be taken care of in bug
134035. 

/xpcom/ds/nsTextFormatter.cpp will be taken care of by this bug. 



Attached patch patchSplinter Review
Comment on attachment 82258 [details] [diff] [review]
patch

r=ftang
Attachment #82258 - Flags: review+
blizzard, could you sr?
Comment on attachment 82258 [details] [diff] [review]
patch

sr=blizzard
Attachment #82258 - Flags: superreview+
fix checked into trunk. 
Status: ASSIGNED → RESOLVED
Closed: 22 years ago
Resolution: --- → FIXED
I suspect the checkin for this bug to have caused the regression in bug 164771.
Could you verify this?
Comment on attachment 82258 [details] [diff] [review]
patch

I just verified that his fix is causing bug 164771, as arthur suggested, which
is a serious copy/paste regression.
Attachment #82258 - Flags: needs-work+
Reopening this bug. The patch needs to be backed out, or augmented to fix the
copy/paste problem.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
This is hard for me to understand why this simple change cause the problem. I
could not reproduce it on my box. When kind of text did you try to copy?
Status: REOPENED → ASSIGNED
Just verified with kin that the bug 164771 is not caused by this patch. Mark as
fixed. 
Status: ASSIGNED → RESOLVED
Closed: 22 years ago22 years ago
Resolution: --- → FIXED
Comment on attachment 82258 [details] [diff] [review]
patch

Ok, I screwed up on my backout of shanjian's changes. I tested the backout with
the wrong build. Backing out shanjian's patch did *not* fix the problem.
Appologies to shanjian.
Attachment #82258 - Flags: needs-work+
batch: adding topembed per Gecko2 document
http://rocknroll.mcom.com/users/marek/publish/Gecko/Gecko2Tasks.html
Keywords: topembed
Shanjian, how can I verify this?
Profile name resolution use this code. Try it and see if it works.
Blocks: grouper
Changed QA contact to ylong@netscape.com.
QA Contact: teruko → ylong
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: