Closed Bug 347867 Opened 19 years ago Closed 19 years ago

Astral plane numeric character entitities misdetected as surrogates

Categories

(Core :: Internationalization, defect)

x86
Linux
defect
Not set
normal

Tracking

()

RESOLVED FIXED

People

(Reporter: smontagu, Assigned: smontagu)

References

()

Details

Attachments

(1 file)

The tests for surrogate characters at http://bonsai.mozilla.org/cvsblame.cgi?file=mozilla/xpcom/string/public/nsCharTraits.h&rev=1.35&mark=79,81,83#75 are fine for UTF-16 input, but not for UTF-32, which can get passed to them via ENSURE_VALID_CHAR
Attached patch PatchSplinter Review
Attachment #232738 - Flags: superreview?
Attachment #232738 - Flags: review?
Attachment #232738 - Flags: superreview?(darin)
Attachment #232738 - Flags: superreview?
Attachment #232738 - Flags: review?(darin)
Attachment #232738 - Flags: review?
Hmm... aren't these macros explicitly for the xpcom/string code, which does not itself deal with UTF-32? Are we confusing matters a bit if we have code depending on xpcom/string for UTF-32 stuff when xpcom/string does not mention support for UTF-32? What if someone mucks with xpcom/string down the road and breaks UTF-32 support unknowingly?
I shouldn't have said "UTF-32". The issue is not strings in UTF-32, but 32-bit codepoints that are being converted to a UTF-16 surrogate pair.
cc-ing darin in case he missed the last couple of comments
Comment on attachment 232738 [details] [diff] [review] Patch >Index: xpcom/string/public/nsCharTraits.h > // Some macros for working with PRUnichar > #define PLANE1_BASE PRUint32(0x00010000) > // High surrogates are in the range 0xD800 -- OxDBFF >-#define IS_HIGH_SURROGATE(u) ((PRUnichar(u) & 0xFC00) == 0xD800) >+#define IS_HIGH_SURROGATE(u) ((PRUint32(u) & 0xFFFFFC00) == 0xD800) > // Low surrogates are in the range 0xDC00 -- 0xDFFF >-#define IS_LOW_SURROGATE(u) ((PRUnichar(u) & 0xFC00) == 0xDC00) >+#define IS_LOW_SURROGATE(u) ((PRUint32(u) & 0xFFFFFC00) == 0xDC00) > // Faster than testing IS_HIGH_SURROGATE || IS_LOW_SURROGATE >-#define IS_SURROGATE(u) ((PRUnichar(u) & 0xF800) == 0xD800) >+#define IS_SURROGATE(u) ((PRUint32(u) & 0xFFFFF800) == 0xD800) I think the comment should at least be modified to refer to larger than PRUnichar code points. My point was not that this code isn't fine, but rather that there isn't much indication that xpcom/string code should work with 32-bit code points. I think some comments are warranted to inform the reader that someone cares about these macros being used with larger than 16-bit code points. r+sr=darin with such a change
Attachment #232738 - Flags: superreview?(darin)
Attachment #232738 - Flags: superreview+
Attachment #232738 - Flags: review?(darin)
Attachment #232738 - Flags: review+
I'm still not sure what the problem is, since PRUnichars are by definition 16-bit codepoints that encode 32-bit Unicode scalar values, but I'll add some comments to make that clear.
Checked in with added comments
Status: NEW → RESOLVED
Closed: 19 years ago
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: