Closed
Bug 347867
Opened 19 years ago
Closed 19 years ago
Astral plane numeric character entitities misdetected as surrogates
Categories
(Core :: Internationalization, defect)
Tracking
()
RESOLVED
FIXED
People
(Reporter: smontagu, Assigned: smontagu)
References
()
Details
Attachments
(1 file)
|
1.51 KB,
patch
|
darin.moz
:
review+
darin.moz
:
superreview+
|
Details | Diff | Splinter Review |
The tests for surrogate characters at http://bonsai.mozilla.org/cvsblame.cgi?file=mozilla/xpcom/string/public/nsCharTraits.h&rev=1.35&mark=79,81,83#75 are fine for UTF-16 input, but not for UTF-32, which can get passed to them via ENSURE_VALID_CHAR
| Assignee | ||
Comment 1•19 years ago
|
||
Attachment #232738 -
Flags: superreview?
Attachment #232738 -
Flags: review?
| Assignee | ||
Updated•19 years ago
|
Attachment #232738 -
Flags: superreview?(darin)
Attachment #232738 -
Flags: superreview?
Attachment #232738 -
Flags: review?(darin)
Attachment #232738 -
Flags: review?
Comment 2•19 years ago
|
||
Hmm... aren't these macros explicitly for the xpcom/string code, which does not itself deal with UTF-32? Are we confusing matters a bit if we have code depending on xpcom/string for UTF-32 stuff when xpcom/string does not mention support for UTF-32? What if someone mucks with xpcom/string down the road and breaks UTF-32 support unknowingly?
| Assignee | ||
Comment 3•19 years ago
|
||
I shouldn't have said "UTF-32". The issue is not strings in UTF-32, but 32-bit codepoints that are being converted to a UTF-16 surrogate pair.
This patch seems fine to me.
| Assignee | ||
Comment 5•19 years ago
|
||
cc-ing darin in case he missed the last couple of comments
Comment 6•19 years ago
|
||
Comment on attachment 232738 [details] [diff] [review]
Patch
>Index: xpcom/string/public/nsCharTraits.h
> // Some macros for working with PRUnichar
> #define PLANE1_BASE PRUint32(0x00010000)
> // High surrogates are in the range 0xD800 -- OxDBFF
>-#define IS_HIGH_SURROGATE(u) ((PRUnichar(u) & 0xFC00) == 0xD800)
>+#define IS_HIGH_SURROGATE(u) ((PRUint32(u) & 0xFFFFFC00) == 0xD800)
> // Low surrogates are in the range 0xDC00 -- 0xDFFF
>-#define IS_LOW_SURROGATE(u) ((PRUnichar(u) & 0xFC00) == 0xDC00)
>+#define IS_LOW_SURROGATE(u) ((PRUint32(u) & 0xFFFFFC00) == 0xDC00)
> // Faster than testing IS_HIGH_SURROGATE || IS_LOW_SURROGATE
>-#define IS_SURROGATE(u) ((PRUnichar(u) & 0xF800) == 0xD800)
>+#define IS_SURROGATE(u) ((PRUint32(u) & 0xFFFFF800) == 0xD800)
I think the comment should at least be modified to refer to
larger than PRUnichar code points.
My point was not that this code isn't fine, but rather that
there isn't much indication that xpcom/string code should
work with 32-bit code points. I think some comments are
warranted to inform the reader that someone cares about these
macros being used with larger than 16-bit code points.
r+sr=darin with such a change
Attachment #232738 -
Flags: superreview?(darin)
Attachment #232738 -
Flags: superreview+
Attachment #232738 -
Flags: review?(darin)
Attachment #232738 -
Flags: review+
| Assignee | ||
Comment 7•19 years ago
|
||
I'm still not sure what the problem is, since PRUnichars are by definition 16-bit codepoints that encode 32-bit Unicode scalar values, but I'll add some comments to make that clear.
| Assignee | ||
Comment 8•19 years ago
|
||
Checked in with added comments
Status: NEW → RESOLVED
Closed: 19 years ago
Resolution: --- → FIXED
You need to log in
before you can comment on or make changes to this bug.
Description
•