Closed
Bug 469463
Opened 16 years ago
Closed 16 years ago
Firefox 3 does not properly display surrogate characters
Categories
(Core :: Internationalization, defect)
RESOLVED
INVALID
People
(Reporter: sander.peschier, Assigned: smontagu)
Attachments
(1 file: 161 bytes, text/html)
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; nl; rv:1.9.0.4) Gecko/2008102920 Firefox/3.0.4
Build Identifier: Mozilla/5.0 (Windows; U; Windows NT 5.1; nl; rv:1.9.0.4) Gecko/2008102920 Firefox/3.0.4
I am working with Chinese characters that have a Unicode value > #ffff. On a website I have programmed myself, I am using two ways to display these characters:
1. using an HTML entity, for instance: 𨳌
2. using a server-side function that converts the character into an HTML-encoded string, for instance: Server.HTMLEncode("
Comment 1 (Reporter) • 16 years ago
It seems that something went wrong when saving the bug report. This should be added:
====
Server.HTMLEncode("<some Chinese character > #ffff>"). The result of this function is a surrogate pair written as two numeric character references: &#55395; followed by a reference for the low surrogate.
Both approaches should work (and the first one does), but with the second one the character is not displayed. Both approaches used to work fine with Firefox 2.
====
To reproduce the bug, use this code:
====
<html>
<body>
Both methods should produce the same character, but they do not:<br>
Method 1: 𨳌<br>
Method 2: &#55395;[reference for the low surrogate]<br>
</body>
</html>
====
Updated • 16 years ago
Assignee: nobody → smontagu
Component: General → Internationalization
Product: Firefox → Core
QA Contact: general → i18n
Comment 2 • 16 years ago
Confirmed using: Mozilla/5.0 (Windows; U; Windows NT 6.0; en-US; rv:1.9.0.4) Gecko/2008102920 Firefox/3.0.4 and Mozilla/5.0 (Windows; U; Windows NT 6.0; en-US; rv:1.9.2a1pre) Gecko/20081206 Minefield/3.2a1pre
Updated • 16 years ago
OS: Windows XP → All
Comment 3 (Assignee) • 16 years ago
This bug is INVALID: the two references in Method 2 (&#55395; and the low-surrogate reference that follows it) are legal UTF-16 code points, but not legal Unicode scalar values, and so are illegal values for NCRs (numeric character references). It's true that they worked in Firefox 2, but that was a bug which has since been fixed (bug 316394).
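The distinction drawn in this comment can be sketched in a few lines of Python. This is only an illustration, not Gecko's actual check: the function name `is_scalar_value` is hypothetical, and the ranges are the ones Unicode defines (code points 0 to 10FFFF, with D800 to DFFF reserved for surrogates).

```python
def is_scalar_value(code_point: int) -> bool:
    """Return True if code_point is a Unicode scalar value.

    A scalar value is any code point in 0..0x10FFFF except the
    surrogate range 0xD800..0xDFFF.
    """
    in_unicode_range = 0 <= code_point <= 0x10FFFF
    is_surrogate = 0xD800 <= code_point <= 0xDFFF
    return in_unicode_range and not is_surrogate


print(is_scalar_value(55395))    # 0xD863, the high surrogate from this report: False
print(is_scalar_value(0x233B4))  # a supplementary character: True
```

Under this reading, a numeric character reference is acceptable only when its number passes such a check.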
Blocks: 316394
Updated (Assignee) • 16 years ago
Status: UNCONFIRMED → RESOLVED
Closed: 16 years ago
Resolution: --- → INVALID
Comment 4 (Reporter) • 16 years ago
I don't think you should treat &#55395; and the reference that follows it as two separate characters. The first is a so-called High-Surrogate Code Unit. From the Unicode website:
====
High-Surrogate Code Unit. A 16-bit code unit in the range D800 to DBFF, used in UTF-16 as the leading code unit of a surrogate pair. Also known as a leading surrogate. (See definition D72 in Section 3.8, Surrogates.)
====
The two code points form a pair so that Unicode scalar values greater than 0xFFFF can be used. 55395 (decimal) is D863 (hex), which lies in that range.
(In reply to comment #3)
> This bug is INVALID: &#55395; and the low-surrogate reference are legal UTF-16 code points, but
> not legal Unicode scalar values, and so are illegal values for NCRs. It's true
> that they worked in Firefox 2, but that was a bug which has since been fixed
> (bug 316394).
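For reference, the arithmetic that turns a surrogate pair into the scalar value it encodes in UTF-16 can be sketched as follows. The function name `combine_surrogates` is hypothetical, and the sample pair D84C DFB4 (which encodes U+233B4) is chosen for illustration rather than taken from the reporter's page.

```python
def combine_surrogates(high: int, low: int) -> int:
    """Combine a UTF-16 surrogate pair into one Unicode scalar value."""
    if not 0xD800 <= high <= 0xDBFF:
        raise ValueError(f"not a high surrogate: {high:#x}")
    if not 0xDC00 <= low <= 0xDFFF:
        raise ValueError(f"not a low surrogate: {low:#x}")
    # Each surrogate carries 10 bits of the value minus 0x10000.
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


print(hex(combine_surrogates(0xD84C, 0xDFB4)))  # 0x233b4
```

The pairing only has meaning inside the UTF-16 encoding form; neither half is a character by itself.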
Comment 5 (Assignee) • 16 years ago
(In reply to comment #4)
> High-Surrogate Code Unit. A 16-bit code unit in the range D800 to DBFF, used in
> UTF-16 as the leading code unit of a surrogate pair. Also known as a leading
> surrogate. (See definition D72 in Section 3.8, Surrogates.)
Yes, but an NCR must represent an abstract character in the document character set without being limited to a specific encoding, and surrogate code units are explicitly defined as not abstract characters, and are "used only in the context of the UTF-16 character encoding form".
See also the section on supplementary characters in http://www.w3.org/International/questions/qa-escapes:
|Supplementary characters are those Unicode characters that have code points
|higher than the characters in the Basic Multilingual Plane (BMP). In UTF-16 a
|supplementary character is encoded using two 16-bit surrogate code points from
|the BMP. Because of this, some people think that supplementary characters need
|to be represented using two escapes, but this is incorrect - you must use the
|single, scalar value for that character. For example, use &#x233B4; rather
|than &#xD84C;&#xDFB4;
and the output of the W3C validator for attachment 352870: http://validator.w3.org/check?uri=https%3A//bugzilla.mozilla.org/attachment.cgi%3Fid%3D352870, which flags both surrogate references as "reference to non-SGML character".
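A minimal sketch of the rule "one escape per scalar value", written in Python for illustration. The helper name `to_ncr` is hypothetical; it relies on the fact that a Python 3 `str` iterates by code point, so a supplementary character is seen as one unit rather than as two surrogates. It does not escape `&`, `<` or `>`, so it is not a full HTML encoder.

```python
def to_ncr(text: str) -> str:
    """Replace each non-ASCII character with one hexadecimal NCR."""
    parts = []
    for ch in text:
        code_point = ord(ch)  # one scalar value per character, never a surrogate half
        parts.append(ch if code_point < 0x80 else f"&#x{code_point:X};")
    return "".join(parts)


print(to_ncr("\U000233B4"))  # &#x233B4;
```

An encoder that instead walks UTF-16 code units one at a time would emit two escapes for the same character, which is the mistake the W3C text warns about.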
Comment 6 (Reporter) • 16 years ago
Aha! OK, I see now why this isn't really a bug in Firefox. The real problem is the ASP function Server.HTMLEncode(). It splits the original character into two entities (one per surrogate, starting with &#55395;) instead of leaving it as one (𨳌). Thank you all for helping me.
Sander
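One possible workaround, when the encoder's output cannot be changed at the source, is to post-process the generated markup and merge each pair of adjacent surrogate references into a single reference. The sketch below is in Python rather than ASP and is an assumption-laden illustration: the function name `merge_surrogate_ncrs` is hypothetical, and it assumes the encoder emits decimal references such as &#55395;, as the numbers quoted in this bug suggest.

```python
import re

# Matches one decimal numeric character reference, e.g. "&#55395;".
_NCR = re.compile(r"&#(\d+);")


def merge_surrogate_ncrs(html: str) -> str:
    """Merge adjacent high/low surrogate NCRs into one scalar-value NCR."""
    out = []
    pos = 0
    while True:
        m = _NCR.search(html, pos)
        if m is None:
            out.append(html[pos:])
            return "".join(out)
        out.append(html[pos:m.start()])
        high = int(m.group(1))
        nxt = _NCR.match(html, m.end())  # a reference starting right after this one
        if (nxt is not None and 0xD800 <= high <= 0xDBFF
                and 0xDC00 <= int(nxt.group(1)) <= 0xDFFF):
            low = int(nxt.group(1))
            scalar = 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)
            out.append(f"&#{scalar};")
            pos = nxt.end()
        else:
            out.append(m.group(0))  # leave every other reference untouched
            pos = m.end()


print(merge_surrogate_ncrs("x&#55372;&#57268;y"))  # x&#144308;y
```

Unpaired surrogate references are left as they are, so the result may still fail validation if the input contained a lone surrogate.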