Closed Bug 736011 Opened 13 years ago Closed 13 years ago

Unicode numeric character reference for characters encoded using 4 bytes could not be displayed correctly

Product: Core
Component: DOM: HTML Parser
Version: 10 Branch
Platform: x86 Linux
Type: defect
Priority: Not set
Severity: normal

Status: RESOLVED INVALID

People: Reporter: gumplyz; Assignee: Unassigned

Attachments (2 files)

Attached image FFUnicode.gif
User Agent: Mozilla/5.0 (X11; Linux i686) AppleWebKit/535.7 (KHTML, like Gecko) Chrome/16.0.912.77 Safari/535.7

Steps to reproduce:
I have an HTML page, shown below, with two input elements. The first value is encoded using numeric character references (NCRs), and the second value is two ordinary Chinese characters. The second character is encoded using 4 bytes.

<html>
<head>
<title> New Document </title>
</head>
<body>
<input type="text" value="&#25105;&#55378;&#57186;"/>
<input type="text" value="我
Attached file IssueHtml
Attached HTML source code to reproduce this issue.
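The numbers in the first value come from UTF-16: &#25105; is U+6211 (我), while &#55378;&#57186; are the two UTF-16 code units (0xD852, 0xDF62) of one supplementary-plane character, U+24B62 (𤭢), which takes 4 bytes in both UTF-8 and UTF-16. The Python sketch below derives those values; the helper name `utf16_surrogates` is hypothetical, chosen only for illustration.

```python
def utf16_surrogates(code_point: int) -> tuple[int, int]:
    """Split a supplementary-plane code point (U+10000..U+10FFFF) into a UTF-16 surrogate pair."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("only supplementary-plane code points need a surrogate pair")
    offset = code_point - 0x10000          # 20-bit value
    high = 0xD800 + (offset >> 10)         # top 10 bits -> high (lead) surrogate
    low = 0xDC00 + (offset & 0x3FF)        # bottom 10 bits -> low (trail) surrogate
    return high, low


ch = "\U00024B62"                          # the supplementary character from this bug
high, low = utf16_surrogates(ord(ch))
print(high, low)                           # 55378 57186 (the reporter's two NCRs)
print(ch.encode("utf-8").hex(" "))         # f0 a4 ad a2 (4 bytes in UTF-8)
print(ch.encode("utf-16-be").hex(" "))     # d8 52 df 62 (4 bytes in UTF-16)
print(ord("\u6211"))                       # 25105 (U+6211 needs no surrogates)
```

The surrogate values only exist inside UTF-16; they are not characters in their own right, which is why writing them as separate references is a problem in HTML.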
(In reply to Yu from comment #0)
> <input type="text" value="&#25105;&#55378;&#57186;"/>

When you use numeric character references, HTML requires you to use a single character reference for the Unicode code point (&#x24B62; in this case) instead of encoding a surrogate pair as numeric character references.
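A minimal Python sketch of this rule, assuming Python's standard `html.unescape` (which follows the HTML5 character-reference rules) as a stand-in for a browser's parser: a reference to a surrogate code point (U+D800..U+DFFF) is a parse error and is replaced with U+FFFD, so the surrogate-pair encoding yields two replacement characters, while one reference per code point round-trips. The helper `to_ncr` is hypothetical.

```python
import html


def to_ncr(text: str) -> str:
    """Escape each non-ASCII code point as one hexadecimal numeric character reference."""
    # Iterating a Python str yields code points, never UTF-16 code units,
    # so a supplementary character produces exactly one reference.
    return "".join(c if ord(c) < 0x80 else f"&#x{ord(c):X};" for c in text)


value = "\u6211\U00024B62"
good = to_ncr(value)
bad = "&#25105;&#55378;&#57186;"           # surrogate pair written as two references

print(good)                                # &#x6211;&#x24B62;
print(html.unescape(good) == value)        # True
print(html.unescape(bad) == "\u6211\ufffd\ufffd")  # True: each surrogate reference -> U+FFFD
```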
Status: UNCONFIRMED → RESOLVED
Closed: 13 years ago
Resolution: --- → INVALID
Henri, do we need to file a bug on WebKit here if they're violating the spec?
(In reply to Boris Zbarsky (:bz) from comment #4)
> Henri, do we need to file a bug on WebKit here if they're violating the spec?

AFAICT, WebKit and Opera aren't violating the spec. IE9 violates the spec, but it's been fixed in the latest mode of IE10. Hooray for specs and test suites.
Attachment #606146 - Attachment mime type: text/plain → text/html
Ah, indeed. I assumed that if the reporter was reporting the bug using Chrome then Chrome would have the "expected" behavior, but apparently not!
Thanks for your reply! I had not noticed that the user agent is added to the description automatically. I tested on FF10 but filed this bug using Chrome. Is there any particular reason for using a single character reference per Unicode code point? Thanks
Yes, the reason is that character references represent Unicode code points and are encoding-independent...
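To illustrate the encoding-independence point, a Python sketch (the list of encodings is arbitrary, for illustration only, and `html.unescape` again stands in for a browser's parser): the ASCII text `&#x24B62;` can be stored in any ASCII-compatible encoding, or even in UTF-16, and it still resolves to the same code point once the bytes are decoded. A surrogate-pair reference would instead bake a UTF-16 implementation detail into markup that may not be UTF-16 at all.

```python
import html

REFERENCE = "&#x24B62;"                    # one NCR naming code point U+24B62

for encoding in ("ascii", "iso-8859-1", "gb18030", "utf-8", "utf-16-le"):
    raw = REFERENCE.encode(encoding)       # the bytes differ from encoding to encoding...
    char = html.unescape(raw.decode(encoding))
    assert char == "\U00024B62", encoding  # ...but the character they name does not
    print(f"{encoding:>10}: {len(raw):2d} bytes -> U+{ord(char):X}")
```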