Closed
Bug 736011
Opened 13 years ago
Closed 13 years ago
Unicode numeric character reference for characters encoded using 4bytes could not be displayed correctly
Categories
(Core :: DOM: HTML Parser, defect)
RESOLVED
INVALID
People
(Reporter: gumplyz, Unassigned)
Details
Attachments
(2 files)
User Agent: Mozilla/5.0 (X11; Linux i686) AppleWebKit/535.7 (KHTML, like Gecko) Chrome/16.0.912.77 Safari/535.7
Steps to reproduce:
I have an HTML page, shown below, with two input elements. The first value is encoded using numeric character references (NCRs) and the second value is two literal Chinese characters. The second character is encoded using 4 bytes.
<html>
<head>
<title> New Document </title>
</head>
<body>
<input type="text" value="我��"/>
<input type="text" value="我
Comment 3•13 years ago
(In reply to Yu from comment #0)
> <input type="text" value="我��"/>
When you use numeric character references, HTML requires you to use a single character reference for the Unicode code point (𤭢 in this case) instead of encoding a surrogate pair as numeric character references.
Status: UNCONFIRMED → RESOLVED
Closed: 13 years ago
Resolution: --- → INVALID
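The rule from comment 3 can be illustrated with a minimal Python sketch. It assumes the reporter's markup used the surrogate pair D852/DF62, which is what the character 𤭢 (U+24B62) named in comment 3 would split into; the exact references in the original attachment are not visible here. The helper names `combine_surrogates` and `to_ncr` are hypothetical. Python's `html.unescape` is documented as following the HTML5 rules, so it serves as a rough stand-in for what a conforming parser does with each form.

```python
import html


def combine_surrogates(high: int, low: int) -> int:
    """Combine a UTF-16 surrogate pair into the code point it encodes."""
    if not (0xD800 <= high <= 0xDBFF and 0xDC00 <= low <= 0xDFFF):
        raise ValueError("not a valid high/low surrogate pair")
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


def to_ncr(code_point: int) -> str:
    """Format a number as a hexadecimal numeric character reference."""
    return f"&#x{code_point:X};"


# Assumed surrogate pair for U+24B62 (the character named in comment 3).
code_point = combine_surrogates(0xD852, 0xDF62)

single_ref = to_ncr(code_point)                 # one reference, one code point
paired_refs = to_ncr(0xD852) + to_ncr(0xDF62)   # the form the bug report used

# A single reference decodes to the intended character.
decoded_single = html.unescape(single_ref)

# Each surrogate reference is decoded on its own and becomes U+FFFD.
decoded_paired = html.unescape(paired_refs)

print(single_ref, repr(decoded_single))
print(paired_refs, repr(decoded_paired))
```

Under these assumptions, the paired form decodes to two replacement characters, which matches the "��" shown in the quoted markup.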
Comment 4•13 years ago
Henri, do we need to file a bug on WebKit here if they're violating the spec?
Comment 5•13 years ago
(In reply to Boris Zbarsky (:bz) from comment #4)
> Henri, do we need to file a bug on WebKit here if they're violating the spec?
AFAICT, WebKit and Opera aren't violating the spec. IE9 violates the spec, but it's been fixed in the latest mode of IE10. Hooray for specs and test suites.
Updated•13 years ago
Attachment #606146 -
Attachment mime type: text/plain → text/html
Comment 6•13 years ago
Ah, indeed. I assumed that if the reporter was reporting the bug using Chrome then Chrome would have the "expected" behavior, but apparently not!
Comment 7•13 years ago
Thanks for your reply!
I only just noticed that it automatically adds the user agent to the description. I tested on FF10 but filed this bug using Chrome.
Is there any particular reason for requiring a single character reference for one Unicode code point?
Thanks
Comment 8•13 years ago
Yes, the reason being that character references represent Unicode codepoints and are encoding-independent...
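To make the encoding-independence point concrete, here is a small Python sketch. It assumes the character under discussion is U+24B62 (𤭢, as named in comment 3). The byte sequences differ per encoding, and surrogates appear only in UTF-16, while the character reference names the code point itself.

```python
# U+24B62, the character named in comment 3 (assumed to be the one at issue).
ch = "\U00024B62"

# The same code point serialized in three different encodings.
utf8 = ch.encode("utf-8")       # 4 bytes, the "4 bytes" from the report
utf16 = ch.encode("utf-16-be")  # 2 code units: a surrogate pair
utf32 = ch.encode("utf-32-be")  # 1 code unit holding the code point directly

# The numeric character reference depends only on the code point,
# not on whichever encoding the document happens to be saved in.
ncr = f"&#x{ord(ch):X};"

print("UTF-8 :", utf8.hex(" "))
print("UTF-16:", utf16.hex(" "))
print("UTF-32:", utf32.hex(" "))
print("NCR   :", ncr)
```

The surrogate values D852 and DF62 show up only in the UTF-16 bytes, so writing them as character references mixes an encoding detail into a notation that is meant to refer to code points.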