Closed
Bug 396127
Opened 17 years ago
Closed 16 years ago
When setting input.value (for an HTMLInputElement) the next character is deleted after an invalid unpaired UTF-16 surrogate
Categories
(Core :: DOM: Core & HTML, defect)
RESOLVED
FIXED
People
(Reporter: bugzilla, Assigned: smontagu)
Details
(Whiteboard: [sg:investigate])
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-GB; rv:1.8.1.3) Gecko/20060601 Firefox/2.0.0.3 (Ubuntu-edgy)
Build Identifier: Mozilla/5.0 (X11; U; Linux i686; en-GB; rv:1.8.1.3) Gecko/20060601 Firefox/2.0.0.3 (Ubuntu-edgy)
Normally, Mozilla allows invalid unpaired UTF-16 surrogates into the DOM. But when setting the "value" attribute of an INPUT element, each invalid surrogate is removed *and so is the following character* (sometimes).
For instance, 'var s = "\ud863_"' creates a string of length 2. However, if we then set 'someInputElement.value = s', then "someInputElement.value" has length 0.
Deleting invalid surrogates is probably OK, but deleting a valid character is a really bad idea, because if it is some sort of control character (in any domain-specific sense) then the semantics of the string can be changed radically.
For instance, the "@" sign can be removed from an email address, a "/" removed from a URL, or a safe JSON string turned into malicious code (See the contents of the data URL given).
OK, it might be unlikely that a JSON string would be pasted into an input box, but if this behaviour extends to other objects then it could be quite unsafe.
Reproducible: Always
Steps to Reproduce:
var bad = String.fromCharCode(0xd863); // half a surrogate pair
var str = bad + "hello";
alert("before: " + str); // Shows "before: ?hello"
var elt = document.createElement('input');
elt.value = str; // XXX mozilla removes the surrogate *and* the h
alert("after: " + elt.value); // prints "after: ello"
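To check whether a given string contains the kind of input that triggers this behaviour, one can scan it for unpaired surrogates before assigning it to a form control. The following is a minimal sketch; the function name findLoneSurrogates is hypothetical and is not part of any browser API.

```javascript
// Hypothetical helper (not a browser API): returns the indices of UTF-16
// code units that are surrogates without a matching partner.
function findLoneSurrogates(str) {
  const indices = [];
  for (let i = 0; i < str.length; i++) {
    const unit = str.charCodeAt(i);
    const isHigh = unit >= 0xd800 && unit <= 0xdbff;
    const isLow = unit >= 0xdc00 && unit <= 0xdfff;
    if (isHigh) {
      const next = i + 1 < str.length ? str.charCodeAt(i + 1) : 0;
      if (next >= 0xdc00 && next <= 0xdfff) {
        i++; // valid pair: skip over the low surrogate
      } else {
        indices.push(i); // high surrogate with no low surrogate after it
      }
    } else if (isLow) {
      indices.push(i); // low surrogate not preceded by a high surrogate
    }
  }
  return indices;
}
```

For the string built in the steps above, findLoneSurrogates(String.fromCharCode(0xd863) + "hello") reports index 0 only, so the "h" that follows is an ordinary character that should survive the assignment.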
Reporter
Updated•17 years ago
Summary: When setting input.value (for an HTMLInputElement) deletes next character after an invalid unpaired UTF-16 surrogate → When setting input.value (for an HTMLInputElement) the next character is deleted after an invalid unpaired UTF-16 surrogate
Comment 1•17 years ago
Interesting attack idea.
jst, where else does this behavior occur? For example, does the same thing happen during HTML parsing for either UTF-8 (when the UTF-8 encodes one of the code points reserved for use as surrogates) or UTF-16?
Comment 2•17 years ago
Trunk behavior is slightly different: we do not remove the surrogate, but we still remove the character after it (or maybe we are treating the second character as part of the "pair"). The "after" alert from comment 0 on trunk is "after: ?ello", where '?' is now a graphical box.
Simon: what is the correct behavior here?
Assignee: nobody → smontagu
Status: UNCONFIRMED → NEW
Ever confirmed: true
Whiteboard: [sg:investigate]
Comment 3•17 years ago
I vote for throwing an exception ;)
Assignee
Comment 4•17 years ago
I am seeing "before: [fffd]ello" and "after: [fffd]ello" on linux trunk which is at least consistent. I think Dan is right and we are trying to decode 0xd863 and the next character as a surrogate pair. If we don't throw, correct behaviour would be to treat the unpaired surrogate as invalid and resynchronize on the next character, so the expected result is [fffd]hello, which is what we get from data:text/html,<p>�hello</p>
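The resynchronizing behaviour described above can be sketched in script as follows. This is only an illustration of the expected result ([fffd]hello), assuming each unpaired surrogate is replaced by U+FFFD; it is not the actual Gecko fix, and the function name replaceLoneSurrogates is hypothetical.

```javascript
// Hypothetical illustration (not Gecko code): replace each unpaired
// surrogate with U+FFFD and resynchronize on the very next code unit,
// so the character after the bad surrogate is never consumed.
function replaceLoneSurrogates(str) {
  let out = "";
  for (let i = 0; i < str.length; i++) {
    const unit = str.charCodeAt(i);
    if (unit >= 0xd800 && unit <= 0xdbff) {
      const next = i + 1 < str.length ? str.charCodeAt(i + 1) : 0;
      if (next >= 0xdc00 && next <= 0xdfff) {
        out += str[i] + str[i + 1]; // valid pair: copy both units
        i++;
      } else {
        out += "\ufffd"; // only the bad unit is replaced
      }
    } else if (unit >= 0xdc00 && unit <= 0xdfff) {
      out += "\ufffd"; // stray low surrogate
    } else {
      out += str[i];
    }
  }
  return out;
}
```

With this approach the output always has the same number of code units as the input, so a delimiter such as "@" or "/" that follows a bad surrogate stays in place.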
See also bug 316338
Assignee
Comment 5•17 years ago
FWIW, fixing this would fix test 68 in Acid3.
Comment 6•17 years ago
There's a patch in bug 421576 which probably fixes this, but you know we don't want to be touching UTF-8 parsing code right now unless there's no way we can avoid it, so I expect it in 4 or perhaps 3.5.
Assignee
Comment 7•16 years ago
Fixed by bug 421576
Updated•12 years ago
Group: core-security