Closed Bug 396127 Opened 17 years ago Closed 16 years ago

When setting input.value (for an HTMLInputElement) the next character is deleted after an invalid unpaired UTF-16 surrogate

Status:

RESOLVED FIXED

People

(Reporter: bugzilla, Assigned: smontagu)

References

(
URL
)

Details

(Whiteboard: [sg:investigate])

David Chan

Reporter

Description

•

17 years ago

User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-GB; rv:1.8.1.3) Gecko/20060601 Firefox/2.0.0.3 (Ubuntu-edgy)
Build Identifier: Mozilla/5.0 (X11; U; Linux i686; en-GB; rv:1.8.1.3) Gecko/20060601 Firefox/2.0.0.3 (Ubuntu-edgy)

Normally, Mozilla allows invalid unpaired UTF-16 surrogates into the DOM. But when setting the "value" attribute of an INPUT element, each invalid surrogate is removed *and so is the following character* (sometimes).

For instance, 'var s = "\ud863_"' creates a string of length 2. However, if we then set 'someInputElement.value = s', then "someInputElement.value" has length 0.

Deleting invalid surrogates is probably OK, but deleting a valid character is a really bad idea: if it is some sort of control character (in any domain-specific sense), the semantics of the string can be changed radically. For instance, the "@" sign can be removed from an email address, a "/" removed from a URL, or a safe JSON string turned into malicious code (see the contents of the data URL given). OK, it might be unlikely that a JSON string would be pasted into an input box, but if this behaviour extends to other objects then it could be quite unsafe.

Reproducible: Always

Steps to Reproduce:

var bad = String.fromCharCode(0xd863); // half a surrogate pair
var str = bad + "hello";
alert("before: " + str); // prints "before: ?hello"
var elt = document.createElement('input');
elt.value = str; // XXX Mozilla removes the surrogate *and* the h
alert("after: " + elt.value); // prints "after: ello"
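To make the "unpaired surrogate" condition from the description concrete, here is a minimal sketch that scans a string for surrogate code units that are not part of a valid high/low pair. The helper name findLoneSurrogates is hypothetical (it is not a DOM or Gecko API), and the email-style input is only an illustration of the "@" case mentioned above.

```javascript
// Returns the indices of UTF-16 code units that are unpaired surrogates.
// Hypothetical helper for illustration; not part of any browser API.
function findLoneSurrogates(s) {
  const lone = [];
  for (let i = 0; i < s.length; i++) {
    const u = s.charCodeAt(i);
    const isHigh = u >= 0xd800 && u <= 0xdbff;
    const isLow = u >= 0xdc00 && u <= 0xdfff;
    if (isHigh) {
      const next = i + 1 < s.length ? s.charCodeAt(i + 1) : 0;
      if (next >= 0xdc00 && next <= 0xdfff) {
        i++; // valid pair: skip the low half
      } else {
        lone.push(i); // high surrogate with no low surrogate after it
      }
    } else if (isLow) {
      lone.push(i); // low surrogate with no high surrogate before it
    }
  }
  return lone;
}

const s = "\ud863_";
console.log(s.length);              // 2
console.log(findLoneSurrogates(s)); // index 0 is a lone surrogate
console.log(findLoneSurrogates("user\ud863@example.com")); // index 4, right before "@"
```

A check like this only detects the condition; what to do with the lone unit (reject, replace, or pass through) is a separate decision.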

David Chan

Reporter

Updated

•

17 years ago

Summary: When setting input.value (for an HTMLInputElement) deletes next character after an invalid unpaired UTF-16 surrogate → When setting input.value (for an HTMLInputElement) the next character is deleted after an invalid unpaired UTF-16 surrogate

Jesse Ruderman

Comment 1

•

17 years ago

Interesting attack idea. jst, where else does this behavior occur? For example, does the same thing happen during HTML parsing for either UTF-8 (when the UTF-8 encodes one of the code points reserved for use as surrogates) or UTF-16?
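For the UTF-8 half of that question, a rough way to probe a decoder is to feed it the three bytes that a naive encoder would produce for U+D863, followed by "hello", and check whether the text after them survives. The sketch below uses the standard TextDecoder API as found in current browsers and Node; that is an assumption about the test environment and says nothing about what the Firefox 2 era HTML parser did.

```javascript
// Bytes ED A1 A3 are what a naive UTF-8 encoder would emit for U+D863.
// Well-formed UTF-8 never contains this sequence.
const bytes = new Uint8Array([0xed, 0xa1, 0xa3, 0x68, 0x65, 0x6c, 0x6c, 0x6f]);
const decoded = new TextDecoder("utf-8").decode(bytes);

// The interesting properties: no surrogate code unit leaks into the result,
// and the ASCII text following the invalid bytes is left intact.
const hasSurrogate = /[\ud800-\udfff]/.test(decoded);
const keepsTail = decoded.endsWith("hello");
console.log(hasSurrogate, keepsTail);
```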

Daniel Veditz [:dveditz]

Comment 2

•

17 years ago

Trunk behavior is slightly different: we do not remove the surrogate, but we still remove the character after (or maybe we are treating the second character as part of the "pair"). The "after" alert from comment 0 on trunk is "after: ?ello", where '?' is now a graphical box. Simon: what is the correct behavior here?
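The "second character treated as part of the pair" hypothesis can be illustrated with a deliberately naive decoder. This is a hypothetical sketch of that class of bug, not Gecko's code: it assumes that any high surrogate is followed by a low surrogate and never checks the range of the next unit.

```javascript
// Hypothetical decoder that trusts every high surrogate to start a valid pair.
function naiveCodePoints(s) {
  const out = [];
  for (let i = 0; i < s.length; i++) {
    const u = s.charCodeAt(i);
    if (u >= 0xd800 && u <= 0xdbff && i + 1 < s.length) {
      // Deliberate bug: `next` is never checked against 0xDC00..0xDFFF.
      const next = s.charCodeAt(i + 1);
      out.push(0x10000 + ((u - 0xd800) << 10) + (next - 0xdc00));
      i++; // the following unit is consumed as the second half of the "pair"
    } else {
      out.push(u);
    }
  }
  return out;
}

const points = naiveCodePoints("\ud863hello");
// Six code units went in, five values come out: one bogus code point,
// then e, l, l, o. The "h" has been swallowed.
console.log(points.length);
console.log(String.fromCharCode(...points.slice(1))); // "ello"
```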

Assignee: nobody → smontagu

Status: UNCONFIRMED → NEW

Ever confirmed: true

Whiteboard: [sg:investigate]

Jesse Ruderman

Comment 3

•

17 years ago

I vote for throwing an exception ;)

Simon Montagu :smontagu

Assignee

Comment 4

•

17 years ago

I am seeing "before: [fffd]ello" and "after: [fffd]ello" on Linux trunk, which is at least consistent. I think Dan is right and we are trying to decode 0xd863 and the next character as a surrogate pair.

If we don't throw, correct behaviour would be to treat the unpaired surrogate as invalid and resynchronize on the next character, so the expected result is [fffd]hello, which is what we get from data:text/html,<p>&#xd863;hello</p>

See also bug 316338.
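The "invalid, then resynchronize on the next character" behaviour can be sketched as follows. This is an illustration under stated assumptions, not the patch that landed: replaceLoneSurrogates is a hypothetical name, and U+FFFD is used as the replacement because that is the [fffd] result quoted above.

```javascript
// Replace each unpaired surrogate with U+FFFD and continue with the very
// next code unit, so that no valid character is consumed.
function replaceLoneSurrogates(s) {
  let out = "";
  for (let i = 0; i < s.length; i++) {
    const u = s.charCodeAt(i);
    if (u >= 0xd800 && u <= 0xdbff) {
      const next = i + 1 < s.length ? s.charCodeAt(i + 1) : 0;
      if (next >= 0xdc00 && next <= 0xdfff) {
        out += s[i] + s[i + 1]; // valid pair: copy both halves
        i++;
      } else {
        out += "\ufffd"; // lone high surrogate; next unit is handled normally
      }
    } else if (u >= 0xdc00 && u <= 0xdfff) {
      out += "\ufffd"; // lone low surrogate
    } else {
      out += s[i];
    }
  }
  return out;
}

const fixed = replaceLoneSurrogates("\ud863hello");
console.log(fixed === "\ufffdhello"); // true
console.log(fixed.length);            // 6: the "h" is preserved
```

Recent JavaScript engines also ship String.prototype.toWellFormed(), which performs the same kind of replacement; the hand-written loop is shown only to make the resynchronization step explicit.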

Simon Montagu :smontagu

Assignee

Comment 5

•

17 years ago

FWIW, fixing this would fix test 68 in Acid3.

Jeff Walden [:Waldo]

Comment 6

•

17 years ago

There's a patch in bug 421576 which probably fixes this, but you know we don't want to be touching UTF-8 parsing code right now unless there's no way we can avoid it, so I expect it in 4 or perhaps 3.5.

timeless

Updated

•

16 years ago

Component: DOM: HTML → DOM: Core & HTML

Simon Montagu :smontagu

Assignee

Comment 7

•

16 years ago

Fixed by bug 421576

Status: NEW → RESOLVED

Closed: 16 years ago

Depends on: 421576

Resolution: --- → FIXED

Daniel Veditz [:dveditz]

Updated

•

12 years ago

Group: core-security

Categories

(Core :: DOM: Core & HTML, defect)
