Closed Bug 520095 Opened 12 years ago Closed 10 years ago

Should encoded U+FFFE and U+FFFF decode to U+FFFD?

Categories

(Core :: JavaScript Engine, defect)

defect
Not set
normal

Tracking

()

VERIFIED FIXED
mozilla8

People

(Reporter: Waldo, Assigned: Waldo)

References

(Blocks 1 open bug, )

Details

(Whiteboard: [inbound])

Attachments

(2 files)

Bug 172699 made these characters decode to the replacement character.  Why?  Spec sez this:

Let V be the value obtained by applying the UTF-8 transformation to Octets, that is, from an array of octets into a 32-bit value. If Octects does not contain a valid UTF-8 encoding of a Unicode code point throw a URIError exception.

According to Table 21 in the ES5 final draft (pending approval in December), U+FFFE and U+FFFF are valid (only the surrogate range is not valid).

Do we preserve our still-non-standard behavior?  Should this be fixed in ES5 errata (or ES6)?  I have no idea, hence this bug.
(Yet another option would be to throw, which might be better than replacing silently if we actually want to treat U+FFF[EF] as error-ful, and it's arguably more consistent with how we'll handle error-ful overlong UTF-8 once I land bug 511859.)
U+FFFE has security implications similar to those for overlong sequences.  By embedding U+FFFE noncharacters, attackers can get an insufficiently careful UTF-16 decoder to byte swap subsequent input, sneaking prohibited characters past security-enforcing input validators.
Hm, why not U+FEFF then, if that were part of the reasoning?  Why why why couldn't the existing behavior have had an explanatory comment by it...
U+FEFF is a valid, assigned character.  There is no reason to prohibit it.

The code would be quite unmaintainable if every bug fix added explanatory comments.  Such things generally belong in Bugzilla, for example bug 74198 comment 3.
Er, oh, never mind, I don't know what I was thinking with respect to U+FEFF.

I disagree about where explanatory comments should go, but do note that the bug that introduced that code didn't have such a comment.
Bug 172699, which introduced this code, has an initial comment referring to bug 172699, which has the comment.
Blocks: 505991
This causes test262 failures:

S15.1.3.1_A2.4_T1	Complex tests, use RFC 3629	fail
S15.1.3.2_A2.4_T1	Complex tests, use RFC 3629	fail

Regardless what Unicode wants here, or what is purest from that point of view (and I do sympathize with it), I think ES5 wants us to treat Table 21 - UTF-8 Encodings as canonical.  Only bad or mismatched surrogates, or overlong sequences, are to be treated as errors.  That's what test262 wants.  It's what Chrome, Opera, IE, and Safari implement.  If anyone actually were to be broken by letting a couple more code points through, they'd also be broken in every other browser.  And if we did this, it looks like it'd address the dependency bug 505991.  We should make the change.
Assignee: general → jwalden+bmo
Blocks: test262
Status: NEW → ASSIGNED
This is all straightforward changes to whitespace, variable declaration location, and so on, so it doesn't need review.  I'm just posting it in case anyone wants to actually apply the next patch, which will modify a tree with this patch in it.
Attachment #546056 - Flags: review+
Attached patch Patch and testSplinter Review
This undoes replacement for these two characters and adds a test for non-replacing behavior.
Attachment #546057 - Flags: review?(masa141421356)
I think it is a poor choice to introduce a potential security vulnerability just because some standard or test suite seems to require it. I believe the security vulnerability should be brought up with the maintainers of the standard/suite in question.
I disagree, obviously, but the concern is at least plausible.  es-discuss can take a pass at it, although I doubt it will codify either current replace-with-U+FFFD behavior or throw-a-URIError behavior not implemented by any browser to date.

https://mail.mozilla.org/pipermail/es-discuss/2011-July/015993.html

Hopefully that request will receive more responses than it did the first time I made it:

https://mail.mozilla.org/pipermail/es-discuss/2009-October/010014.html

My query of a few hours ago is actually in response to this message, which received no replies when I originally posted it.
Comment on attachment 546057 [details] [diff] [review]
Patch and test

Looks fine.
Attachment #546057 - Flags: review?(masa141421356) → review+
In the absence of feedback from ECMA saying that censoring U+FFF{E,F} is appropriate, and in the presence of tentative feedback saying that it's the wrong thing to do, I landed the patch to make the switch:

http://hg.mozilla.org/integration/mozilla-inbound/rev/ffc7c2391d3f
Whiteboard: [inbound]
Target Milestone: --- → mozilla8
http://hg.mozilla.org/mozilla-central/rev/ffc7c2391d3f
Status: ASSIGNED → RESOLVED
Closed: 10 years ago
Resolution: --- → FIXED
Status: RESOLVED → VERIFIED
You need to log in before you can comment on or make changes to this bug.