Should encoded U+FFFE and U+FFFF decode to U+FFFD?

VERIFIED FIXED in mozilla8

Status

()

Core
JavaScript Engine
VERIFIED FIXED
8 years ago
6 years ago

People

(Reporter: Waldo, Assigned: Waldo)

Tracking

(Blocks: 1 bug)

Trunk
mozilla8
Points:
---
Dependency tree / graph

Firefox Tracking Flags

(Not tracked)

Details

(Whiteboard: [inbound], URL)

Attachments

(2 attachments)

Bug 172699 made these characters decode to the replacement character.  Why?  Spec sez this:

Let V be the value obtained by applying the UTF-8 transformation to Octets, that is, from an array of octets into a 32-bit value. If Octects does not contain a valid UTF-8 encoding of a Unicode code point throw a URIError exception.

According to Table 21 in the ES5 final draft (pending approval in December), U+FFFE and U+FFFF are valid (only the surrogate range is not valid).

Do we preserve our still-non-standard behavior?  Should this be fixed in ES5 errata (or ES6)?  I have no idea, hence this bug.
(Yet another option would be to throw, which might be better than replacing silently if we actually want to treat U+FFF[EF] as error-ful, and it's arguably more consistent with how we'll handle error-ful overlong UTF-8 once I land bug 511859.)

Comment 2

8 years ago
U+FFFE has security implications similar to those for overlong sequences.  By embedding U+FFFE noncharacters, attackers can get an insufficiently careful UTF-16 decoder to byte swap subsequent input, sneaking prohibited characters past security-enforcing input validators.
Hm, why not U+FEFF then, if that were part of the reasoning?  Why why why couldn't the existing behavior have had an explanatory comment by it...

Comment 4

8 years ago
U+FEFF is a valid, assigned character.  There is no reason to prohibit it.

The code would be quite unmaintainable if every bug fix added explanatory comments.  Such things generally belong in Bugzilla, for example bug 74198 comment 3.
Er, oh, never mind, I don't know what I was thinking with respect to U+FEFF.

I disagree about where explanatory comments should go, but do note that the bug that introduced that code didn't have such a comment.

Comment 6

8 years ago
Bug 172699, which introduced this code, has an initial comment referring to bug 172699, which has the comment.

Updated

8 years ago
Blocks: 505991
This causes test262 failures:

S15.1.3.1_A2.4_T1	Complex tests, use RFC 3629	fail
S15.1.3.2_A2.4_T1	Complex tests, use RFC 3629	fail

Regardless what Unicode wants here, or what is purest from that point of view (and I do sympathize with it), I think ES5 wants us to treat Table 21 - UTF-8 Encodings as canonical.  Only bad or mismatched surrogates, or overlong sequences, are to be treated as errors.  That's what test262 wants.  It's what Chrome, Opera, IE, and Safari implement.  If anyone actually were to be broken by letting a couple more code points through, they'd also be broken in every other browser.  And if we did this, it looks like it'd address the dependency bug 505991.  We should make the change.
Assignee: general → jwalden+bmo
Blocks: 652780
Status: NEW → ASSIGNED
Created attachment 546056 [details] [diff] [review]
Adjust formatting, whitespace in Utf8ToOneUcs4Char

This is all straightforward changes to whitespace, variable declaration location, and so on, so it doesn't need review.  I'm just posting it in case anyone wants to actually apply the next patch, which will modify a tree with this patch in it.
Attachment #546056 - Flags: review+
Created attachment 546057 [details] [diff] [review]
Patch and test

This undoes replacement for these two characters and adds a test for non-replacing behavior.
Attachment #546057 - Flags: review?(masa141421356)

Comment 10

6 years ago
I think it is a poor choice to introduce a potential security vulnerability just because some standard or test suite seems to require it. I believe the security vulnerability should be brought up with the maintainers of the standard/suite in question.
I disagree, obviously, but the concern is at least plausible.  es-discuss can take a pass at it, although I doubt it will codify either current replace-with-U+FFFD behavior or throw-a-URIError behavior not implemented by any browser to date.

https://mail.mozilla.org/pipermail/es-discuss/2011-July/015993.html

Hopefully that request will receive more responses than it did the first time I made it:

https://mail.mozilla.org/pipermail/es-discuss/2009-October/010014.html

My query of a few hours ago is actually in response to this message, which received no replies when I originally posted it.
The response so far:

https://mail.mozilla.org/pipermail/es-discuss/2011-July/016000.html

Comment 13

6 years ago
Comment on attachment 546057 [details] [diff] [review]
Patch and test

Looks fine.
Attachment #546057 - Flags: review?(masa141421356) → review+
In the absence of feedback from ECMA saying that censoring U+FFF{E,F} is appropriate, and in the presence of tentative feedback saying that it's the wrong thing to do, I landed the patch to make the switch:

http://hg.mozilla.org/integration/mozilla-inbound/rev/ffc7c2391d3f
Whiteboard: [inbound]
Target Milestone: --- → mozilla8
http://hg.mozilla.org/mozilla-central/rev/ffc7c2391d3f
Status: ASSIGNED → RESOLVED
Last Resolved: 6 years ago
Resolution: --- → FIXED

Updated

6 years ago
Status: RESOLVED → VERIFIED
You need to log in before you can comment on or make changes to this bug.