Closed Bug 520095 Opened 15 years ago Closed 13 years ago

Should encoded U+FFFE and U+FFFF decode to U+FFFD?

Tracking

()

Status:

VERIFIED FIXED

Milestone:

mozilla8

People

(Reporter: Waldo, Assigned: Waldo)

References

(Blocks 1 open bug,
URL
)

Details

(Whiteboard: [inbound])

Attachments

(2 files)

Adjust formatting, whitespace in Utf8ToOneUcs4Char 13 years ago Jeff Walden [:Waldo] 2.23 KB, patch	Waldo : review+	Details \| Diff \| Splinter Review
Patch and test 13 years ago Jeff Walden [:Waldo] 2.16 KB, patch	masa141421356 : review+	Details \| Diff \| Splinter Review

Jeff Walden [:Waldo]

Assignee

Description

•

15 years ago

Bug 172699 made these characters decode to the replacement character.  Why?  Spec sez this:

Let V be the value obtained by applying the UTF-8 transformation to Octets, that is, from an array of octets into a 32-bit value. If Octects does not contain a valid UTF-8 encoding of a Unicode code point throw a URIError exception.

According to Table 21 in the ES5 final draft (pending approval in December), U+FFFE and U+FFFF are valid (only the surrogate range is not valid).

Do we preserve our still-non-standard behavior?  Should this be fixed in ES5 errata (or ES6)?  I have no idea, hence this bug.

Jeff Walden [:Waldo]

Assignee

Comment 1

•

15 years ago

(Yet another option would be to throw, which might be better than replacing silently if we actually want to treat U+FFF[EF] as error-ful, and it's arguably more consistent with how we'll handle error-ful overlong UTF-8 once I land bug 511859.)

John G. Myers

Comment 2

•

15 years ago

U+FFFE has security implications similar to those for overlong sequences.  By embedding U+FFFE noncharacters, attackers can get an insufficiently careful UTF-16 decoder to byte swap subsequent input, sneaking prohibited characters past security-enforcing input validators.

Jeff Walden [:Waldo]

Assignee

Comment 3

•

15 years ago

Hm, why not U+FEFF then, if that were part of the reasoning?  Why why why couldn't the existing behavior have had an explanatory comment by it...

John G. Myers

Comment 4

•

15 years ago

U+FEFF is a valid, assigned character.  There is no reason to prohibit it.

The code would be quite unmaintainable if every bug fix added explanatory comments.  Such things generally belong in Bugzilla, for example bug 74198 comment 3.

Jeff Walden [:Waldo]

Assignee

Comment 5

•

15 years ago

Er, oh, never mind, I don't know what I was thinking with respect to U+FEFF.

I disagree about where explanatory comments should go, but do note that the bug that introduced that code didn't have such a comment.

John G. Myers

Comment 6

•

15 years ago

Bug 172699, which introduced this code, has an initial comment referring to bug 172699, which has the comment.

Dão Gottwald [:dao]

Updated

•

15 years ago

Blocks: 505991

Jeff Walden [:Waldo]

Assignee

Comment 7

•

13 years ago

This causes test262 failures:

S15.1.3.1_A2.4_T1	Complex tests, use RFC 3629	fail
S15.1.3.2_A2.4_T1	Complex tests, use RFC 3629	fail

Regardless what Unicode wants here, or what is purest from that point of view (and I do sympathize with it), I think ES5 wants us to treat Table 21 - UTF-8 Encodings as canonical.  Only bad or mismatched surrogates, or overlong sequences, are to be treated as errors.  That's what test262 wants.  It's what Chrome, Opera, IE, and Safari implement.  If anyone actually were to be broken by letting a couple more code points through, they'd also be broken in every other browser.  And if we did this, it looks like it'd address the dependency bug 505991.  We should make the change.

Assignee: general → jwalden+bmo

Blocks: test262

Status: NEW → ASSIGNED

Jeff Walden [:Waldo]

Assignee

Comment 8

•

13 years ago

Attached patch Adjust formatting, whitespace in Utf8ToOneUcs4Char — Details — Splinter Review

This is all straightforward changes to whitespace, variable declaration location, and so on, so it doesn't need review.  I'm just posting it in case anyone wants to actually apply the next patch, which will modify a tree with this patch in it.

Attachment #546056 - Flags: review+

Jeff Walden [:Waldo]

Assignee

Comment 9

•

13 years ago

Attached patch Patch and test — Details — Splinter Review

This undoes replacement for these two characters and adds a test for non-replacing behavior.

Attachment #546057 - Flags: review?(masa141421356)

John G. Myers

Comment 10

•

13 years ago

I think it is a poor choice to introduce a potential security vulnerability just because some standard or test suite seems to require it. I believe the security vulnerability should be brought up with the maintainers of the standard/suite in question.

Jeff Walden [:Waldo]

Assignee

Comment 11

•

13 years ago

I disagree, obviously, but the concern is at least plausible.  es-discuss can take a pass at it, although I doubt it will codify either current replace-with-U+FFFD behavior or throw-a-URIError behavior not implemented by any browser to date.

https://mail.mozilla.org/pipermail/es-discuss/2011-July/015993.html

Hopefully that request will receive more responses than it did the first time I made it:

https://mail.mozilla.org/pipermail/es-discuss/2009-October/010014.html

My query of a few hours ago is actually in response to this message, which received no replies when I originally posted it.

Jeff Walden [:Waldo]

Assignee

Comment 12

•

13 years ago

The response so far:

https://mail.mozilla.org/pipermail/es-discuss/2011-July/016000.html

Masahiro YAMADA

Comment 13

•

13 years ago

Comment on attachment 546057 [details] [diff] [review]
Patch and test

Looks fine.

Attachment #546057 - Flags: review?(masa141421356) → review+

Jeff Walden [:Waldo]

Assignee

Comment 14

•

13 years ago

In the absence of feedback from ECMA saying that censoring U+FFF{E,F} is appropriate, and in the presence of tentative feedback saying that it's the wrong thing to do, I landed the patch to make the switch:

http://hg.mozilla.org/integration/mozilla-inbound/rev/ffc7c2391d3f

Whiteboard: [inbound]

Target Milestone: --- → mozilla8

Marco Bonardo [:mak]

Comment 15

•

13 years ago

http://hg.mozilla.org/mozilla-central/rev/ffc7c2391d3f

Status: ASSIGNED → RESOLVED

Closed: 13 years ago

Resolution: --- → FIXED

Masahiro YAMADA

Updated

•

13 years ago

Status: RESOLVED → VERIFIED

You need to log in before you can comment on or make changes to this bug.