Closed Bug 791003 Opened 12 years ago Closed 12 years ago

<meta charset="utf-7"> triggers NS_NOTREACHED in nsHtml5StreamParser

Categories

(Core :: DOM: HTML Parser, defect)

defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla18

People

(Reporter: zwol, Assigned: zwol)

References

Details

Attachments

(1 file)

I am dusting off bug 663291.  In the process I noticed that one of the tests for bug 672453 was failing hard (this doesn't show up on the builders because those tests are disabled due to bug 739354), and after a little digging, that looks to be a genuine code bug.

nsHtml5StreamParser::PreferredForInternalEncodingDecl has an explicit blacklist of encodings that are not "rough supersets" of ASCII; the failing test is looking for the console error that happens when that blacklist triggers.  However, the blacklist applies _after_ mapping aliases to preferred names, and uconv _also_ has a blacklist of encodings (in charsetData.properties) that are known to permit XSS smuggling.  When we hit _that_ blacklist, nsCharsetAlias::GetPreferred returns an error code even though nsCharsetAlias::Equals didn't.  PreferredForInternalEncodingDecl assumes that that's a can't-happen situation.  The obvious fix is to treat that as another case where we should fire the "not a rough superset" console error.

It's possible that the explicit blacklist in PreferredForInternalEncodingDecl should go away at this point.  The entries in it that are not covered by the "incorrect UTF-16 declaration" case at the very top of the function, nor by the XSS-smuggling blacklist in charsetData.properties, are

    jis_x0212-1990
    x-jis0208
    x-user-defined

Patch to follow.
Blocks: 739354
Attached patch proposed fixSplinter Review
Attachment #660869 - Flags: review?(hsivonen)
Blocks: 663291
Comment on attachment 660869 [details] [diff] [review]
proposed fix

r+ given the described nsCharsetAlias behavior. (Ideally, nsCharsetAlias would behave as if it had no idea about the encoding for which decoders are unavailable. It's bad that we have encodings that sort of exist but sort of don't.)

Thanks.
Attachment #660869 - Flags: review?(hsivonen) → review+
https://hg.mozilla.org/integration/mozilla-inbound/rev/e5498f64ae2c

I could file a follow-up bug for figuring out if we still need the hardcoded blacklist anymore.  I also think there should be a follow-up for the bad grammar in the error message ("declare a character encoding the does not encode" -- "the" should be "that").

I'd rather you filed a follow-up for the behavior of nsCharsetAlias if you think that's appropriate, because I don't understand nsCharsetAlias very well.
https://hg.mozilla.org/mozilla-central/rev/e5498f64ae2c
Status: ASSIGNED → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla18
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: