Closed Bug 582788 Opened 9 years ago Closed 9 years ago

iso-10646 in meta not rejected as a non-ASCII-superset encoding

Categories

(Core :: HTML: Parser, defect, P2)

Tracking

()

RESOLVED FIXED
Tracking Status
blocking2.0 --- betaN+

People

(Reporter: stream, Assigned: hsivonen)

Details

(Keywords: regression)

Attachments

(1 file)

From: https://input.mozilla.com/en-US/search/?product=firefox&sentiment=sad

A user reports that the page http://cda.ipmailing.it/IPMailing/forms/optInForm1.asp "won't show properly".

The page is working with Firefox 3.6.8, so this is a regression.

I'm not sure about the component, but both the page and its view source are unreadable in Firefox 4 b2.
Firefox 3.6.8 detects the page encoding as Western (ISO-8859-1).

Firefox 4b2 detects it as (UTF-16BE).

There is another problem with View Source: I can't change the encoding until the View Source window is reloaded. Should I file a separate bug for this?
(In reply to comment #1)
> There is another problem with View Source: I can't change the encoding until the
> View Source window is reloaded

Anyway, filed bug 582795.
This is fallout from the HTML5 parser.  I can get the broken behavior in 3.6.8 if I turn that parser on.

Note that charsetalias.properties aliases iso-10646 to UTF-16BE.  So I'm not quite sure why the old parser didn't use that encoding here... but if I change the meta to say "UTF-16BE" we render the page correctly.  So there's some sort of special-casing going on here, and it seems to happen _before_ alias resolution.  Should it happen after?
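
The ordering question can be made concrete with a minimal Python sketch. This is not the Gecko code: the alias table holds only the one mapping mentioned above, the function names are hypothetical, and it assumes the special-casing in question is the step that treats a UTF-16 declaration in meta as UTF-8.

```python
# Hypothetical one-entry alias table; per the comment above,
# charsetalias.properties maps iso-10646 to UTF-16BE.
ALIASES = {"iso-10646": "UTF-16BE"}

# Labels assumed to trigger the UTF-16 special case.
UTF16_LABELS = {"utf-16", "utf-16be", "utf-16le"}


def resolve_alias(label: str) -> str:
    """Return the canonical name for a label, or the label itself if unknown."""
    return ALIASES.get(label.lower(), label)


def special_case_before(label: str) -> str:
    """Check the raw label for UTF-16 first, then resolve aliases."""
    if label.lower() in UTF16_LABELS:
        return "UTF-8"
    return resolve_alias(label)


def special_case_after(label: str) -> str:
    """Resolve aliases first, then check the result for UTF-16."""
    resolved = resolve_alias(label)
    if resolved.lower() in UTF16_LABELS:
        return "UTF-8"
    return resolved


for label in ("iso-10646", "UTF-16BE"):
    print(label, special_case_before(label), special_case_after(label))
```

With the check before alias resolution, `iso-10646` slips past the special case and comes out as UTF-16BE, while a literal `UTF-16BE` is caught, which matches the behavior described above.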
(In reply to comment #3)
> So there's some sort
> of special-casing going on here, and it seems to happen _before_ alias
> resolution.  Should it happen after?

Looks like it needs to happen after. The code is:
http://mxr-test.konigsberg.mozilla.org/mozilla-central/source/parser/html/nsHtml5MetaScannerCppSupplement.h#87
OS: Windows XP → All
Priority: -- → P2
Hardware: x86 → All
Summary: Firefox 4 b2 cant render page with encoding iso-10646 → iso-10646 in meta not mapped to UTF-8
Blocks: 572886
Safari seems to special-case UTF-16 without alias resolution, so it doesn't sniff to UTF-8. It then rejects UTF-16BE as a non-ASCII-based encoding, so the default encoding (Windows-1252) kicks in.
Summary: iso-10646 in meta not mapped to UTF-8 → iso-10646 in meta not rejected as a non-ASCII-superset encoding
I've examined four Web pages that declare iso-10646 in meta. (Thankfully, they are rare.) Three were ASCII. One was Windows-1252. So from this data, it seems we should *not* do alias resolution before the UTF-16 to UTF-8 aliasing step.
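
A rough Python sketch of the order argued for here, combined with the rejection named in the bug summary. The thread does not show how the actual patch implements this, so everything here is an assumption: the function names are hypothetical, the alias table and the set of non-ASCII-superset encodings are deliberately incomplete, and the `windows-1252` fallback is only an illustrative default.

```python
from typing import Optional

ALIASES = {"iso-10646": "UTF-16BE"}  # the only alias named in this bug
UTF16_LABELS = {"utf-16", "utf-16be", "utf-16le"}
# Illustrative, incomplete set of encodings that are not ASCII supersets.
NOT_ASCII_SUPERSET = {"utf-16", "utf-16be", "utf-16le"}


def encoding_from_meta(label: str) -> Optional[str]:
    """Return the encoding to use for a meta label, or None to ignore the meta."""
    lowered = label.lower()
    # Step 1: literal UTF-16 labels are aliased to UTF-8 before alias resolution.
    if lowered in UTF16_LABELS:
        return "UTF-8"
    # Step 2: ordinary alias resolution.
    resolved = ALIASES.get(lowered, label)
    # Step 3: reject anything that resolves to a non-ASCII-superset encoding.
    if resolved.lower() in NOT_ASCII_SUPERSET:
        return None
    return resolved


def choose_encoding(label: str, fallback: str = "windows-1252") -> str:
    """Use the meta declaration if it is accepted, otherwise the fallback."""
    declared = encoding_from_meta(label)
    return declared if declared is not None else fallback


print(choose_encoding("iso-10646"))
print(choose_encoding("UTF-16"))
```

Under these assumptions a page declaring `iso-10646` falls back to the default instead of being decoded as UTF-16BE, which would suit the ASCII and Windows-1252 pages found in the sample.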
(In reply to comment #7)
> Safari seems to special-case UTF-16 without alias resolution so it doesn't
> sniff to UTF-8.

Chances are I'm misreading Safari's encoding menu. :-(
Assignee: nobody → hsivonen
Status: NEW → ASSIGNED
Attachment #461189 - Flags: review?(bzbarsky)
Comment on attachment 461189 [details] [diff] [review]
Unify late meta treatment with prescan treatment, remove UTF-32 to UTF-8 aliasing while at it

This looks fine, but shouldn't all those calls be LowerCaseEqualsLiteral?  r=me with that.
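
For readers unfamiliar with the string API, here is a small Python sketch of the comparison style being requested. It assumes the point is that charset labels should match regardless of ASCII case; `lower_case_equals` is a hypothetical stand-in, not the Gecko method itself.

```python
import string

# Translation table that lowercases only A-Z and leaves other characters alone.
_ASCII_LOWER = str.maketrans(string.ascii_uppercase, string.ascii_lowercase)


def lower_case_equals(value: str, lowercase_literal: str) -> bool:
    """Compare value, ASCII-lowercased, against an already-lowercase literal."""
    return value.translate(_ASCII_LOWER) == lowercase_literal


print(lower_case_equals("UTF-16", "utf-16"))
print(lower_case_equals("Utf-16BE", "utf-16be"))
print(lower_case_equals("utf-8", "utf-16"))
```

The literal is expected to be lowercase already, so a label written as `UTF-16`, `utf-16`, or `Utf-16` is treated the same way.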
Attachment #461189 - Flags: review?(bzbarsky) → review+
blocking2.0: ? → betaN+
(In reply to comment #11)
> This looks fine, but shouldn't all those calls be LowerCaseEqualsLiteral?  r=me
> with that.

Thanks. Pushed with that change:
http://hg.mozilla.org/mozilla-central/rev/98617b5a532b

The corresponding spec change is being tracked as
http://www.w3.org/Bugs/Public/show_bug.cgi?id=10260
Status: ASSIGNED → RESOLVED
Closed: 9 years ago
Resolution: --- → FIXED