Closed Bug 582788 Opened 9 years ago Closed 9 years ago
iso-10646 in meta not rejected as a non-ASCII-superset encoding
From: https://input.mozilla.com/en-US/search/?product=firefox&sentiment=sad A user reports that the page http://cda.ipmailing.it/IPMailing/forms/optInForm1.asp "won't show properly". The page works in Firefox 3.6.8, so this is a regression. I'm not sure which component this belongs to, but both the page and its view source are unreadable in Firefox 4 b2.
Firefox 3.6.8 detects the page encoding as Western (ISO-8859-1). Firefox 4b2 detects it as UTF-16BE. There is another problem with View Source: I can't change the encoding until the View Source window is reloaded. Should I file a separate bug for this?
(In reply to comment #1) > There is another problem with View source, i cant change the encoding until the > view source window is reloaded Filed bug 582795 anyway.
This is fallout from the HTML5 parser. I can get the broken behavior in 3.6.8 if I turn that parser on. Note that charsetalias.properties aliases iso-10646 to UTF-16BE. So I'm not quite sure why the old parser didn't use that encoding here... but if I change the meta to say "UTF-16BE" we render the page correctly. So there's some sort of special-casing going on here, and it seems to happen _before_ alias resolution. Should it happen after?
(In reply to comment #3) > So there's some sort > of special-casing going on here, and it seems to happen _before_ alias > resolution. Should it happen after? Looks like it needs to happen after. The code is: http://mxr-test.konigsberg.mozilla.org/mozilla-central/source/parser/html/nsHtml5MetaScannerCppSupplement.h#87
OS: Windows XP → All
Priority: -- → P2
Hardware: x86 → All
Summary: Firefox 4 b2 cant render page with encoding iso-10646 → iso-10646 in meta not mapped to UTF-8
The bug is instead that similar checks are missing here: http://mxr-test.konigsberg.mozilla.org/mozilla-central/source/parser/html/nsHtml5StreamParser.cpp#750
Safari seems to special-case UTF-16 without alias resolution, so it doesn't sniff to UTF-8. It then rejects UTF-16BE as a non-ASCII-based encoding, so the default encoding (Windows-1252) kicks in.
Summary: iso-10646 in meta not mapped to UTF-8 → iso-10646 in meta not rejected as a non-ASCII-superset encoding
I've examined four Web pages that declare iso-10646 in meta. (Thankfully, they are rare.) 3 were ASCII. One was Windows-1252. So from this data, it seems we should *not* do alias resolution before the UTF-16 to UTF-8 aliasing step.
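To make the ordering under discussion concrete, here is a minimal Python sketch of how a meta-declared label might be handled. It is not the Gecko code: the function name, the tiny alias table, and the choice to return None for a rejected declaration are all assumptions made for illustration. Only the iso-10646 to UTF-16BE alias comes from this bug. The literal UTF-16 check runs on the raw label, then aliases are resolved, then anything that is not an ASCII superset is rejected.

```python
# Hypothetical alias table. Only the iso-10646 entry is taken from this bug
# (charsetalias.properties); the other entries are placeholders.
ALIASES = {
    "iso-10646": "UTF-16BE",
    "utf-8": "UTF-8",
    "windows-1252": "windows-1252",
}

# Labels that are special-cased to UTF-8 *before* alias resolution (assumed set).
UTF16_LABELS = {"utf-16", "utf-16be", "utf-16le"}

# Encodings that cannot be declared in a meta element, because the bytes of the
# meta itself would not be readable as ASCII.
NON_ASCII_SUPERSETS = {"UTF-16BE", "UTF-16LE"}


def encoding_from_meta(label):
    """Return the encoding to use for a meta-declared label, or None if the
    declaration should be ignored so that the caller falls back to its default."""
    raw = label.strip().lower()  # labels are compared case-insensitively

    # Step 1: literal UTF-16 labels are treated as UTF-8, without alias lookup.
    if raw in UTF16_LABELS:
        return "UTF-8"

    # Step 2: resolve aliases; unknown labels are ignored.
    resolved = ALIASES.get(raw)
    if resolved is None:
        return None

    # Step 3: reject anything that resolved to a non-ASCII-superset encoding.
    if resolved in NON_ASCII_SUPERSETS:
        return None

    return resolved
```

With this ordering, a page declaring iso-10646 is neither decoded as UTF-16BE nor sniffed as UTF-8. The declaration is ignored and the default encoding applies, which suits the ASCII and Windows-1252 pages examined above.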
(In reply to comment #7) > Safari seems to special-case UTF-16 without alias resolution so it doesn't > sniff to UTF-8. Chances are I'm misreading Safari's encoding menu. :-(
Assignee: nobody → hsivonen
Status: NEW → ASSIGNED
Attachment #461189 - Flags: review?(bzbarsky)
Comment on attachment 461189: Unify late meta treatment with prescan treatment, remove UTF-32 to UTF-8 aliasing while at it. This looks fine, but shouldn't all those calls be LowerCaseEqualsLiteral? r=me with that.
Attachment #461189 - Flags: review?(bzbarsky) → review+
(In reply to comment #11) > This looks fine, but shouldn't all those calls be LowerCaseEqualsLiteral? r=me > with that. Thanks. Pushed with that change: http://hg.mozilla.org/mozilla-central/rev/98617b5a532b The corresponding spec change is being tracked as http://www.w3.org/Bugs/Public/show_bug.cgi?id=10260
Status: ASSIGNED → RESOLVED
Closed: 9 years ago
Resolution: --- → FIXED