Bug 582788 - iso-10646 in meta not rejected as a non-ASCII-superset encoding
Status: RESOLVED FIXED
Keywords: regression
Product: Core
Classification: Components
Component: HTML: Parser
Version: Trunk
Platform: All All
Importance: P2 normal
Target Milestone: ---
Assigned To: Henri Sivonen (:hsivonen)
http://cda.ipmailing.it/IPMailing/for...
Depends on:
Blocks: 572886
Reported: 2010-07-28 15:53 PDT by Emil Ivanov
Modified: 2010-07-30 03:34 PDT
betaN+


Attachments
Unify late meta treatment with prescan treatment, remove UTF-32 to UTF-8 aliasing while at it (4.77 KB, patch)
2010-07-29 05:57 PDT, Henri Sivonen (:hsivonen)
bzbarsky: review+

Description Emil Ivanov 2010-07-28 15:53:07 PDT
From: https://input.mozilla.com/en-US/search/?product=firefox&sentiment=sad

A user reports that the page http://cda.ipmailing.it/IPMailing/forms/optInForm1.asp "won't show properly".

The page is working with Firefox 3.6.8, so this is a regression.

I'm not sure about the component, but both the page and its View Source output are unreadable in Firefox 4 b2.
Comment 1 Emil Ivanov 2010-07-28 16:05:31 PDT
Firefox 3.6.8 detects the page encoding as Western (ISO-8859-1)

Firefox 4b2 detects it as UTF-16BE.

There is another problem with View Source: I can't change the encoding until the View Source window is reloaded. Should I file a separate bug for this?
Comment 2 Emil Ivanov 2010-07-28 16:29:48 PDT
(In reply to comment #1)
> There is another problem with View source, i cant change the encoding until the
> view source window is reloaded

Anyway, filed bug 582795.
Comment 3 Boris Zbarsky [:bz] 2010-07-28 18:21:00 PDT
This is fallout from the HTML5 parser.  I can get the broken behavior in 3.6.8 if I turn that parser on.

Note that charsetalias.properties aliases iso-10646 to UTF-16BE.  So I'm not quite sure why the old parser didn't use that encoding here... but if I change the meta to say "UTF-16BE" we render the page correctly.  So there's some sort of special-casing going on here, and it seems to happen _before_ alias resolution.  Should it happen after?
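
To make the ordering question concrete, here is a minimal Python sketch (not the Gecko code). It assumes the special-casing in question is the UTF-16 to UTF-8 aliasing for meta-declared labels; the `ALIASES` table and all function names are hypothetical, and only the iso-10646 entry is taken from charsetalias.properties as described above.

```python
# Hypothetical alias table; only this one entry is taken from the comment above.
ALIASES = {"iso-10646": "UTF-16BE"}


def resolve_alias(label):
    """Map a declared label to its canonical name, or return it unchanged."""
    return ALIASES.get(label.strip().lower(), label.strip())


def is_utf16_label(label):
    """True for the UTF-16 family of names (assumed set, for illustration)."""
    return label.strip().lower() in ("utf-16", "utf-16be", "utf-16le")


def special_case_before_resolution(label):
    """Check the raw label first, then resolve aliases."""
    if is_utf16_label(label):
        return "UTF-8"
    return resolve_alias(label)


def special_case_after_resolution(label):
    """Resolve aliases first, then check the canonical name."""
    resolved = resolve_alias(label)
    if is_utf16_label(resolved):
        return "UTF-8"
    return resolved
```

With the check before resolution, a raw "UTF-16BE" is caught but "iso-10646" slips through and resolves to UTF-16BE, which would match the symptoms described here.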
Comment 4 Henri Sivonen (:hsivonen) 2010-07-29 00:05:40 PDT
(In reply to comment #3)
> So there's some sort
> of special-casing going on here, and it seems to happen _before_ alias
> resolution.  Should it happen after?

Looks like it needs to happen after. The code is:
http://mxr-test.konigsberg.mozilla.org/mozilla-central/source/parser/html/nsHtml5MetaScannerCppSupplement.h#87
Comment 6 Henri Sivonen (:hsivonen) 2010-07-29 00:35:56 PDT
The bug is instead that similar checks are missing here:
http://mxr-test.konigsberg.mozilla.org/mozilla-central/source/parser/html/nsHtml5StreamParser.cpp#750
Comment 7 Henri Sivonen (:hsivonen) 2010-07-29 04:05:26 PDT
Safari seems to special-case UTF-16 without alias resolution, so it doesn't sniff to UTF-8. It then rejects UTF-16BE as a non-ASCII-based encoding, so the default encoding (Windows-1252) kicks in.
Comment 8 Henri Sivonen (:hsivonen) 2010-07-29 04:35:31 PDT
I've examined four Web pages that declare iso-10646 in meta. (Thankfully, they are rare.) 3 were ASCII. One was Windows-1252. So from this data, it seems we should *not* do alias resolution before the UTF-16 to UTF-8 aliasing step.
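
One way to read that conclusion, sketched in Python under stated assumptions (this is an illustration, not the attached patch): do the UTF-16 to UTF-8 step on the raw label, resolve aliases afterwards, and ignore the declaration if the resolved encoding is not an ASCII superset. The tables, the fallback value, and the name `encoding_from_meta` are all hypothetical.

```python
# Hypothetical data for illustration; only the iso-10646 alias comes from this bug.
ALIASES = {"iso-10646": "UTF-16BE"}
RAW_UTF16_LABELS = {"utf-16", "utf-16be", "utf-16le"}
NON_ASCII_SUPERSETS = {"UTF-16", "UTF-16BE", "UTF-16LE"}
FALLBACK = "windows-1252"  # assumed default encoding


def encoding_from_meta(label, fallback=FALLBACK):
    """Pick an encoding for a meta-declared label."""
    raw = label.strip().lower()
    # Step 1: UTF-16 to UTF-8 aliasing on the raw label, before alias resolution.
    if raw in RAW_UTF16_LABELS:
        return "UTF-8"
    # Step 2: alias resolution.
    resolved = ALIASES.get(raw, label.strip())
    # Step 3: reject encodings that are not ASCII supersets and use the fallback.
    if resolved.upper() in NON_ASCII_SUPERSETS:
        return fallback
    return resolved
```

Under these assumptions a page declaring iso-10646 falls back to the default, which would decode the ASCII and Windows-1252 pages from the sample correctly.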
Comment 9 Henri Sivonen (:hsivonen) 2010-07-29 05:04:35 PDT
(In reply to comment #7)
> Safari seems to special-case UTF-16 without alias resolution so it doesn't
> sniff to UTF-8.

Chances are I'm misreading Safari's encoding menu. :-(
Comment 10 Henri Sivonen (:hsivonen) 2010-07-29 05:57:23 PDT
Created attachment 461189
Unify late meta treatment with prescan treatment, remove UTF-32 to UTF-8 aliasing while at it
Comment 11 Boris Zbarsky [:bz] 2010-07-29 08:56:11 PDT
Comment on attachment 461189
Unify late meta treatment with prescan treatment, remove UTF-32 to UTF-8 aliasing while at it

This looks fine, but shouldn't all those calls be LowerCaseEqualsLiteral?  r=me with that.
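
The point of the suggestion is that charset labels should be matched case-insensitively. A rough Python analogue of the difference, assuming ASCII-only case folding and a literal that is already lowercase (the helper names below are hypothetical, not Mozilla string API):

```python
def ascii_lower(value):
    """Lowercase only ASCII letters A-Z, leaving other characters untouched."""
    return "".join(chr(ord(c) + 32) if "A" <= c <= "Z" else c for c in value)


def equals_literal(value, literal):
    """Exact, case-sensitive comparison."""
    return value == literal


def lower_case_equals_literal(value, literal):
    """Case-insensitive comparison; `literal` is expected to be lowercase."""
    return ascii_lower(value) == literal
```

With the exact comparison, a label written as "UTF-16" does not match the literal "utf-16"; with the lowercasing variant it does.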
Comment 12 Henri Sivonen (:hsivonen) 2010-07-30 03:34:33 PDT
(In reply to comment #11)
> This looks fine, but shouldn't all those calls be LowerCaseEqualsLiteral?  r=me
> with that.

Thanks. Pushed with that change:
http://hg.mozilla.org/mozilla-central/rev/98617b5a532b

The corresponding spec change is being tracked as
http://www.w3.org/Bugs/Public/show_bug.cgi?id=10260
