iso-10646 in meta not rejected as a non-ASCII-superset encoding

RESOLVED FIXED

Status

Product: Core
Component: HTML: Parser
Priority: P2
Severity: normal
Status: RESOLVED FIXED
Reported: 7 years ago
Modified: 7 years ago

People

(Reporter: Emil Ivanov, Assigned: hsivonen)

Tracking

Version: Trunk
Keywords: regression
Points: ---

Firefox Tracking Flags

(blocking2.0 betaN+)

Attachments

(1 attachment)

(Reporter)

Description

7 years ago
From: https://input.mozilla.com/en-US/search/?product=firefox&sentiment=sad

A user reports that the page http://cda.ipmailing.it/IPMailing/forms/optInForm1.asp "won't show properly".

The page works in Firefox 3.6.8, so this is a regression.

I'm not sure about the component, but both the page and its view source are unreadable in Firefox 4 b2.
(Reporter)

Comment 1

7 years ago
Firefox 3.6.8 detects the page encoding as Western (ISO-8859-1).

Firefox 4b2 detects it as UTF-16BE.

There is another problem with View Source: I can't change the encoding until the View Source window is reloaded. Should I file a separate bug for this?
(Reporter)

Comment 2

7 years ago
(In reply to comment #1)
> There is another problem with View source, i cant change the encoding until the
> view source window is reloaded

Anyway, filed bug 582795.

Comment 3

This is fallout from the HTML5 parser.  I can get the broken behavior in 3.6.8 if I turn that parser on.

Note that charsetalias.properties aliases iso-10646 to UTF-16BE.  So I'm not quite sure why the old parser didn't use that encoding here... but if I change the meta to say "UTF-16BE" we render the page correctly.  So there's some sort of special-casing going on here, and it seems to happen _before_ alias resolution.  Should it happen after?
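
To make the ordering question concrete, here is a minimal Python sketch, not the Gecko code. It assumes the special case in question is the step that treats a UTF-16 declaration in meta as UTF-8. The alias table and the function names are hypothetical; only the iso-10646 to UTF-16BE entry comes from charsetalias.properties as described above.

```python
# Hypothetical, heavily trimmed alias table. Only the iso-10646 entry is
# taken from the bug discussion; the rest is illustrative.
ALIASES = {
    "iso-10646": "UTF-16BE",
    "utf-16": "UTF-16",
    "utf-16be": "UTF-16BE",
    "utf-16le": "UTF-16LE",
    "utf-8": "UTF-8",
    "iso-8859-1": "ISO-8859-1",
}

UTF16_LABELS = {"utf-16", "utf-16be", "utf-16le"}
UTF16_ENCODINGS = {"UTF-16", "UTF-16BE", "UTF-16LE"}


def resolve_alias(label):
    """Map a meta charset label to a canonical encoding name, or None."""
    return ALIASES.get(label.strip().lower())


def special_case_before_alias(label):
    """Compare the raw label first, then resolve aliases."""
    if label.strip().lower() in UTF16_LABELS:
        return "UTF-8"
    return resolve_alias(label)


def special_case_after_alias(label):
    """Resolve aliases first, then compare the canonical name."""
    encoding = resolve_alias(label)
    if encoding in UTF16_ENCODINGS:
        return "UTF-8"
    return encoding


# The raw label "iso-10646" is not a UTF-16 label, so the early check
# misses it and the alias table then yields UTF-16BE.
print(special_case_before_alias("iso-10646"))
print(special_case_after_alias("iso-10646"))
```

Under these assumptions, a literal "UTF-16BE" in meta is caught by either ordering, which would be consistent with the page rendering correctly after that change.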
(Assignee)

Comment 4

7 years ago
(In reply to comment #3)
> So there's some sort
> of special-casing going on here, and it seems to happen _before_ alias
> resolution.  Should it happen after?

Looks like it needs to happen after. The code is:
http://mxr-test.konigsberg.mozilla.org/mozilla-central/source/parser/html/nsHtml5MetaScannerCppSupplement.h#87
OS: Windows XP → All
Priority: -- → P2
Hardware: x86 → All
Summary: Firefox 4 b2 cant render page with encoding iso-10646 → iso-10646 in meta not mapped to UTF-8
(Assignee)

Comment 5

7 years ago
I wonder why http://mxr-test.konigsberg.mozilla.org/mozilla-central/source/parser/html/nsHtml5MetaScannerCppSupplement.h#112 isn't working
(Assignee)

Comment 6

7 years ago
The bug is instead that similar checks are missing here:
http://mxr-test.konigsberg.mozilla.org/mozilla-central/source/parser/html/nsHtml5StreamParser.cpp#750
(Assignee)

Updated

7 years ago
Blocks: 572886
(Assignee)

Comment 7

7 years ago
Safari seems to special-case UTF-16 without alias resolution, so it doesn't sniff to UTF-8. It then rejects UTF-16BE as a non-ASCII-based encoding, so the default encoding (Windows-1252) kicks in.
(Assignee)

Updated

7 years ago
Summary: iso-10646 in meta not mapped to UTF-8 → iso-10646 in meta not rejected as a non-ASCII-superset encoding
(Assignee)

Comment 8

7 years ago
I've examined four Web pages that declare iso-10646 in meta. (Thankfully, they are rare.) Three were ASCII and one was Windows-1252. So from this data, it seems we should *not* do alias resolution before the UTF-16 to UTF-8 aliasing step.
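
One way to read that conclusion, as a hedged Python sketch rather than the actual patch: only literal UTF-16 labels take the UTF-16 to UTF-8 step, alias resolution runs afterwards, and a resolved encoding that is not an ASCII superset causes the declaration to be ignored. The alias table, the set of non-ASCII-superset encodings, the windows-1252 fallback, and all names are assumptions for illustration.

```python
# Hypothetical alias table; only iso-10646 -> UTF-16BE comes from the thread.
ALIASES = {
    "iso-10646": "UTF-16BE",
    "utf-8": "UTF-8",
    "iso-8859-1": "ISO-8859-1",
    "windows-1252": "windows-1252",
}

UTF16_LITERAL_LABELS = {"utf-16", "utf-16be", "utf-16le"}
NOT_ASCII_SUPERSET = {"UTF-16", "UTF-16BE", "UTF-16LE"}


def encoding_from_meta(label):
    """Return the encoding to use, or None to ignore the declaration."""
    lowered = label.strip().lower()
    # Step 1: literal UTF-16 labels are treated as UTF-8, before aliasing.
    if lowered in UTF16_LITERAL_LABELS:
        return "UTF-8"
    # Step 2: alias resolution.
    resolved = ALIASES.get(lowered)
    # Step 3: reject unknown labels and encodings that are not ASCII supersets.
    if resolved is None or resolved in NOT_ASCII_SUPERSET:
        return None
    return resolved


def decode_page(raw, label, fallback="windows-1252"):
    """Decode bytes using the meta label, or the assumed fallback."""
    return raw.decode(encoding_from_meta(label) or fallback)


# A Windows-1252 page that mislabels itself as iso-10646 stays readable,
# because the declaration is ignored and the fallback applies.
print(decode_page(b"caf\xe9", "iso-10646"))
```

With the four pages described above (ASCII or Windows-1252 content), ignoring the iso-10646 declaration in this way would decode them sensibly, whereas honoring it as UTF-16BE would not.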
(Assignee)

Comment 9

7 years ago
(In reply to comment #7)
> Safari seems to special-case UTF-16 without alias resolution so it doesn't
> sniff to UTF-8.

Chances are I'm misreading Safari's encoding menu. :-(
(Assignee)

Comment 10

7 years ago
Created attachment 461189 [details] [diff] [review]
Unify late meta treatment with prescan treatment, remove UTF-32 to UTF-8 aliasing while at it
Assignee: nobody → hsivonen
Status: NEW → ASSIGNED
Attachment #461189 - Flags: review?(bzbarsky)
(Assignee)

Updated

7 years ago
Comment on attachment 461189 [details] [diff] [review]
Unify late meta treatment with prescan treatment, remove UTF-32 to UTF-8 aliasing while at it

This looks fine, but shouldn't all those calls be LowerCaseEqualsLiteral?  r=me with that.
Attachment #461189 - Flags: review?(bzbarsky) → review+
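
As a rough illustration of the review point: charset labels in meta can appear in any letter case, so an exact comparison against a lowercase literal would miss "UTF-16BE". The Python helper below only mimics the idea of an ASCII case-insensitive comparison. Its name echoes LowerCaseEqualsLiteral, but it is a hypothetical stand-in and not the Gecko implementation.

```python
def ascii_lower(text):
    """Lowercase only A-Z, leaving every other character untouched."""
    return "".join(
        chr(ord(ch) + 32) if "A" <= ch <= "Z" else ch
        for ch in text
    )


def lower_case_equals_literal(value, literal):
    """True if value, ASCII-lowercased, equals the given lowercase literal."""
    return ascii_lower(value) == literal


def is_utf16_label(label):
    """Check a meta charset label against the literal UTF-16 labels."""
    return any(
        lower_case_equals_literal(label, literal)
        for literal in ("utf-16", "utf-16be", "utf-16le")
    )


# An exact comparison would reject the first label; the lowercase-aware
# comparison accepts it.
print(is_utf16_label("UTF-16BE"))
print(is_utf16_label("iso-10646"))
```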

Updated

7 years ago
blocking2.0: ? → betaN+
(Assignee)

Comment 12

7 years ago
(In reply to comment #11)
> This looks fine, but shouldn't all those calls be LowerCaseEqualsLiteral?  r=me
> with that.

Thanks. Pushed with that change:
http://hg.mozilla.org/mozilla-central/rev/98617b5a532b

The corresponding spec change is being tracked as
http://www.w3.org/Bugs/Public/show_bug.cgi?id=10260
Status: ASSIGNED → RESOLVED
Last Resolved: 7 years ago
Resolution: --- → FIXED