Closed Bug 484406 Opened 15 years ago Closed 15 years ago

doctype sniffing should be more in line with IE

Categories

(Core :: DOM: HTML Parser, defect)

defect
Not set
normal

Tracking

()

RESOLVED WONTFIX
mozilla1.9.2a1

People

(Reporter: crazy-daniel, Unassigned)

Details

Attachments

(1 file)

Internet Explorer 8 has been released. I noticed there are a lot of doctypes that trigger Quirks Mode in Gecko, but Standards Mode in all versions of IE.

These doctypes include typos or were added to unbreak sites that had mysterious gaps before Almost Standards Mode was introduced. However, it looks like most of them were added for Netscape compat.

The list of doctypes we sniff was never revisited when Almost Standards Mode was introduced or when IE6 gained its Standards Mode.

It's time to catch up. IE8 improved its CSS support greatly and fixed some nasty DOM bugs. We should reflect this change in our code. We can test the change throughout the alpha phase of 1.9.2.

This patch edits our list of doctypes as well as a mochitest that unintentionally tests for one of the affected doctypes.
Attachment #368532 - Flags: superreview?(jst)
Attachment #368532 - Flags: review?(jst)
Attachment #368532 - Flags: superreview?(mrbkap)
Attachment #368532 - Flags: superreview?(jst)
Attachment #368532 - Flags: review?(mrbkap)
Attachment #368532 - Flags: review?(jst)
Comment on attachment 368532
more IE-compatible doctype sniffing

Blake would be a much better reviewer here...
Attachment #368532 - Flags: superreview?(mrbkap)
Attachment #368532 - Flags: superreview?(dbaron)
Attachment #368532 - Flags: review?(mrbkap)
Attachment #368532 - Flags: review?(dbaron)
Comment on attachment 368532
more IE-compatible doctype sniffing

I don't know enough about how this list was created. Punting to dbaron who (I think) does.
I suggest rejecting this patch.

 * The mode switching in IE8 is very complex and crazy. I think we should avoid cloning it if we can get away with not cloning it. (This patch is nowhere near duplicating the IE8 behavior.)

 * I think vendors shouldn't make unilateral moves here when after years we've gotten Gecko, WebKit, Opera and HTML5 a step away from uniformness. If there are concrete Web compat problems with HTML5 doctype sniffing, I think those should be brought up in the HTML WG.

 * Intuitively, removing ancient doctypes from the quirky list has no upside since all content with those doctypes predates Gecko.

 * The HTML 4.0 Transitional behavior was explicitly left the way it is for stability even after the almost standards mode was introduced (even though I can't find the bug # right now). HTML5, WebKit and Opera carefully align with Gecko on this point.

 * The IBM system ID hack was needed to avoid breaking IBM sites. I can't see an upside to reversing that fix.

 * I think the only change we should implement is to move to HTML5 doctype sniffing precisely (in the HTML5 parser). HTML5 doctype sniffing is like Gecko's except the quirky list matches by prefix so that the "EN" bit at the end is ignored. According to Simon Pieters of Opera, this improves Web compat. (Previously WONTFIXed as bug 101600, though.)
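To make the difference between the two matching strategies concrete, here is a minimal Python sketch. The identifier lists are a small illustrative subset chosen for this example, not the actual lists used by Gecko or by the HTML5 spec, and lowercasing before comparison is an assumption standing in for case-insensitive matching. The exact-match check misses a public identifier whose trailing language code differs from "EN"; the prefix check ignores everything after the final "//".

```python
# Illustrative subset only; the real quirky lists are much longer.
QUIRKY_IDS_EXACT = {
    "-//ietf//dtd html 2.0//en",
    "-//w3c//dtd html 3.2 final//en",
    "-//w3c//dtd html 4.0 transitional//en",
}

# The same entries with the trailing language code ("en") dropped.
QUIRKY_PREFIXES = (
    "-//ietf//dtd html 2.0//",
    "-//w3c//dtd html 3.2 final//",
    "-//w3c//dtd html 4.0 transitional//",
)


def is_quirky_exact(public_id: str) -> bool:
    # Whole-string match: "//DE" instead of "//EN" is not found.
    return public_id.lower() in QUIRKY_IDS_EXACT


def is_quirky_prefix(public_id: str) -> bool:
    # Prefix match: whatever follows the last "//" is ignored.
    # str.startswith accepts a tuple and returns True if any entry matches.
    return public_id.lower().startswith(QUIRKY_PREFIXES)


if __name__ == "__main__":
    pid = "-//W3C//DTD HTML 3.2 Final//DE"
    print(is_quirky_exact(pid), is_quirky_prefix(pid))
```

With the German variant of the HTML 3.2 identifier, the exact check returns False and the prefix check returns True, which is the behavioral difference described above.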
(In reply to comment #3)
>  * The mode switching in IE8 is very complex and crazy. I think we should
> avoid cloning it if we can get away with not cloning it. (This patch is
> nowhere near duplicating the IE8 behavior.)

Bug and patch aren't about cloning IE's behaviour. It's simply about shortening the list of doctypes we sniff for.

>  * I think vendors shouldn't make unilateral moves here when after years we've
> gotten Gecko, WebKit, Opera and HTML5 a step away from uniformness. If there
> are concrete Web compat problems with HTML5 doctype sniffing, I think those
> should be brought up in the HTML WG.

Gecko, WebKit, Opera and HTML5 are all flexible products with short life cycles. IE is not.

By the time HTML 5 becomes a recommendation, all the browsers/engines above will long since have been updated. But we'd still be working with these old IE versions.

It is a matter of interoperability. In my opinion, changing IE would be more problematic than changing the other browsers.

>  * Intuitively, removing ancient doctypes from the quirky list has no upside
> since all content with those doctypes predates Gecko.

I can't say I've seen all websites using these doctypes, but I can tell that those I've seen also predate CSS usage. I tried those pages in IE8, switched between Standards and Quirks Mode and compared to other browsers.
There is no downside intuitively, but programmatically there may be upsides.
After all, we're speaking of code that is executed at every page load.

>  * The HTML 4.0 Transitional behavior was explicitly left the way it is for
> stability even after the almost standards mode was introduced (even though I
> can't find the bug # right now). HTML5, WebKit and Opera carefully align with
> Gecko on this point.

IE does not. As I said, changing IE is much harder than changing the rest.
The IE team's decision may have been wrong, but it's too late to change it now.

>  * The IBM system ID hack was needed to avoid breaking IBM sites. I can't see
> an upside to reversing that fix.

I can't see a downside now that IBM has removed most, if not all, uses of this doctype. The upside is IE compatibility (and, as I said before, forward compatibility, IMHO).

>  * I think the only change we should implement is to move to HTML5 doctype
> sniffing precisely (in the HTML5 parser). HTML5 doctype sniffing is like
> Gecko's except the quirky list matches by prefix so that the "EN" bit at the
> end is ignored. According to Simon Pieters of Opera, this improves Web compat.
> (Previously WONTFIXed as bug 101600, though.)

I've got no objections to this change. This decision and the patch here are not mutually exclusive.
(In reply to comment #4)
> (In reply to comment #3)
> >  * The mode switching in IE8 is very complex and crazy. I think we should
> > avoid cloning it if we can get away with not cloning it. (This patch is
> > nowhere near duplicating the IE8 behavior.)
> 
> Bug and patch aren't about cloning IE's behaviour. It's simply about shortening
> the list of doctypes we sniff for.

Is there a concrete need to make it shorter?

> >  * I think vendors shouldn't make unilateral moves here when after years we've
> > gotten Gecko, WebKit, Opera and HTML5 a step away from uniformness. If there
> > are concrete Web compat problems with HTML5 doctype sniffing, I think those
> > should be brought up in the HTML WG.
> 
> Gecko, WebKit, Opera and HTML5 are all flexible products with short life
> cycles. IE is not.
> 
> By the time HTML 5 becomes a recommendation, all the browsers/engines above
> will long since have been updated. But we'd still be working with these old
> IE versions.
> 
> It is a matter of interoperability. In my opinion, changing IE would be
> more problematic than changing the other browsers.

Are there concrete cases where Gecko, Opera or WebKit looks bad compared to IE8 because a given page is rendered in the quirks mode in Gecko, Opera or WebKit but in the almost standards mode or the standards mode in IE8?

> >  * Intuitively, removing ancient doctypes from the quirky list has no upside
> > since all content with those doctypes predates Gecko.
> 
> I can't say I've seen all websites using these doctypes, but I can tell that
> those I've seen also predate CSS usage. I tried those pages in IE8, switched
> between Standards and Quirks Mode and compared to other browsers.

The change would make some pre-existing pages render in the standards mode or almost standards mode in Gecko as opposed to the quirks mode. On the face of it, migrating the rendering of legacy pages towards the standards mode doesn't have an upside. The effects are either indifferent or negative. (If the affected pages were maintained and broke in the quirks mode, they'd already have been migrated to the standards mode during the past 8 or so years.)

We should promote the use of the standards mode for new content, but that has nothing to do with the doctypes affected by this patch. New content should be created with newer doctypes anyway.

> There is no downside intuitively, but programmatically there may be upsides.
> After all, we're speaking of code that is executed at every page load.

Do you mean the performance impact of searching a longer list? The search currently uses binary search and the patch doesn't halve the length of the list. If performance indeed is a measurable issue that needs addressing, I think we should try to address this in the HTML5 parser by creating a table-driven automaton (ahead of compile time) that recognizes the right strings.
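For a sense of why list length matters little here, the following Python sketch shows a binary search over a sorted list using the standard `bisect` module, so a lookup costs O(log n) comparisons. It is a hypothetical illustration, not Gecko's actual code: the list is an illustrative lowercased subset, and the sketch also assumes the list is prefix-free (no entry is a prefix of another). Under that assumption, the largest entry that sorts at or below the input is the only one that can be its prefix, so the same binary search also supports HTML5-style prefix matching.

```python
import bisect

# Hypothetical sorted, prefix-free list (illustrative subset, lowercased).
SORTED_PREFIXES = sorted([
    "-//ietf//dtd html 2.0//",
    "-//netscape comm. corp.//dtd html//",
    "-//w3c//dtd html 3.2 final//",
    "-//w3c//dtd html 4.0 transitional//",
])


def find_quirky_prefix(public_id: str):
    """Return the matching prefix, or None, using O(log n) comparisons."""
    s = public_id.lower()
    # Index of the largest entry <= s. In a prefix-free sorted list, any
    # entry that is a prefix of s must be exactly this entry.
    i = bisect.bisect_right(SORTED_PREFIXES, s) - 1
    if i >= 0 and s.startswith(SORTED_PREFIXES[i]):
        return SORTED_PREFIXES[i]
    return None


if __name__ == "__main__":
    print(find_quirky_prefix("-//W3C//DTD HTML 3.2 Final//EN"))
    print(find_quirky_prefix("-//W3C//DTD HTML 4.01//EN"))
```

The first lookup finds the HTML 3.2 prefix; the second lands next to the HTML 4.0 Transitional entry, fails the prefix check, and returns None.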
 
> >  * The HTML 4.0 Transitional behavior was explicitly left the way it is for
> > stability even after the almost standards mode was introduced (even though I
> > can't find the bug # right now). HTML5, WebKit and Opera carefully align with
> > Gecko on this point.
> 
> IE does not. As I said, changing IE is much harder than changing the rest.
> The IE team's decision may have been wrong, but it's too late to change it now.

If we really want to change this point, I think the issue should be raised on public-html and coordinated with other vendors. However, I think we shouldn't change the behavior on this point unless there are concrete top sites that use the HTML 4.0 Transitional doctype, get the almost standards mode in IE8 but the quirks mode in Gecko/Opera/WebKit, look better in IE8, and don't respond to evangelism.

> >  * The IBM system ID hack was needed to avoid breaking IBM sites. I can't see
> > an upside to reversing that fix.
> 
> I can't see a downside now that IBM has removed most if not all uses of this
> doctype.

I don't have the resources to verify that IBM has removed the uses of their old doctype. What method did you use to arrive at the conclusion that IBM has removed most uses of the doctype?
That patch pretty explicitly regresses a bunch of the IBM pages, no?  Was it even tested against the testcases and urls in the bugs that led to those doctypes being added to the list?
(In reply to comment #5)
> If we really want to change this point, I think the issue should be raised on
> public-html and coordinated with other vendors.

I have started a discussion on public-html: http://lists.w3.org/Archives/Public/public-html/2009Mar/0672.html

(In reply to comment #6)
> That patch pretty explicitly regresses a bunch of the IBM pages, no?  Was it
> even tested against the testcases and urls in the bugs that led to those
> doctypes being added to the list?

I found almost no records for most of these doctypes. They could've been there since the beginning of mode switching. If not, bug 55264 is the place to look.

There's the IBM bug, bug 224727, but it doesn't reveal much; as you may remember, that doctype was added in bug 153032.

The other doctypes we added on a per-page basis aren't included in this patch.
According to Henri, the HTML 4.0 doctypes were added because Apple's site had mysterious gaps. The HTML 4 doctype with the typo of having only one slash separating /dtd/ and /en was used on an HP website. Both sites have long since been redesigned.

Robert Accettura mentioned the IBM doctype on his blog: http://robert.accettura.com/blog/2007/01/20/secrets-in-websites/ (scroll to IBM), and Michael Kaply, who has worked at IBM for over 15 years, also mentioned that this doctype is mostly history: http://robert.accettura.com/blog/2007/01/20/secrets-in-websites/#comment-252764
cc'ing dbaron for comment
I can't for the life of me understand why it would be a good idea to change this code twice (once for this, once for HTML5 parsing) in a fairly short period of time.  If this is a good change to make, push it into HTML5 and pick it up then, but don't make changes here twice.
Comment on attachment 368532
more IE-compatible doctype sniffing

Doctype sniffing originated as a heuristic for determining whether a page predated "standard-compliant" browsers or was created while the author was or should have been aware of them.

Removing a bunch of really old doctypes from the list because they're becoming a less important part of the Web doesn't make sense, both because it doesn't fit with that heuristic, and because we do want to remain compatible with old content; we don't want to make Web pages written in 1995 like files in some old defunct word processor that no longer exists.

So for those and a bunch of the reasons stated by others, I agree with the comments suggesting that we should not make this change.  Marking as review-denied.
Attachment #368532 - Flags: superreview?(dbaron)
Attachment #368532 - Flags: superreview-
Attachment #368532 - Flags: review?(dbaron)
Attachment #368532 - Flags: review-
Status: NEW → RESOLVED
Closed: 15 years ago
Resolution: --- → WONTFIX