A DOCTYPE of HTML 4/5/etc. should make compatibility mode work. Currently the only way to enable compatibility mode is in debug builds through a menu item. Also, for cases where changing the DOCTYPE isn't appropriate, we should support a "META" tag for accomplishing the same thing.
*** Bug 1562 has been marked as a duplicate of this bug. ***
Setting all current Open/Normal to M4.
per leger, assigning QA contacts to all open bugs without QA contacts according to list at http://bugzilla.mozilla.org/describecomponents.cgi?product=Browser
reassigning qacontact to gem (HTML Parser)
*** Bug 2072 has been marked as a duplicate of this bug. ***
Bug 2072 pointed to these test pages: http://www.fas.harvard.edu/~dbaron/tests/nglayout/compat/
Note. This is currently making QA of standards compliance issues difficult (eg, verification of bug 2749 is awaiting doctype-controlled compat mode). Also, a DOCTYPE of HTML 4/5/etc. should make _standard_ mode work. It is an HTML _3_ DOCTYPE that should enable compatibility mode.
This is currently marked M10, but knowing how bugs slip milestones, I'll say that I don't think a beta should go out without this, since if it's introduced later, you might get more people complaining you've broken their HTML 4 pages by not having quirks mode, while it worked in previous versions. This should really be P1 I think, since the longer you delay the harder it will be to introduce.
This should include the new HTML 4.01 DOCTYPE (assuming the spec gets past the proposed recommendation stage): <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html40/strict.dtd"> See http://www.w3.org/MarkUp/#news for details.
The following FPIs should definitely trigger strict mode:
In http://woodworm.cs.uml.edu/~rprice/15445/15445.html
  "ISO/IEC 15445:1999//DTD HyperText Markup Language//EN"
  "ISO/IEC 15445:1999//DTD HTML//EN"
In http://www.w3.org/TR/xhtml1/
  "-//W3C//DTD XHTML 1.0 Strict//EN"
  "-//W3C//DTD XHTML 1.0 Transitional//EN"
  "-//W3C//DTD XHTML 1.0 Frameset//EN"
In http://www.w3.org/TR/html40
  "-//W3C//DTD HTML 4.01//EN"
And probably in http://www.w3.org/TR/REC-html40/
  "-//W3C//DTD HTML 4.0//EN"
How will this be future compatible? Should you, instead, compile a list of known other doctypes and recognize any doctype not on that list (or none at all) as quirks, so that all new doctypes will be standard mode? If you want to do this, I could help compile the list.
I think you should decide on this soon and announce it very publicly.
David, your last-but-one paragraph makes no sense. :-)
It makes perfect sense to me. :-) For others, it might help if you change "known other" to "other known" and remove the "not" in the second line. What I was saying is that since there are all sorts of quirky doctypes out there, but anything that's new should probably be standard mode, we might want to do the recognition based on:
 * quirks if no doctype or an old doctype
 * standard if new/unknown doctype
but this would require a really good list of "old" doctypes since there are some weird ones floating around.
That's what I thought you meant. I agree (as usual...). Rick: Which method do you wish to use? If you wish to use the "all known old, invalid and missing doctypes => quirks, everything else => standard" idea, which I would recommend, I suggest you say so relatively quickly so that we can start fielding doctypes from the web.
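For illustration only, the "all known old, invalid and missing doctypes => quirks, everything else => standard" idea could look roughly like the Python sketch below. This is not Mozilla code: the function name `mode_for_fpi` and the three catalog entries are assumptions made for the example, and a usable catalog would need the full list of doctypes collected from the web.

```python
from typing import Optional

# Illustrative catalog of "old" public identifiers. These entries are only
# examples; the real list would have to be gathered from existing pages.
OLD_FPIS = {
    fpi.upper()
    for fpi in (
        "-//IETF//DTD HTML//EN",
        "-//IETF//DTD HTML 2.0//EN",
        "-//W3C//DTD HTML 3.2 Final//EN",
    )
}


def mode_for_fpi(fpi: Optional[str]) -> str:
    """Return "quirks" for a missing or cataloged FPI, else "standard"."""
    if fpi is None or not fpi.strip():
        return "quirks"  # no doctype at all
    # Collapse runs of whitespace and ignore case before the lookup.
    normalized = " ".join(fpi.split()).upper()
    if normalized in OLD_FPIS:
        return "quirks"  # known old doctype
    return "standard"  # new or unknown doctype
```

The point of the design is that a doctype nobody has seen yet falls through to standard mode without any change to the catalog.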
Assigning bug to myself.
Hooked up parser mode to document DTD mode. Here is a gist:
FPIs mapped to STRICT mode are:
  "-//W3C//DTD HTML 4.0//EN"
  "-//W3C//DTD HTML 4.01//EN"
  "-//W3C//DTD HTML 4.0x//EN" (x => any number) - Any comments?
  "-//W3C//DTD HTML 4.0 UNKNOWN//EN" (UNKNOWN could be NOQUIRKS, etc.)
Other known FPIs are mapped to QUIRKS mode.
To what are *unknown* FPIs mapped?
If HTML 4.0 is not found in the DOCTYPE string then an unknown FPI would be mapped to quirks. I.e., "-//W3C//DTD STANDARD//EN" -> This would be mapped to quirks!!
You should map the ISO doctypes to strict too, as I mentioned above.
ISO doctypes are hooked up too :) The following FPIs are also mapped to strict DTD: "ISO/IEC 15445:1999//DTD HyperText Markup Language//EN" "ISO/IEC 15445:1999//DTD HTML//EN" Marking bug FIXED.
Does the presence of any of:
 * an XML declaration "<?xml version="1.0"?>"
 * an XHTML DTD
trigger strict mode? They should, because XHTML can be sent as text/html.
I've written a quick script to test this: http://www.bath.ac.uk/%7Epy8ieh/cgi/compat-test.pl It doesn't do XML or alternative mime types yet. If anyone can think of any things that are affected by compat vs std mode, other than CSS table inheritance, then please send them to me and I'll add them to the script's output.
Something else that triggers NavQuirks is the string "Transitional" in the FPI, if it's an HTML4 FPI. So the following: "-//W3C//DTD HTML 4.0 UKNOWN Transitional UNKNOWN//EN" ...triggers Quirks Mode. However, on non-transitional FPIs, the string NOQUIRKS always triggers standard mode, so the following will be in Standard mode: ";sal hasl;dgh sadFG NOQUIRKS sdg;jaadhf ljkerhyt " Seems ok to me. David?
Created attachment 1878 [details] HTML snippet that shows which mode you're in (change the doctype to test...)
 * an XML declaration "<?xml version="1.0"?>"
 * an XHTML DTD
do not trigger strict mode yet. I'm using the  build on a Windows NT 4.0 (Service Pack 5) system. Reopening.
Clearing FIXED resolution due to reopen of this bug.
All XML pages should ALWAYS be in standard mode. There are no quirks to mimic there.
I believe the issue here is pages sent as text/html (technically HTML, not XML) that contain an XML declaration or an XHTML doctype. I agree that these should be in standard mode. That is, the XHTML doctypes should be recognized for standard mode and any HTML page that begins with an XML declaration in accordance with the XML spec (must it be on the first line, or something, or can comments be before it????) should also be in standard mode.
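On the question of where the declaration may appear: as far as I understand the XML 1.0 grammar, an XML declaration, when present, has to be the very first thing in the document, with no comments or whitespace before it. Under that reading, a check only needs to look at the start of the buffer. A minimal Python sketch follows; the function name and the tolerance for a leading byte order mark are assumptions for illustration, not parser code.

```python
def starts_with_xml_declaration(text: str) -> bool:
    """True if the document begins with an XML declaration."""
    # Tolerating a leading byte order mark is an assumption of this sketch.
    if text.startswith("\ufeff"):
        text = text[1:]
    # "<?xml" must be followed by whitespace, which rules out other
    # processing instructions such as "<?xml-stylesheet ...?>".
    return text[:5] == "<?xml" and text[5:6] in (" ", "\t", "\r", "\n")
```

A page with a comment or a blank line before the declaration would not match, so it would fall back to whatever the DOCTYPE says.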
Should HTML 4.01 Transitional or HTML 4.0x Transitional trigger standard mode? I think they should, because these are "new" doctypes.
Actually, yeah, TRANSITIONAL should always trigger STRICT mode. A good reason for this is that the CSS1 Test Suite uses the Transitional DTD... :-)
Hooked up DOCTYPES ( all of 'em..I hope :) ) Marking FIXED.
HTML 4.0x Frameset still triggers quirks mode.
...As do any "transitional" DTDs. IMHO all HTML4.x DTDs should trigger standard mode, including Transitional and Frameset. Reopening.
I disagree, perhaps, since some authoring tools may be generating files with these DTDs.
I think this is the best: HTML 4.0 Transitional and Frameset: trigger quirks mode. HTML 4.0 Strict: triggers standard mode. HTML 4.01 and 4.0x: always trigger standard mode.
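A rough Python sketch of the rule set proposed above, for anyone who wants to test FPIs against it. The function name `proposed_mode` and the regular expression are assumptions for this example; it only covers HTML 4.0x FPIs and returns `None` for anything else.

```python
import re
from typing import Optional

# Matches "//DTD HTML 4.0" and captures any further digits (the "x" in 4.0x).
_HTML4 = re.compile(r"//DTD HTML 4\.0(\d*)", re.IGNORECASE)


def proposed_mode(fpi: str) -> Optional[str]:
    """Apply the proposal: 4.0 Transitional/Frameset => quirks, rest => standard."""
    match = _HTML4.search(fpi)
    if match is None:
        return None  # not an HTML 4.0x FPI; the proposal says nothing here
    if match.group(1):
        return "standard"  # 4.01 or any other 4.0x, whatever the flavour
    upper = fpi.upper()
    if "TRANSITIONAL" in upper or "FRAMESET" in upper:
        return "quirks"  # HTML 4.0 Transitional or Frameset
    return "standard"  # plain "HTML 4.0" is the Strict DTD
```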
Marking bug FIXED.
Mode detection still does not work correctly in some cases. See http://homepage1.nifty.com/emk/moz/dtd.html. Reopening.
Clearing FIXED resolution due to reopen.
Moving to M12 since M11 is over and this has been reopened.
This may not be the best place for this, but ... On the topic of "other" DTDs (FPI?) I noticed this one: <!DOCTYPE HTML PUBLIC "-//SoftQuad Software//DTD HoTMetaL PRO 6.0::19990601::extensions to HTML 4.0//EN" "hmpro6.dtd"> in a file. I don't know if this is HTML 4.0 STRICT, but I would guess it's pretty tight, since SoftQuad began as an SGML company.
Moving to m14.
If the idea is to phase out Quirks, we're coming at this backwards. Instead of defaulting to Quirks and establishing conditions for Strict to kick in, we should default to Strict and establish conditions for Quirks. The problem with the former approach (defaulting to Quirks) is that the list of conditions for Strict will always be too short and prone to obsolescence as new document types come into use. So the default should be Strict, with the following conditions triggering Quirks:
 * the document has no doctype (most HTML).
 * the document has a doctype in wide use at the time of beta 1. A catalog needs to be made. I expect that it will have no more than 20 doctypes (2.0, 3.2, IETF flavors, etc).
 * HTML 4.0 Strict should not be in the Quirks catalog, even if it is in wide use.
 * HTML 4.0 Transitional should trigger Quirks if the declaration contains no URL, and Strict if it does. This tends to characterize the difference between bogus and trustworthy usage, but is obviously not watertight. Composer omits the URL. The CSS1 Test Suite uses it.
This is a compromise, of
Todd: I agree. (BTW, your comment was cut short again.) Harish: What do you think?
I strongly agree. Note that I also proposed the idea on 09/01/99 11:48 (above). I think I feel more strongly about it now.
I don't know why my posts get clipped. It's always been only the last word, btw. I didn't read carefully enough. Yes, David proposed this first. I will begin gathering doctypes for the quirks catalog. This is a bogus sentence; perhaps it will be clipped; perhaps not.
I started a list of doctypes at: http://www.people.fas.harvard.edu/~dbaron/tests/nglayout/doctypes.html Any comments? (I put the potentially controversial items in each list first.) I got the list of DTDs from the catalogs of the two known validators.
Looks good. I would suggest adding the (technically invalid?) FPI "CNavDTD" to the list of FPIs that should enable quirks mode. This would be an explicit "enable the Compatibility Navigator-DTD parsing mode" FPI. I also suggest that the line which reads: o Any "DOCTYPE HTML SYSTEM" as opposed to "DOCTYPE HTML PUBLIC" ...should also include: o Any 'DOCTYPE HTML PUBLIC "..." SYSTEM "..."'. I.e., when _both_ an FPI and a URI are given, we should trigger standard mode, regardless of the FPI. David, you can probably express that better than me. ;-) BTW, David, once that document becomes more stable, e-mail it to me and I'll make each line link to the relevant test case generated by my script.
Ian - I don't think what you said about SYSTEM makes sense. At least, judging from XML, the syntax for the external subset of the DTD comes in two forms: PUBLIC PubidLiteral SystemLiteral or SYSTEM SystemLiteral So what I'm saying is that, when it takes the first form, we should ignore the SystemLiteral (since there are lots of variants, like REC-html40 vs REC-html40-9712?? vs REC-html40-9804?? vs html4 vs html40 vs html401 vs 1999/REC-html401-9912?? (not to mention the WDs and PRs) in the filename) and base quirks mode on the PubidLiteral. When it takes the second form, we should assume strict mode. (Should we also assume strict mode if there exists an internal subset?) See: http://www.w3.org/TR/REC-xml#NT-doctypedecl http://www.w3.org/TR/REC-xml#NT-ExternalID However, I wish I had the SGML syntax handy...
SGML allows these syntaxes (from memory, 'my' copy of Goldfarb is in my room):
 1: PUBLIC PubidLiteral
 2: PUBLIC PubidLiteral SystemLiteral
 3: SYSTEM SystemLiteral
With case 1, we should use the PubidLiteral to decide Quirks mode. With case 2, we should IMHO _always_ use standard mode. With case 3, we should always use standard mode. The reasoning behind case 2 (which my last comment was trying to make, although I unfortunately got the syntax wrong) is that the CSS1 test suite uses this syntax (IIRC). In XML mode, we should _always_ assume standard mode. In HTML mode, I don't believe internal subsets would have any effect, so we should probably ignore them and not worry about them affecting standard/quirk mode selection.
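Whatever policy ends up attached to each case, telling the three forms apart is mechanical. Here is a hedged Python sketch of that step only. The name `external_id_form` and both regular expressions are assumptions for illustration; the sketch handles double-quoted literals only and ignores internal subsets, so it is a simplification rather than a real SGML parser.

```python
import re
from typing import Optional, Tuple

# Cases 1 and 2: PUBLIC PubidLiteral, optionally followed by a SystemLiteral.
_PUBLIC = re.compile(
    r'<!DOCTYPE\s+\S+\s+PUBLIC\s+"([^"]*)"(?:\s+"([^"]*)")?\s*>',
    re.IGNORECASE,
)
# Case 3: SYSTEM SystemLiteral.
_SYSTEM = re.compile(r'<!DOCTYPE\s+\S+\s+SYSTEM\s+"([^"]*)"\s*>', re.IGNORECASE)


def external_id_form(
    doctype: str,
) -> Optional[Tuple[int, Optional[str], Optional[str]]]:
    """Return (case, public_id, system_id) using the case numbers above."""
    match = _PUBLIC.search(doctype)
    if match:
        public_id, system_id = match.group(1), match.group(2)
        case = 1 if system_id is None else 2
        return case, public_id, system_id
    match = _SYSTEM.search(doctype)
    if match:
        return 3, None, match.group(1)
    return None  # no external identifier, e.g. <!DOCTYPE HTML>
```

The mode decision can then be a separate function of the returned tuple, which keeps the disputed policy out of the parsing code.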
(2) can't trigger strict mode in general - I think it's nearly as common as (1), since it's the syntax recommended by the HTML specs. That's why the CSSTS uses it. Internal subsets are certainly an SGML feature, since XML is a subset of SGML. However, they're very rarely used, and they could be a harmless way of making an old doctype cause strict mode. (Although I guess there should really be a META NAME="mozilla-mode" ... or something...)
If we don't use standard mode for (2), then we will show bugs in the CSS1TS even though they are there only for compatibility. The CSS1TS uses doctypes in the form: <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd"> I am not convinced that that DOCTYPE is all that common. Are you sure there are many legacy pages that use that DOCTYPE? Could you list some high profile ones?
I think there are a good number of pages that use it. I'm not sure this is entirely about high profile pages. (Could you list some high profile pages that use DOCTYPEs at all?) I searched around, and found the following pages with DOCTYPEs. They are marked (1) for no SystemLiteral, and (2) for SystemLiteral, and (3) for DOCTYPE HTML SYSTEM:
 (2) http://www.useit.com/
 (2) http://www.w3.org/
 (3) http://www.w3.org/Style/
 (2) http://www.w3.org/Style/CSS/
 (1) http://www.w3.org/MathML/
 (1) http://www.w3.org/MarkUp/
 (2) http://www.w3.org/TR/
 (1) http://www.emacs.org/
 (2) http://www.verso.com/ (which uses a custom DTD with a PublicID, which should perhaps be added to my list...)
 (2) http://style.verso.com/
 (1) http://msdn.microsoft.com/default.asp
 (1) http://www.microsoft.com/unix/ie/default.asp
 (2) http://www.opera.com
 (1) http://www.kernelnotes.org/
 (1) http://www.kernel.org
 (1) http://www.linux.org
 (1) http://sunsite.unc.edu/LDP/docs.html
There are slightly more (1) than (2), but there are still quite a few (2). I don't think we should worry about the CSS1 test suite. The test suite may well change in response to Mozilla and MacIE5. Adding css1 keyword because this bug affects conformance to css1.
Nominating for beta1 because I think the first beta should have something that we think will be the eventual solution, so we can get feedback on it. The eventual solution will probably need a lot of fine-tuning, so it would be good to get feedback from the beta. This is a *very* important issue on which to get feedback, because it determines many parts of the behavior of the entire layout engine.
Note: I've loosened up the rules a bit. As of (my next checkin) ANY doctype that reads "HTML 4.xx" AND "transitional" will now render in quirks mode. I can't see any other satisfactory answer.
Assigning to rickg since he has a fix in hand :)
Here's the final call. For a navigator product, backward compatibility with an eye toward the future is essential -- and so the goal is NOT to phase out quirks mode inasmuch as it means "be compatible with extant content on the web". So by default, we're going to behave in quirks mode unless the DTD instructs us to do otherwise. The latest update causes HTML 4.xx transitional to be quirks. The other quirks triggers are listed in this bug. Strict mode will be enabled for XML documents (obviously), XHTML and DTDs with the STRICT keyword.
I must reopen this bug because of a simple mistake rather than a compatibility problem. ISO doctypes are not hooked up correctly. We need a space character between "ISO/IEC" and "15445:1999" since StripWhiteSpace() is no longer called.
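To make the failure concrete, here is a small Python illustration of how a search key written for whitespace-stripped input stops matching once the stripping step is gone. It is a sketch of the general problem, not the parser source: `strip_whitespace`, `is_iso_doctype`, and the two key constants are hypothetical names for this example.

```python
FPI = "ISO/IEC 15445:1999//DTD HTML//EN"

OLD_KEY = "ISO/IEC15445:1999"   # only matches input with whitespace removed
NEW_KEY = "ISO/IEC 15445:1999"  # matches the buffer as the author wrote it


def strip_whitespace(text: str) -> str:
    """Hypothetical stand-in for a whitespace-stripping step."""
    return "".join(text.split())


def is_iso_doctype(buffer: str, key: str) -> bool:
    """Case-insensitive substring test for the ISO HTML identifier."""
    return key in buffer.upper()
```

With the stripping step in place, `OLD_KEY` matches. Without it, only `NEW_KEY` does, which is why the key needs the space.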
The current heuristics are as follows:
 1. Use QUIRKS mode for any document matching:
    1. ... <!DOCTYPE ... -//W3C//DTD ... HTML 4 ... TRANSITIONAL ...
    2. ... <!DOCTYPE ... -//W3C//DTD ... HTML 4 ... FRAMESET ...
    3. ... <!DOCTYPE ... -//W3C//DTD ... HTML 4 ... LATIN1 ...
    4. ... <!DOCTYPE ... -//W3C//DTD ... HTML 4 ... SYMBOLS ...
    5. ... <!DOCTYPE ... -//W3C//DTD ... HTML 4 ... SPECIAL ...
 2. Failing those, use STRICT mode for documents matching:
    1. ... <!DOCTYPE ... -//W3C//DTD ... HTML 4 ...
    2. ... <!DOCTYPE ... -//W3C//DTD ... XHTML ... TRANSITIONAL ...
    3. ... <!DOCTYPE ... -//W3C//DTD ... XHTML ... STRICT ...
    4. ... <!DOCTYPE ... -//W3C//DTD ... XHTML ... FRAMESET ...
    5. ... <!DOCTYPE ... ISO/IEC 15445:1999 ...
    6. ... ?XML ...
    7. ... NOQUIRKS ...
 3. Failing those, use OTHER mode if the "PARSE_MODE" environment variable matches "other"
 4. Failing that, use QUIRKS mode.
I have a few concerns about the code at the moment...
 1. Why are not _all_ XHTML doctypes accepted as STRICT?
 2. What about HTML 5, should such a thing ever come out?
 3. PRInt32 theEnd=theBuffer.FindChar(kGreaterThan,theIndex+1);
    What happens if it doesn't find the ">"?
 4. Are the following lines not completely redundant?:
      theSubIndex=theBuffer.Find("HTML",PR_TRUE,theSubIndex+18);
      if(kNotFound==theSubIndex)
        theSubIndex=theBuffer.Find("HYPERTEXTMARKUPLANGUAGE",PR_TRUE,theSubIndex+18);
 5. David thinks the following should trigger STRICT mode:
    1. Any "DOCTYPE HTML SYSTEM" as opposed to "DOCTYPE HTML PUBLIC"
    2. A DOCTYPE declaration without a DTD, i.e., <!DOCTYPE HTML>.
    3. A DOCTYPE declaration with an internal subset
    At the moment they are all Quirks mode. I don't mind either way, but what do you think, David?
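For anyone who wants to try doctypes against the heuristics listed in the comment above without a build, they can be approximated in Python as below. This is a sketch written from that description, not a port of DetermineParseMode: the names `sketch_parse_mode`, `QUIRKS_RULES`, and `STRICT_RULES`, and the optional `env` parameter, are assumptions for this example. Each "..." in the list is read as "any text in between".

```python
import os
from typing import Mapping, Optional, Sequence

# Each rule is a sequence of tokens that must appear in this order.
_W3C = ("<!DOCTYPE", "-//W3C//DTD")
QUIRKS_RULES = [
    _W3C + ("HTML 4", keyword)
    for keyword in ("TRANSITIONAL", "FRAMESET", "LATIN1", "SYMBOLS", "SPECIAL")
]
STRICT_RULES = [
    _W3C + ("HTML 4",),
    _W3C + ("XHTML", "TRANSITIONAL"),
    _W3C + ("XHTML", "STRICT"),
    _W3C + ("XHTML", "FRAMESET"),
    ("<!DOCTYPE", "ISO/IEC 15445:1999"),
    ("?XML",),
    ("NOQUIRKS",),
]


def _matches(text: str, tokens: Sequence[str]) -> bool:
    """True if every token occurs in text, in the given order."""
    position = 0
    for token in tokens:
        found = text.find(token, position)
        if found < 0:
            return False
        position = found + len(token)
    return True


def sketch_parse_mode(
    document_start: str, env: Optional[Mapping[str, str]] = None
) -> str:
    """Apply the four steps in order and return the resulting mode name."""
    if env is None:
        env = os.environ
    text = document_start.upper()
    if any(_matches(text, rule) for rule in QUIRKS_RULES):
        return "quirks"
    if any(_matches(text, rule) for rule in STRICT_RULES):
        return "strict"
    if env.get("PARSE_MODE", "").lower() == "other":
        return "other"
    return "quirks"
```

Because the quirks rules are checked first, an HTML 4.01 Transitional doctype lands in quirks mode even though it also matches the first strict rule.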
ian: here are the answers to your questions:
 0. your tests proved that 4.0 without any specifier was erroneously STRICT
 1. All XHTML should be STRICT.
 2. We'll have to rev more than the mode detection code if HTML5 is ever released
 3. I've corrected the bug where we don't find '>'
 4. That code has been eliminated
 5. pages without DOCTYPE absolutely CANNOT be dealt with as strict. That would break the *vast* majority of pages on the web today.
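On item 3, the general hazard is using a "not found" result as if it were a valid index. A minimal Python sketch of the guard follows. Python's `str.find` returns -1 on failure, which plays the role of kNotFound here; the function name `doctype_declaration` is hypothetical, and this is not the code that was checked in.

```python
import re
from typing import Optional


def doctype_declaration(buffer: str) -> Optional[str]:
    """Return the text of the first <!DOCTYPE ...> declaration, or None."""
    match = re.search(r"<!DOCTYPE", buffer, re.IGNORECASE)
    if match is None:
        return None  # no doctype at all
    end = buffer.find(">", match.end())
    if end < 0:
        return None  # unterminated declaration: never slice with -1
    return buffer[match.start():end + 1]
```

Returning `None` for the unterminated case lets the caller fall back to the default mode instead of reading a bogus range.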
Rick - some responses: 0) What do you mean? "DTD HTML 4.0" is part of the FPI for HTML 4.0 strict. The word "Strict" does not appear in the FPI. 5) Nobody was proposing that.
Rick:
> 0. your tests proved that 4.0 without any specifier was erroneously STRICT
Like David said, we need this for the Strict DTD.
> 1. All XHTML should be STRICT.
Agreed. This means that the following:
   if((theBuffer.Find("TRANSITIONAL",PR_TRUE,theSubIndex)>kNotFound)||
      (theBuffer.Find("STRICT",PR_TRUE,theSubIndex)>kNotFound) ||
      (theBuffer.Find("FRAMESET",PR_TRUE,theSubIndex)>kNotFound))
     result=eParseMode_noquirks;
   else
     result=eParseMode_quirks;
...can be changed to simply:
   result=eParseMode_noquirks;
> 2. We'll have to rev more than the mode detection code if HTML5 is ever
> released
Not necessarily, because HTML 5 should be backwards compatible. The point is at the moment we actually _break_ if someone uses the hypothetical HTML5 DTD, so we are not forwards-compatible _at_all_. Having said that, I have no idea how we could check for HTML 5, 6, 7, 8... DTDs in a way which would not also catch some of the quirky DTDs listed on David's page: http://www.people.fas.harvard.edu/~dbaron/tests/nglayout/doctypes.html
> 3. I've corrected the bug where we don't find '>'
> 4. That code has been eliminated
Cool.
> 5. pages without DOCTYPE absolutely CANNOT be dealt with as strict.
> That would break the *vast* majority of pages on the web today.
Agreed. However, I was referring to:
 1. Any "DOCTYPE HTML SYSTEM" as opposed to "DOCTYPE HTML PUBLIC"
 2. A DOCTYPE declaration without a DTD, i.e., <!DOCTYPE HTML>.
 3. A DOCTYPE declaration with an internal subset
Personally I do not see the point of using Standard mode with those as opposed to quirk mode. Number 1 may well occur on legacy documents and is not recommended by the HTML specs. Number 2 is more likely to mean HTML2 than any other version. And Number 3 is probably too complicated since one can just use a strict FPI to get that effect, and that is simpler. David?
6. The problem reported by VYV03354 is still an issue of course.
(See above) BTW, if anyone is wondering which function we are talking about, see: http://lxr.mozilla.org/seamonkey/ident?i=DetermineParseMode
Policy aside, which at some point becomes a judgement call, I've addressed all the remaining issues that Ian has raised (in my last checkin). If further issues arise, let's start anew.
removing "fixed" and adding "crash" to keywords
dang it, wrong bug. Returning to verified/fixed.
(Noted for cross-reference) Bug 31933 has some details on why this configuration makes it impossible to create Transitional documents that work according to W3C specs.