Closed Bug 1312 Opened 21 years ago Closed 20 years ago

"Standard" compatibility mode needs to be hooked to DOCTYPE

Categories

(Core :: HTML: Parser, defect, P2)

x86
Windows 95
defect

Tracking

VERIFIED FIXED

People

(Reporter: angus, Assigned: rickg)

Details

(Keywords: css1, Whiteboard: (py8ieh:snarf test cases for eviltests)[PDT-] (Compatability mode detection))

Attachments

(3 files)

A DOCTYPE of HTML 4/5/etc. should make compatibility mode work. Currently the
only way to enable compatibility mode is in debug builds through a menu item.

Also, for cases where changing the DOCTYPE isn't appropriate, we should support
a "META" tag for accomplishing the same thing.
Status: NEW → ASSIGNED
*** Bug 1562 has been marked as a duplicate of this bug. ***
Setting all current Open/Normal to M4.
per leger, assigning QA contacts to all open bugs without QA contacts according
to list at http://bugzilla.mozilla.org/describecomponents.cgi?product=Browser
QA Contact: 3847 → 4141
reassigning qacontact to gem (HTML Parser)
*** Bug 2072 has been marked as a duplicate of this bug. ***
Note. This is currently making QA of standards compliance issues difficult
(eg, verification of bug 2749 is awaiting doctype-controlled compat mode).

Also, a DOCTYPE of HTML 4/5/etc. should make _standard_ mode work. It is an
HTML _3_ DOCTYPE that should enable compatibility mode.
Target Milestone: M8 → M10
Blocks: 2749
This is currently marked M10, but knowing how bugs slip milestones, I'll say
that I don't think a beta should go out without this, since if it's introduced
later, you might get more people complaining you've broken their HTML 4 pages by
not having quirks mode, while it worked in previous versions.  This should
really be P1 I think, since the longer you delay the harder it will be to
introduce.
This should include the new HTML 4.01 DOCTYPE (assuming the spec gets past
the proposed recommendation stage):

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"
            "http://www.w3.org/TR/html40/strict.dtd">

See http://www.w3.org/MarkUp/#news for details.
Depends on: 8780
No longer depends on: 8780
Blocks: 8780
The following FPIs should definitely trigger strict mode:

In http://woodworm.cs.uml.edu/~rprice/15445/15445.html
"ISO/IEC 15445:1999//DTD HyperText Markup Language//EN"
"ISO/IEC 15445:1999//DTD HTML//EN"

In http://www.w3.org/TR/xhtml1/
"-//W3C//DTD XHTML 1.0 Strict//EN"
"-//W3C//DTD XHTML 1.0 Transitional//EN"
"-//W3C//DTD XHTML 1.0 Frameset//EN"

In http://www.w3.org/TR/html40
"-//W3C//DTD HTML 4.01//EN"

And probably in http://www.w3.org/TR/REC-html40/
"-//W3C//DTD HTML 4.0//EN"

How will this be future compatible?  Should you, instead, compile a list of
known other doctypes and recognize any doctype not on that list (or none at all)
as quirks, so that all new doctypes will be standard mode?  If you want to do
this, I could help compile the list.

I think you should decide on this soon and announce it very publicly.
David, your last-but-one paragraph makes no sense. :-)
It makes perfect sense to me.  :-)   For others, it might help if you change
"known other" to "other known" and remove the "not" in the second line.

What I was saying is that since there are all sorts of quirky doctypes out
there, but anything that's new should probably be standard mode, we might want
to do the recognition based on:
 * quirks if no doctype or an old doctype
 * standard if new/unknown doctype
but this would require a really good list of "old" doctypes since there are some
weird ones floating around.
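
A minimal sketch of the catalog-based rule described above, assuming the public identifier has already been extracted from the DOCTYPE. The names ChooseMode and OldDoctypes are hypothetical, and the two catalog entries are only illustrative; a real list would have to be gathered from the web.

```cpp
#include <set>
#include <string>

enum class Mode { Quirks, Standard };

// Hypothetical catalog of "old" public identifiers. The two entries are
// illustrative only; a usable catalog would be much longer.
const std::set<std::string>& OldDoctypes() {
  static const std::set<std::string> kOld = {
      "-//IETF//DTD HTML 2.0//EN",
      "-//W3C//DTD HTML 3.2 Final//EN",
  };
  return kOld;
}

// No doctype, or a cataloged old doctype, gives quirks mode. Anything else,
// including doctypes that do not exist yet, gives standard mode.
Mode ChooseMode(bool hasDoctype, const std::string& publicId) {
  if (!hasDoctype) return Mode::Quirks;
  return OldDoctypes().count(publicId) ? Mode::Quirks : Mode::Standard;
}
```

The point of this shape is that the list only ever needs to describe the past: a new doctype gets standard mode without any code change.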
That's what I thought you meant. I agree (as usual...).

Rick: Which method do you wish to use? If you wish to use the "all known old,
invalid and missing doctypes => quirks, everything else => standard" idea,
which I would recommend, I suggest you say so relatively quickly so that we can
start fielding doctypes from the web.
Assignee: rickg → harishd
Status: ASSIGNED → NEW
Assigning bug to myself.
Status: NEW → ASSIGNED
Priority: P2 → P1
Target Milestone: M10 → M11
Hooked up parser mode to document DTD mode.

Here is a gist:
 FPIs mapped to STRICT mode are:

  "-//W3C//DTD HTML 4.0//EN"
  "-//W3C//DTD HTML 4.01//EN"
  "-//W3C//DTD HTML 4.0x//EN" (x=>Any number) - Any comments ?
  "-//W3C//DTD HTML 4.0 UNKNOWN//EN" (UNKNOWN could be NOQUIRKS,...etc.,)

Other known FPIs are mapped to QUIRKS mode.
To what are *unknown* FPIs mapped?
If HTML 4.0 is not found in the DOCTYPE string then an unknown FPI would be
mapped to quirks.

i.e., "-//W3C//DTD STANDARD//EN" -> This would be mapped to quirks!!
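
As a rough illustration of the rule as stated in this comment (not the actual parser code), the check reduces to a substring test on the FPI. ModeFromFpi is a hypothetical name, and std::string::find stands in for the parser's own string class.

```cpp
#include <string>

enum class ParseMode { Quirks, Strict };

// Sketch of the rule as described: an FPI is strict only if it mentions
// "HTML 4.0" (which also covers 4.01 and 4.0x); every other FPI, known or
// unknown, falls back to quirks.
ParseMode ModeFromFpi(const std::string& fpi) {
  return fpi.find("HTML 4.0") != std::string::npos ? ParseMode::Strict
                                                   : ParseMode::Quirks;
}
```

Under this rule "-//W3C//DTD STANDARD//EN" lands in quirks, which is the forward-compatibility concern raised earlier in the bug.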
You should map the ISO doctypes to strict too, as I mentioned above.
Status: ASSIGNED → RESOLVED
Closed: 20 years ago
Resolution: --- → FIXED
ISO doctypes are hooked up too :)

The following FPIs are also mapped to strict DTD:

"ISO/IEC 15445:1999//DTD HyperText Markup Language//EN"
"ISO/IEC 15445:1999//DTD HTML//EN"

Marking bug FIXED.
Does the presence of any of:
 * an XML declaration "<?xml version="1.0"?>"
 * an XHTML DTD
trigger strict mode?  They should, because XHTML can be sent as text/html.
I've written a quick script to test this:
   http://www.bath.ac.uk/%7Epy8ieh/cgi/compat-test.pl
It doesn't do XML or alternative mime types yet.

If anyone can think of any things that are affected by compat vs std mode, other
than CSS table inheritance, then please send them to me and I'll add them to the
script's output.
Something else that triggers NavQuirks is the string "Transitional" in the FPI,
if it's an HTML4 FPI. So the following:
   "-//W3C//DTD HTML 4.0 UNKNOWN Transitional UNKNOWN//EN"
...triggers Quirks Mode. However, on non-transitional FPIs, the string NOQUIRKS
always triggers standard mode, so the following will be in Standard mode:
   ";sal hasl;dgh sadFG NOQUIRKS sdg;jaadhf ljkerhyt "
Seems ok to me. David?
Status: RESOLVED → REOPENED
 * an XML declaration "<?xml version="1.0"?>"
 * an XHTML DTD
do not trigger strict mode yet.
I'm using the [1999092808] build on a Windows NT 4.0 (Service Pack 5) system.
reopening.
Resolution: FIXED → ---
Clearing FIXED resolution due to reopen of this bug.
All XML pages should ALWAYS be in standard mode. There are no quirks to mimic
there.
I believe the issue here is pages sent as text/html (technically HTML, not XML)
that contain an XML declaration or an XHTML doctype.  I agree that these should
be in standard mode.  That is, the XHTML doctypes should be recognized for
standard mode and any HTML page that begins with an XML declaration in
accordance with the XML spec (must it be on the first line, or something, or can
comments be before it????) should also be in standard mode.
Should HTML 4.01 Transitional or HTML 4.0x Transitional trigger standard mode?
I think they should, because these are "new" doctypes.
Actually, yeah, TRANSITIONAL should always trigger STRICT mode. A good reason
for this is that the CSS1 Test Suite uses the Transitional DTD... :-)
Status: REOPENED → RESOLVED
Closed: 20 years ago
Resolution: --- → FIXED
Hooked up DOCTYPES ( all of 'em..I hope :) )

Marking FIXED.
HTML 4.0x Frameset still triggers quirks mode.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
...As do any "transitional" DTDs.
IMHO all HTML4.x DTDs should trigger standard mode, including Transitional
and Frameset.

Reopening.
I disagree, perhaps, since some authoring tools may be generating files with
these DTDs.
I think this is the best:
HTML 4.0 Transitional and Frameset: trigger quirks mode.
HTML 4.0 Strict: triggers standard mode.
HTML 4.01 and 4.0x: always trigger standard mode.
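
A sketch of how this three-line proposal could be expressed, assuming the FPI is already available as a string. ModeForHtml4Fpi is a hypothetical name, and the fallback for FPIs that do not mention HTML 4.0 is an assumption, since the proposal does not cover them.

```cpp
#include <cctype>
#include <string>

enum class LayoutMode { Quirks, Standard };

LayoutMode ModeForHtml4Fpi(const std::string& fpi) {
  const std::string marker = "HTML 4.0";
  const std::string::size_type pos = fpi.find(marker);
  // Assumption: FPIs outside the proposal fall back to quirks.
  if (pos == std::string::npos) return LayoutMode::Quirks;

  const std::string::size_type next = pos + marker.size();
  // "HTML 4.01" or any "HTML 4.0x": a digit follows the marker.
  if (next < fpi.size() &&
      std::isdigit(static_cast<unsigned char>(fpi[next]))) {
    return LayoutMode::Standard;
  }

  // Plain HTML 4.0: Transitional and Frameset are quirks, Strict (which
  // carries no qualifier in its FPI) is standard.
  const bool loose = fpi.find("Transitional", next) != std::string::npos ||
                     fpi.find("Frameset", next) != std::string::npos;
  return loose ? LayoutMode::Quirks : LayoutMode::Standard;
}
```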
Status: REOPENED → RESOLVED
Closed: 20 years ago
Resolution: --- → FIXED
Marking bug FIXED.
Status: RESOLVED → REOPENED
Mode detection still does not work correctly in some cases.
see the http://homepage1.nifty.com/emk/moz/dtd.html
Reopening.
Resolution: FIXED → ---
Clearing FIXED resolution due to reopen.
Target Milestone: M11 → M12
Moving to M12 since M11 is over and this has been reopened.
QA Contact: gem → janc
This may not be the best place for this, but ... On the topic of "other" DTDs (FPI?)
I noticed this one: <!DOCTYPE HTML PUBLIC "-//SoftQuad Software//DTD HoTMetaL
PRO 6.0::19990601::extensions to HTML 4.0//EN" "hmpro6.dtd"> in a file.

I don't know if this is HTML 4.0 STRICT, but I would guess it's pretty tight,
since Softquad began as an SGML company.
Target Milestone: M13 → M14
Moving to m14.
Status: REOPENED → ASSIGNED
Priority: P1 → P5
Whiteboard: (py8ieh:snarf test case for eviltests)
If the idea is to phase out Quirks, we're coming at this backwards. Instead of defaulting to Quirks
and establishing conditions for Strict to kick in, we should default to Strict and establish
conditions for Quirks. The problem with the former approach (defaulting to Quirks) is that the list
of conditions for Strict will always be too short and prone to obsolescence as new document
types come into use.

So the default should be Strict, with the following conditions triggering Quirks:
* the document has no doctype (most HTML).
* the document has a doctype in wide use at the time of beta 1. A catalog needs to be made. I
expect that it will have no more than 20 doctypes (2.0, 3.2, IETF flavors, etc).
* HTML 4.0 Strict should not be in the Quirks catalog, even if it is in wide use.
* HTML 4.0 Transitional should trigger Quirks if the declaration contains no URL, and Strict if it
does. This tends to characterize the difference between bogus and trustworthy usage, but is
obviously not watertight. Composer omits the URL. The CSS1 Test Suite uses it. This is a
compromise, of
Todd: I agree. (BTW, your comment was cut short again.)
Harish: What do you think?
I strongly agree.  Note that I also proposed the idea on 09/01/99 11:48
(above).  I think I feel more strongly about it now.
I don't know why my posts get clipped. It's always been only the last word, btw.



I didn't read carefully enough. Yes, David proposed this first.



I will begin gathering doctypes for the quirks catalog.



This is a bogus sentence; perhaps it will be clipped; perhaps not.
I started a list of doctypes at:

http://www.people.fas.harvard.edu/~dbaron/tests/nglayout/doctypes.html

Any comments?  (I put the potentially controversial items in each list first.) 
I got the list of DTDs from the catalogs of the two known validators.
Looks good.

I would suggest adding the (technically invalid?) FPI "CNavDTD" to the list of 
FPIs that should enable quirks mode. This would be an explicit "enable the 
Compatibility Navigator-DTD parsing mode" FPI.

I also suggest that the line which reads:
   o Any "DOCTYPE HTML SYSTEM" as opposed to "DOCTYPE HTML PUBLIC" 
...should also include:
   o Any 'DOCTYPE HTML PUBLIC "..." SYSTEM "..."'.
i.e., when _both_ an FPI and a URI are given, we should trigger standard mode,
regardless of the FPI.

David, you can probably express that better than me. ;-)

BTW, David, once that document becomes more stable, e-mail it to me and I'll
make each line link to the relevant test case generated by my script.
Whiteboard: (py8ieh:snarf test case for eviltests) → (py8ieh:snarf test cases for eviltests)
Ian - I don't think what you said about SYSTEM makes sense.  At least, judging
from XML, the syntax for the external subset of the DTD comes in two forms:

PUBLIC PubidLiteral SystemLiteral
or
SYSTEM SystemLiteral

So what I'm saying is that, when it takes the first form, we should ignore the
SystemLiteral (since there are lots of variants, like REC-html40 vs
REC-html40-9712?? vs REC-html40-9804?? vs html4 vs html40 vs html401 vs
1999/REC-html401-9912?? (not to mention the WDs and PRs) in the filename) and
base quirks mode on the PubidLiteral.  When it takes the second form, we should
assume strict mode.  (Should we also assume strict mode if there exists an
internal subset?)

See:
http://www.w3.org/TR/REC-xml#NT-doctypedecl
http://www.w3.org/TR/REC-xml#NT-ExternalID

However, I wish I had the SGML syntax handy...
SGML allows these syntaxes (from memory, 'my' copy of Goldfarb is in my room):

   1: PUBLIC PubidLiteral 
   2: PUBLIC PubidLiteral SystemLiteral
   3: SYSTEM SystemLiteral

With case 1, we should use the PubidLiteral to decide Quirks mode.
With case 2, we should IMHO _always_ use standard mode.
With case 3, we should always use standard mode.

The reasoning behind case 2 (which my last comment was trying to make, although
I unfortunately got the syntax wrong) is that the CSS1 test suite uses this 
syntax (IIRC).

In XML mode, we should _always_ assume standard mode. In HTML mode, I don't
believe internal subsets would have any effect, so we should probably ignore
them and not worry about them affecting standard/quirk mode selection.
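
To make the three syntaxes concrete, here is a small sketch that classifies a DOCTYPE declaration by the form of its external identifier. It is not a real SGML parser: ClassifyDoctype and QuotedLiterals are hypothetical helpers, and the keyword is simply looked for before the first quote. Which mode each form should map to is the policy question under discussion, so the sketch stops at classification.

```cpp
#include <cctype>
#include <string>
#include <vector>

enum class ExternalIdForm { None, PublicOnly, PublicAndSystem, SystemOnly };

// Collects the quoted literals (single or double quoted) found in `decl`
// starting at offset `from`.
static std::vector<std::string> QuotedLiterals(const std::string& decl,
                                               std::string::size_type from) {
  std::vector<std::string> out;
  std::string::size_type i = from;
  while (i < decl.size()) {
    const char c = decl[i];
    if (c == '"' || c == '\'') {
      const std::string::size_type close = decl.find(c, i + 1);
      if (close == std::string::npos) break;  // unterminated literal
      out.push_back(decl.substr(i + 1, close - i - 1));
      i = close + 1;
    } else {
      ++i;
    }
  }
  return out;
}

ExternalIdForm ClassifyDoctype(const std::string& decl) {
  const std::string::size_type firstQuote = decl.find_first_of("\"'");
  if (firstQuote == std::string::npos) return ExternalIdForm::None;

  // The PUBLIC/SYSTEM keyword must appear before the first literal.
  std::string head = decl.substr(0, firstQuote);
  for (char& c : head) {
    c = static_cast<char>(std::toupper(static_cast<unsigned char>(c)));
  }

  const std::vector<std::string> literals = QuotedLiterals(decl, firstQuote);
  if (head.find("PUBLIC") != std::string::npos) {
    if (literals.size() >= 2) return ExternalIdForm::PublicAndSystem;  // case 2
    if (literals.size() == 1) return ExternalIdForm::PublicOnly;       // case 1
    return ExternalIdForm::None;
  }
  if (head.find("SYSTEM") != std::string::npos && !literals.empty()) {
    return ExternalIdForm::SystemOnly;  // case 3
  }
  return ExternalIdForm::None;
}
```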
(2) can't trigger strict mode in general - I think it's nearly as common as (1),
since it's the syntax recommended by the HTML specs.  That's why the CSSTS uses
it.

Internal subsets are certainly an SGML feature, since XML is a subset of SGML. 
However, they're very rarely used, and they could be a harmless way of making an
old doctype cause strict mode.  (Although I guess there should really be a META
NAME="mozilla-mode" ... or something...)
If we don't use standard mode for (2), then we will show bugs in the CSS1TS 
even though they are there only for compatibility. The CSS1TS uses doctypes
in the form:

   <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN"
   "http://www.w3.org/TR/REC-html40/loose.dtd">

I am not convinced that that DOCTYPE is all that common. Are you sure there are
many legacy pages that use that DOCTYPE? Could you list some high profile ones?
I think there are a good number of pages that use it.  I'm not sure this is
entirely about high profile pages.  (Could you list some high profile pages that
use DOCTYPEs at all?)  I searched around, and found the following pages with
DOCTYPEs.  They are marked (1) for no SystemLiteral, and (2) for SystemLiteral,
and (3) for DOCTYPE HTML SYSTEM:

(2) http://www.useit.com/
(2) http://www.w3.org/
(3) http://www.w3.org/Style/
(2) http://www.w3.org/Style/CSS/
(1) http://www.w3.org/MathML/
(1) http://www.w3.org/MarkUp/
(2) http://www.w3.org/TR/
(1) http://www.emacs.org/
(2) http://www.verso.com/ (which uses a custom DTD with a PublicID, which should
perhaps be added to my list...)
(2) http://style.verso.com/
(1) http://msdn.microsoft.com/default.asp
(1) http://www.microsoft.com/unix/ie/default.asp
(2) http://www.opera.com
(1) http://www.kernelnotes.org/
(1) http://www.kernel.org
(1) http://www.linux.org
(1) http://sunsite.unc.edu/LDP/docs.html

There are slightly more (1) than (2), but there are still quite a few (2).  I
don't think we should worry about the CSS1 test suite.  The test suite may well
change in response to Mozilla and MacIE5.

Adding css1 keyword because this bug affects conformance to css1.
Keywords: css1
Priority: P5 → P2
Nominating for beta1 because I think the first beta should have something that
we think will be the eventual solution, so we can get feedback on it.  The
eventual solution will probably need a lot of fine-tuning, so it would be good
to get feedback from the beta.  This is a *very* important issue on which to get
feedback, because it determines many parts of the behavior of the entire layout
engine.
Keywords: beta1
Note: I've loosened up the rules a bit. As of (my next checkin) ANY doctype that 
reads "HTML 4.xx" AND "transitional" will now render in quirks mode. I can't see 
any other satisfactory answer.
Whiteboard: (py8ieh:snarf test cases for eviltests) → (py8ieh:snarf test cases for eviltests)[PDT-]
Assigning to rickg since he has a fix in hand :)
Assignee: harishd → rickg
Status: ASSIGNED → NEW
Here's the final call. For a navigator product, backward compatibility with an 
eye toward the future is essential -- and so the goal is NOT to phase out quirks 
mode inasmuch as it means "be compatible with extant content on the web". So by 
default, we're going to behave in quirks mode unless the DTD instructs us to do 
otherwise.

The latest update causes HTML 4.xx transitional to be quirks. The other 
quirks triggers are listed in this bug. Strict mode will be enabled for XML 
documents (obviously), XHTML and DTDs with the STRICT keyword. 
Status: NEW → RESOLVED
Closed: 20 years ago
Resolution: --- → FIXED
I must reopen this bug because of a simple mistake rather than a compatibility 
problem.
ISO doctypes are not hooked up correctly.
We need a space character between "ISO/IEC" and "15445:1999" since 
StripWhiteSpace() is no longer called.
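
One way to make the match independent of whether whitespace was stripped beforehand is to strip it locally before searching. This is only a sketch of that idea, not the fix that was checked in; WithoutSpaces and MentionsIsoHtml are hypothetical names.

```cpp
#include <cctype>
#include <string>

// Returns a copy of `s` with all whitespace removed.
static std::string WithoutSpaces(const std::string& s) {
  std::string out;
  for (char c : s) {
    if (!std::isspace(static_cast<unsigned char>(c))) out += c;
  }
  return out;
}

// Looks for the ISO identifier regardless of how whitespace was treated
// upstream: both "ISO/IEC 15445:1999" and "ISO/IEC15445:1999" match.
bool MentionsIsoHtml(const std::string& doctype) {
  return WithoutSpaces(doctype).find("ISO/IEC15445:1999") != std::string::npos;
}
```

Searching the unstripped text for the space-free pattern is what fails: "ISO/IEC 15445:1999" does not contain "ISO/IEC15445:1999".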
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
The current heuristics are as follows:

1. Use QUIRKS mode for any document matching:
   1. ... <!DOCTYPE ... -//W3C//DTD ... HTML 4 ... TRANSITIONAL ...
   2. ... <!DOCTYPE ... -//W3C//DTD ... HTML 4 ... FRAMESET ...
   3. ... <!DOCTYPE ... -//W3C//DTD ... HTML 4 ... LATIN1 ...
   4. ... <!DOCTYPE ... -//W3C//DTD ... HTML 4 ... SYMBOLS ...
   5. ... <!DOCTYPE ... -//W3C//DTD ... HTML 4 ... SPECIAL ...

2. Failing those, use STRICT mode for documents matching:
   1. ... <!DOCTYPE ... -//W3C//DTD ... HTML 4 ... 
   2. ... <!DOCTYPE ... -//W3C//DTD ... XHTML ... TRANSITIONAL ...
   3. ... <!DOCTYPE ... -//W3C//DTD ... XHTML ... STRICT ...
   4. ... <!DOCTYPE ... -//W3C//DTD ... XHTML ... FRAMESET ...
   5. ... <!DOCTYPE ... ISO/IEC 15445:1999 ...
   6. ... ?XML ...
   7. ... NOQUIRKS ...

3. Failing those, use OTHER mode if the "PARSE_MODE" environment
   variable matches "other"

4. Failing that, use QUIRKS mode.
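
The four steps above can be sketched as ordered, case-insensitive substring matching. This is an approximation written against std::string rather than the parser's own string class; DetectMode and InOrder are hypothetical names, and the PARSE_MODE value is passed in as a parameter (a caller would pass std::getenv("PARSE_MODE")) so the sketch does not depend on the environment.

```cpp
#include <cctype>
#include <cstring>
#include <initializer_list>
#include <string>

enum class DetectedMode { Quirks, Strict, Other };

// True if every token occurs in `text`, each one after the previous match.
static bool InOrder(const std::string& text,
                    std::initializer_list<const char*> tokens) {
  std::string::size_type pos = 0;
  for (const char* token : tokens) {
    pos = text.find(token, pos);
    if (pos == std::string::npos) return false;
    pos += std::strlen(token);
  }
  return true;
}

DetectedMode DetectMode(std::string doc, const char* parseModeEnv) {
  // Compare case-insensitively by upper-casing the input once.
  for (char& c : doc) {
    c = static_cast<char>(std::toupper(static_cast<unsigned char>(c)));
  }

  // Step 1: HTML 4 doctypes carrying one of these qualifiers are quirks.
  for (const char* q :
       {"TRANSITIONAL", "FRAMESET", "LATIN1", "SYMBOLS", "SPECIAL"}) {
    if (InOrder(doc, {"<!DOCTYPE", "-//W3C//DTD", "HTML 4", q})) {
      return DetectedMode::Quirks;
    }
  }

  // Step 2: remaining HTML 4, the listed XHTML flavors, ISO HTML, an XML
  // declaration, or the NOQUIRKS keyword are strict.
  if (InOrder(doc, {"<!DOCTYPE", "-//W3C//DTD", "HTML 4"})) {
    return DetectedMode::Strict;
  }
  for (const char* s : {"TRANSITIONAL", "STRICT", "FRAMESET"}) {
    if (InOrder(doc, {"<!DOCTYPE", "-//W3C//DTD", "XHTML", s})) {
      return DetectedMode::Strict;
    }
  }
  if (InOrder(doc, {"<!DOCTYPE", "ISO/IEC 15445:1999"})) {
    return DetectedMode::Strict;
  }
  if (doc.find("?XML") != std::string::npos ||
      doc.find("NOQUIRKS") != std::string::npos) {
    return DetectedMode::Strict;
  }

  // Step 3: the environment override. Step 4: default to quirks.
  if (parseModeEnv != nullptr && std::strcmp(parseModeEnv, "other") == 0) {
    return DetectedMode::Other;
  }
  return DetectedMode::Quirks;
}
```

Written this way, the order of the checks carries the policy: the quirks exceptions for HTML 4 have to run before the general HTML 4 rule, or they would never be reached.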


I have a few concerns about the code at the moment...

1. Why are not _all_ XHTML doctypes accepted as STRICT?

2. What about HTML 5, should such a thing ever come out?

3. PRInt32 theEnd=theBuffer.FindChar(kGreaterThan,theIndex+1);
   What happens if it doesn't find the ">"?

4. Are the following lines not completely redundant?:
   theSubIndex=theBuffer.Find("HTML",PR_TRUE,theSubIndex+18);
   if(kNotFound==theSubIndex)
     theSubIndex=theBuffer.Find("HYPERTEXTMARKUPLANGUAGE",PR_TRUE,theSubIndex+18);

5. David thinks the following should trigger STRICT mode:
   1. Any "DOCTYPE HTML SYSTEM" as opposed to "DOCTYPE HTML PUBLIC" 
   2. A DOCTYPE declaration without a DTD, i.e., <!DOCTYPE HTML>. 
   3. A DOCTYPE declaration with an internal subset 
   At the moment they are all Quirks mode. I don't mind either way,
   but what do you think, David?
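
On concern 3, a sketch of one possible way to handle a DOCTYPE that is never closed by ">", written against std::string::find rather than the parser's FindChar. Treating an unterminated declaration as absent is an assumption made here for illustration; ExtractDoctype is a hypothetical helper and only matches an upper-case "<!DOCTYPE".

```cpp
#include <string>

// Returns the text of the first "<!DOCTYPE ...>" declaration, or an empty
// string if there is no declaration or it is never closed by '>'.
std::string ExtractDoctype(const std::string& buffer) {
  const std::string::size_type start = buffer.find("<!DOCTYPE");
  if (start == std::string::npos) return std::string();

  const std::string::size_type end = buffer.find('>', start + 1);
  // Assumption: an unterminated declaration is treated as if it were absent,
  // instead of using the "not found" value as an offset.
  if (end == std::string::npos) return std::string();

  return buffer.substr(start, end - start + 1);
}
```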
ian: here are the answers to your questions:

0. your tests proved that 4.0 without any specifier was erroneously STRICT
1. All XHTML should be STRICT.
2. We'll have to rev more than the mode detection code if HTML5 is ever released
3. I've correct the bug where we don't find '>'
4. That code has been eliminated
5. pages without DOCTYPE absolutely CANNOT be dealt with as strict. That would 
break the *vast* majority of pages on the web today. 
Rick - some responses:

 0) What do you mean?   "DTD HTML 4.0" is part of the FPI for HTML 4.0 strict. 
The word "Strict" does not appear in the FPI.

 5) Nobody was proposing that.
Rick:
> 0. your tests proved that 4.0 without any specifier was erroneously STRICT
Like David said, we need this for the Strict DTD.

> 1. All XHTML should be STRICT.
Agreed. This means that the following:

 if((theBuffer.Find("TRANSITIONAL",PR_TRUE,theSubIndex)>kNotFound)||
    (theBuffer.Find("STRICT",PR_TRUE,theSubIndex)   >kNotFound)   ||
    (theBuffer.Find("FRAMESET",PR_TRUE,theSubIndex) >kNotFound))
   result=eParseMode_noquirks;
 else
   result=eParseMode_quirks;

...can be changed to simply:

 result=eParseMode_noquirks;

> 2. We'll have to rev more than the mode detection code if HTML5 is ever
>    released
Not necessarily, because HTML 5 should be backwards compatible. The point is
at the moment we actually _break_ if someone uses the hypothetical HTML5 DTD,
so we are not forwards-compatible _at_all_.

Having said that, I have no idea how we could check for HTML 5, 6, 7, 8... DTDs
in a way which would not also catch some of the quirky DTDs listed on David's 
page: http://www.people.fas.harvard.edu/~dbaron/tests/nglayout/doctypes.html

> 3. I've corrected the bug where we don't find '>'
> 4. That code has been eliminated
Cool.

> 5. pages without DOCTYPE absolutely CANNOT be dealt with as strict.
>    That would break the *vast* majority of pages on the web today. 
Agreed.

However, I was referring to:
   1. Any "DOCTYPE HTML SYSTEM" as opposed to "DOCTYPE HTML PUBLIC" 
   2. A DOCTYPE declaration without a DTD, i.e., <!DOCTYPE HTML>. 
   3. A DOCTYPE declaration with an internal subset 
Personally I do not see the point of using Standard mode with those as opposed
to quirk mode. Number 1 may well occur on legacy documents and is not 
recommended by the HTML specs. Number 2 is more likely to mean HTML2 than any
other version. And Number 3 is probably too complicated since one can just use
a strict FPI to get that effect, and that is simpler. David?

6. The problem reported by VYV03354 is still an issue of course. (See above)

BTW, if anyone is wondering which function we are talking about, see:
   http://lxr.mozilla.org/seamonkey/ident?i=DetermineParseMode
Blocks: html4.01
Policy aside, which at some point becomes a judgement call, I've addressed all 
the remaining issues that Ian has raised (in my last checkin).

If further issues arise, let's start anew.
Status: REOPENED → RESOLVED
Closed: 20 years ago
Resolution: --- → FIXED
Keywords: verifyme
removing "fixed" and adding "crash" to keywords
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
dang it, wrong bug.  Returning to verified/fixed.
Status: REOPENED → RESOLVED
Closed: 20 years ago
Resolution: --- → FIXED
.
Status: RESOLVED → VERIFIED
(Noted for cross-reference) Bug 31933 has some details on why this 
configuration makes it impossible to create Transitional documents that work 
according to W3C specs.
Whiteboard: (py8ieh:snarf test cases for eviltests)[PDT-] → (py8ieh:snarf test cases for eviltests)[PDT-] (Compatability mode detection)
Blocks: 34662
Keywords: verifyme