Closed
Bug 34476
Opened 24 years ago
Closed 24 years ago
system identifier in doctype should trigger standard mode
Categories
(Core :: DOM: HTML Parser, defect, P3)
RESOLVED
INVALID
M17
People
(Reporter: dbaron, Assigned: rickg)
DESCRIPTION: A system identifier in a DOCTYPE should trigger strict mode.

MacIE5 has modes like mozilla, and it uses the presence of a SystemID to cause strict mode (at least in the form <!DOCTYPE RootElem PUBLIC PublicID SystemID>). Since MacIE5 is already released, most pages where this causes any problems should be changed in the near future. See http://www.deja.com/[ST_rn=ps]/getdoc.xp?AN=604424748&fmt=text for a quick description of MacIE's algorithm.

I think any doctype that has a systemID in the form:

    <!DOCTYPE HTML PUBLIC PublicID SystemID>
    <!DOCTYPE HTML SYSTEM SystemID>

or has an internal subset:

    <!DOCTYPE HTML (PUBLIC PublicID SystemID? | SYSTEM SystemID) [ Internal-SS ]>

should trigger strict mode (the latter two cases are very rare, and the first is the one I'm sure MacIE does). This would leave only DOCTYPEs of the form

    <!DOCTYPE HTML PUBLIC PublicID>

to be treated with the current algorithm. (These are the vast majority of DOCTYPEs on the web.)

For information on the syntax of DOCTYPEs in XML (which is a subset of SGML, of which HTML is an application), see:

    http://www.w3.org/TR/REC-xml#dt-doctype
    http://www.w3.org/TR/REC-xml#NT-ExternalID

Note that SGML does not require the SystemID for PUBLIC doctypes.

I think Ian may be able to provide steps to reproduce more easily than I can...
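To make the doctype forms in the description concrete, here is a minimal C++ sketch of a parser that extracts the public identifier, the system identifier, and the presence of an internal subset. It is not the Mozilla code: `DoctypeInfo`, `parseDoctype`, and `triggersStrictByProposal` are hypothetical names, and the grammar is simplified (for example, it requires a literal after SYSTEM and does not handle comments or nested brackets inside the subset).

```cpp
#include <cctype>
#include <cstddef>
#include <optional>
#include <string>

// Hypothetical result type; these names are not from the Mozilla source.
struct DoctypeInfo {
    std::string name;
    std::optional<std::string> publicId;
    std::optional<std::string> systemId;
    bool hasInternalSubset = false;
};

namespace {
void skipSpace(const std::string& s, std::size_t& i) {
    while (i < s.size() && std::isspace(static_cast<unsigned char>(s[i]))) ++i;
}

// Reads a bare word such as "HTML" or "PUBLIC"; may return an empty string.
std::string readWord(const std::string& s, std::size_t& i) {
    std::size_t start = i;
    while (i < s.size() && !std::isspace(static_cast<unsigned char>(s[i])) &&
           s[i] != '>' && s[i] != '[' && s[i] != '"' && s[i] != '\'') ++i;
    return s.substr(start, i - start);
}

// Reads a quoted literal if one starts at position i; otherwise leaves i alone.
std::optional<std::string> readLiteral(const std::string& s, std::size_t& i) {
    if (i >= s.size() || (s[i] != '"' && s[i] != '\'')) return std::nullopt;
    std::size_t end = s.find(s[i], i + 1);
    if (end == std::string::npos) return std::nullopt;
    std::string value = s.substr(i + 1, end - i - 1);
    i = end + 1;
    return value;
}

std::string upper(std::string s) {
    for (char& c : s) c = static_cast<char>(std::toupper(static_cast<unsigned char>(c)));
    return s;
}
}  // namespace

// Returns std::nullopt when the text does not fit the simplified grammar.
std::optional<DoctypeInfo> parseDoctype(const std::string& text) {
    const std::string prefix = "<!DOCTYPE";
    if (upper(text.substr(0, prefix.size())) != prefix) return std::nullopt;
    std::size_t i = prefix.size();
    DoctypeInfo info;
    skipSpace(text, i);
    info.name = readWord(text, i);
    if (info.name.empty()) return std::nullopt;
    skipSpace(text, i);
    const std::string keyword = upper(readWord(text, i));
    skipSpace(text, i);
    if (keyword == "PUBLIC") {
        info.publicId = readLiteral(text, i);
        if (!info.publicId) return std::nullopt;
        skipSpace(text, i);
        info.systemId = readLiteral(text, i);  // optional for PUBLIC in SGML
    } else if (keyword == "SYSTEM") {
        info.systemId = readLiteral(text, i);
        if (!info.systemId) return std::nullopt;
    } else if (!keyword.empty()) {
        return std::nullopt;
    }
    skipSpace(text, i);
    if (i < text.size() && text[i] == '[') {  // internal subset
        std::size_t close = text.find(']', i);
        if (close == std::string::npos) return std::nullopt;
        info.hasInternalSubset = true;
        i = close + 1;
        skipSpace(text, i);
    }
    if (i >= text.size() || text[i] != '>') return std::nullopt;
    return info;
}

// The rule proposed in the description: a SystemID or an internal subset means strict.
bool triggersStrictByProposal(const DoctypeInfo& d) {
    return d.systemId.has_value() || d.hasInternalSubset;
}
```

Under this sketch, a PUBLIC doctype without a system identifier parses successfully but does not trigger strict mode by itself, so it would fall through to whatever PublicID-based logic already exists.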
David: please explain why systemID should be treated as strict.
Reporter
Comment 2•24 years ago
A SystemID should lead to strict mode because:
* it's an easy way for page authors to control strict mode without affecting validity
* it's what MacIE does, and therefore it shouldn't cause too many problems (since the pages where it causes problems will see them with MacIE first)
Landed fixes. Read code in nsParser.cpp to learn more.
Status: ASSIGNED → RESOLVED
Closed: 24 years ago
Resolution: --- → FIXED
Reporter
Comment 5•24 years ago
Reopening. Looking at code in CParserContext.cpp, you're looking for the strings "PublicID" and "SystemID", not the concepts. I might have a chance to write a patch for what I meant in C++ this weekend (hopefully), so we can avoid the difficulties of translating C++ to English and back to C++ again.

What I'd like to do is just:
1) check for a proper XML declaration (if so, strict)
2) parse the DOCTYPE based on the SGML spec (and be quirks if it doesn't fit the spec or if there isn't one at all)
3) check for the presence of a SystemID or an internal subset (strict if so)
4) then do the logic on the PublicID
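The four steps above can be sketched as a small decision function. This is only an illustration of the ordering, not the patch mentioned in the comment: `Prolog`, `determineMode`, and `modeFromPublicId` are hypothetical names, the input is assumed to be already parsed, and step 4 is a placeholder that recognizes a single strict public identifier.

```cpp
#include <optional>
#include <string>

enum class ParseMode { Quirks, Strict };

// Hypothetical, already-parsed view of a document's prolog.
struct Prolog {
    bool hasXmlDeclaration = false;   // a proper XML declaration was seen
    bool hasDoctype = false;          // a DOCTYPE was present at all
    bool doctypeWellFormed = false;   // the DOCTYPE fit the SGML syntax
    std::optional<std::string> publicId;
    std::optional<std::string> systemId;
    bool hasInternalSubset = false;
};

// Placeholder for step 4. The real PublicID logic is not described in the
// thread; this stand-in treats only the HTML 4.01 strict identifier as strict.
ParseMode modeFromPublicId(const std::optional<std::string>& publicId) {
    if (publicId && *publicId == "-//W3C//DTD HTML 4.01//EN") return ParseMode::Strict;
    return ParseMode::Quirks;
}

ParseMode determineMode(const Prolog& p) {
    // Step 1: a proper XML declaration means strict.
    if (p.hasXmlDeclaration) return ParseMode::Strict;
    // Step 2: a missing or malformed DOCTYPE means quirks.
    if (!p.hasDoctype || !p.doctypeWellFormed) return ParseMode::Quirks;
    // Step 3: a SystemID or an internal subset means strict.
    if (p.systemId || p.hasInternalSubset) return ParseMode::Strict;
    // Step 4: otherwise decide from the PublicID.
    return modeFromPublicId(p.publicId);
}
```

The point of the ordering is that the PublicID is consulted last, so a Transitional public identifier accompanied by a system identifier is already resolved to strict at step 3.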
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Comment 6•24 years ago
I think the current mode determination is too complicated, and I think three (or more) modes are also too complicated. They will only confuse web designers. (Help! Which mode are we in?) We had better follow the MacIE5 strategy: if a document has a system identifier, an XML PI, an XHTML DTD, or an ISO HTML DTD, trigger strict mode; otherwise, trigger quirks mode. That's simple and reasonable. (I will accept that HTML4 Strict without a system identifier triggers quirks mode.)
Modes in the parsing engine are independent of modes the users will see. To users, there will be 3 modes: STRICT, non-strict and quirks. Independent of that, there's html3, html4, xml and xhtml. That's the world we're in. David -- I'd like to talk to you offline about ID's. I accept that my algorithm is a hack regarding these -- but I haven't done enough research to determine the right thing to do. I suspect you know, and can help me get it right. I'm closing this bug -- and I'll open a new one regarding ID's.
Status: REOPENED → RESOLVED
Closed: 24 years ago → 24 years ago
Resolution: --- → FIXED
Comment 8•24 years ago
Reopening since I can't find new bug regarding ID's (If there is one, please tell me a bug id). System identifier check code still doesn't work at all.

    <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
    <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">

- both handled in quirks.

    <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN">
    <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">

- both handled in strict.

If this behaviour is your intent, the following code is redundant.

    //one last thing: look for a URI that specifies the strict.dtd
    theStartPos+=6;
    theCount=theEnd-theStartPos;
    theSubIndex=theBuffer.Find("STRICT.DTD",PR_TRUE,theStartPos,theCount);
    if(0<theSubIndex) {
      //Since we found it, regardless of what's in the descr-text, kick into strict mode.
      mParseMode=eParseMode_strict;
      mDocType=eHTML4Text;
    }

If this behaviour is not your intent, it should be fixed.

Also, the following code should be removed since <!DOCTYPE HTML PUBLIC PublicID SystemID> is NOT a real doctype.

    else {
      PRInt32 thePos=theBuffer.Find("HTML",PR_TRUE,1,50);
      if(kNotFound!=thePos) {
        mDocType=eHTML4Text;
        PRInt32 theIDPos=theBuffer.Find("PublicID",thePos);
        if(kNotFound==theIDPos)
          theIDPos=theBuffer.Find("SystemID",thePos);
        mParseMode=(kNotFound==theIDPos) ? eParseMode_quirks : eParseMode_strict;
      }
    }
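A short illustration of why the second quoted block has no effect on real pages: it searches for the placeholder words "PublicID" and "SystemID" rather than for actual identifiers. The function below is a stand-in written with `std::string::find`, not the Mozilla string class, and `literalPlaceholderCheck` is a hypothetical name.

```cpp
#include <string>

// Stand-in for the quoted check: true only if the doctype text literally
// contains the word "PublicID" or "SystemID".
bool literalPlaceholderCheck(const std::string& doctype) {
    return doctype.find("PublicID") != std::string::npos ||
           doctype.find("SystemID") != std::string::npos;
}
```

With this check, the schematic text `<!DOCTYPE HTML PUBLIC PublicID SystemID>` matches, while a real doctype that carries an actual system identifier such as `"http://www.w3.org/TR/html4/loose.dtd"` does not, because neither placeholder word appears in it.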
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Comment 9•24 years ago
Future versions of HTML may include only strict doctypes, and their URIs may not contain the string "strict.dtd". For example, XHTML 1.1 no longer includes the transitional doctypes, and its URIs do not contain the string "strict.dtd". Of course, this is not a problem, since XHTML documents always trigger strict mode. But the point is that HTML 5.0 (or later) may do the same. My suggestion (and possibly David's) is that doctypes with a URI always trigger strict mode. If there is no way to use strict mode with the Transitional DTD, web authors will be encouraged to use a horrible hack: they will use the strict doctypes only to trigger strict mode, while the actual document body is transitional.
Comment 10•24 years ago
The following doctype does not solve the problem since this is invalid.

    <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/strict.dtd">

W3C defines only the following form:

    <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">
    <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
    <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Frameset//EN" "http://www.w3.org/TR/html4/frameset.dtd">

And we can omit URI per SGML spec:

    <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN">
    <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
    <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Frameset//EN">

But the public identifier and the URI must match. It is strange that valid doctypes trigger the quirks when invalid doctypes trigger the strict.
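The matching rule in this comment can be sketched as a table lookup over the three HTML 4.01 pairs listed above. This is an illustration only: `DoctypePair`, `kHtml401Pairs`, and `isConsistentDoctype` are hypothetical names, the table covers just the pairs quoted in the comment, and an empty system identifier stands for an omitted URI.

```cpp
#include <array>
#include <string_view>

// Hypothetical table entry pairing a public identifier with its W3C URI.
struct DoctypePair {
    std::string_view publicId;
    std::string_view systemId;
};

// Only the three HTML 4.01 pairs quoted in the comment.
constexpr std::array<DoctypePair, 3> kHtml401Pairs = {{
    {"-//W3C//DTD HTML 4.01//EN", "http://www.w3.org/TR/html4/strict.dtd"},
    {"-//W3C//DTD HTML 4.01 Transitional//EN", "http://www.w3.org/TR/html4/loose.dtd"},
    {"-//W3C//DTD HTML 4.01 Frameset//EN", "http://www.w3.org/TR/html4/frameset.dtd"},
}};

// True if the public identifier is in the table and the system identifier is
// either omitted (empty) or the one paired with it.
bool isConsistentDoctype(std::string_view publicId, std::string_view systemId) {
    for (const auto& pair : kHtml401Pairs) {
        if (pair.publicId == publicId) {
            return systemId.empty() || pair.systemId == systemId;
        }
    }
    return false;
}
```

Under this check, the Transitional public identifier combined with the strict.dtd URI is rejected as inconsistent, which is the mismatch the comment calls invalid.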
Comment 11•24 years ago
(aside: HTML 5 == XHTML 1. It is almost certain that there will not be any more SGML-based versions of the HTML language.)
Assignee
Comment 12•24 years ago
Let's clear some of this up. We only have 2 options at this time for HTML (or XHTML served as text/html): compatible-mode and strict-mode. Transitional documents really should be handled as a variant of strict, but the strict-mode system isn't ready to do that just yet. So anything that is loose or transitional is handled in compatible mode. The suggestion that we treat all documents with a URI as strict is patently wrong and will not be implemented.

Other DTD notes:

This is treated as strict:

    <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0//EN" "http://www.w3.org/TR/REC-html40/strict.dtd">

This is treated as compatible (because we don't have transitional):

    <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">

This is treated as compatible:

    <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Frameset//EN" "http://www.w3.org/TR/html4/frameset.dtd">

I don't see any other problems in this bug, so I'm closing it. Please open a new bug (as this one is getting difficult to follow) for new problems.
Status: REOPENED → RESOLVED
Closed: 24 years ago → 24 years ago
Resolution: --- → INVALID
Reporter
Comment 13•24 years ago
Hmm.. I thought the idea of 3 modes (where the parser would be lenient in the middle one, but layout would be strict) was a good idea, and the intent of this bug was that a systemID would trigger the middle mode. I have this fully implemented in the DOCTYPE handling code that I've written, except whatever deals with the modes doesn't handle returning eDTDMode_Standard (or whatever it is) correctly, so it just acts like quirks. I think we should allow authors a way to trigger strict mode in *layout* without having to use the strict DTD. The three-mode solution, along with fixing this bug, is a good way to do this.
Summary: system identifier in doctype should trigger strict mode → system identifier in doctype should trigger standard mode
Comment 14•24 years ago
rickg: How does this relate to your fix for bug 29417? There, you indicated that you had a more intelligent DOCTYPE based detection mechanism, but that it was not ready to be enabled by default. Should that bug still be considered FIXED?

Also, in bug 34135, you indicated that you had a mechanism for controlling layout based on a META tag. Is that mechanism ready to become "official"?

Finally, it'd be nice to have a bit more explanation for why treating all documents with a URI as strict (for purposes of layout) is patently wrong. I've not seen a justification for that decision, at least in this bug. From the points made here, it seems to be as clean a solution as could be hoped for without a lot of work.

If this bug does remain INVALID, what is the correct issue for addressing the problem of writing a page conforming to the Transitional DTD which needs to be laid out in non-quirks mode? I suspect there are far more Transitional documents being written than Strict ones (I know this to be true of the companies I've worked with), and that this situation will persist for some time yet. It seems like a workaround is called for, if a real solution is not feasible given the time constraints.