34476 - system identifier in doctype should trigger standard mode

David Baron :dbaron: (⌚️UTC-4, no longer working on Mozilla)

Reporter

Description

•

25 years ago

DESCRIPTION: A system identifier in a DOCTYPE should trigger strict mode.

MacIE5 has modes like mozilla, and it uses the presence of a SystemID to cause strict mode (at least in the form <!DOCTYPE RootElem PUBLIC PublicID SystemID>). Since MacIE5 is already released, most pages where this causes any problems should be changed in the near future. See http://www.deja.com/[ST_rn=ps]/getdoc.xp?AN=604424748&fmt=text for a quick description of MacIE's algorithm.

I think any doctype that has a systemID in the form:

<!DOCTYPE HTML PUBLIC PublicID SystemID>
<!DOCTYPE HTML SYSTEM SystemID>

or has an internal subset:

<!DOCTYPE HTML (PUBLIC PublicID SystemID? | SYSTEM SystemID) [ Internal-SS ]>

should trigger strict mode (the latter two cases are very rare, and the first is the one I'm sure MacIE does). This would leave only DOCTYPEs of the form

<!DOCTYPE HTML PUBLIC PublicID>

to be treated with the current algorithm. (These are the vast majority of DOCTYPEs on the web.)

For information on the syntax of DOCTYPEs in XML (which is a subset of SGML, of which HTML is an application), see:

http://www.w3.org/TR/REC-xml#dt-doctype
http://www.w3.org/TR/REC-xml#NT-ExternalID

Note that SGML does not require the SystemID for PUBLIC doctypes.

I think Ian may be able to provide steps to reproduce more easily than I can...
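As a rough illustration of the forms listed in the description, the sketch below parses a DOCTYPE into its parts and reports whether a system identifier or an internal subset is present. It is not Mozilla's code: the `Doctype` struct, `parseDoctype`, and `triggersStrictByForm` are hypothetical names, and the grammar is deliberately simplified (for example, a `]` inside a quoted literal in the internal subset is not handled).

```cpp
#include <cctype>
#include <cstddef>
#include <optional>
#include <string>

// Hypothetical result type; field names are illustrative only.
struct Doctype {
    std::string name;
    std::optional<std::string> publicId;
    std::optional<std::string> systemId;
    bool hasInternalSubset = false;
};

namespace {
void skipSpace(const std::string& s, std::size_t& i) {
    while (i < s.size() && std::isspace(static_cast<unsigned char>(s[i]))) ++i;
}

// Reads a run of non-space characters, stopping before '>' or '['.
std::string readWord(const std::string& s, std::size_t& i) {
    std::size_t start = i;
    while (i < s.size() && !std::isspace(static_cast<unsigned char>(s[i])) &&
           s[i] != '>' && s[i] != '[') ++i;
    return s.substr(start, i - start);
}

// Reads a literal quoted with " or '; leaves i untouched if none starts here.
std::optional<std::string> readQuoted(const std::string& s, std::size_t& i) {
    if (i >= s.size() || (s[i] != '"' && s[i] != '\'')) return std::nullopt;
    std::size_t end = s.find(s[i], i + 1);
    if (end == std::string::npos) return std::nullopt;
    std::string out = s.substr(i + 1, end - i - 1);
    i = end + 1;
    return out;
}

std::string upper(std::string s) {
    for (char& c : s) c = static_cast<char>(std::toupper(static_cast<unsigned char>(c)));
    return s;
}
}  // namespace

// Returns nullopt when the text does not fit the simplified grammar.
std::optional<Doctype> parseDoctype(const std::string& s) {
    const std::string prefix = "<!DOCTYPE";
    std::size_t i = 0;
    skipSpace(s, i);
    if (upper(s.substr(i, prefix.size())) != prefix) return std::nullopt;
    i += prefix.size();
    skipSpace(s, i);
    Doctype d;
    d.name = readWord(s, i);
    if (d.name.empty()) return std::nullopt;
    skipSpace(s, i);
    const std::string keyword = upper(readWord(s, i));
    skipSpace(s, i);
    if (keyword == "PUBLIC") {
        d.publicId = readQuoted(s, i);
        if (!d.publicId) return std::nullopt;
        skipSpace(s, i);
        d.systemId = readQuoted(s, i);  // optional, as the description notes for SGML
    } else if (keyword == "SYSTEM") {
        d.systemId = readQuoted(s, i);
        if (!d.systemId) return std::nullopt;  // simplification: literal required
    } else if (!keyword.empty()) {
        return std::nullopt;
    }
    skipSpace(s, i);
    if (i < s.size() && s[i] == '[') {  // internal subset
        std::size_t close = s.find(']', i);
        if (close == std::string::npos) return std::nullopt;
        d.hasInternalSubset = true;
        i = close + 1;
        skipSpace(s, i);
    }
    if (i >= s.size() || s[i] != '>') return std::nullopt;
    return d;
}

// The rule proposed in the description: SystemID or internal subset means strict.
bool triggersStrictByForm(const Doctype& d) {
    return d.systemId.has_value() || d.hasInternalSubset;
}
```

Under this sketch, a PUBLIC doctype without a second quoted literal parses successfully but does not satisfy `triggersStrictByForm`, so it would fall through to whatever PublicID-based logic already exists.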

rickg

Assignee

Comment 1

•

25 years ago

David: please explain why systemID should be treated as strict.

David Baron :dbaron: (⌚️UTC-4, no longer working on Mozilla)

Reporter

Comment 2

•

25 years ago

A SystemID should lead to strict mode because: * it's an easy way for page authors to control strict mode without affecting validity * it's what MacIE does, and therefore it shouldn't cause too many problems (since the pages where it causes problems will see them with MacIE first)

David Baron :dbaron: (⌚️UTC-4, no longer working on Mozilla)

Reporter

Updated

•

25 years ago

Blocks: 34662

rickg

Assignee

Updated

•

25 years ago

Target Milestone: --- → M17

rickg

Assignee

Comment 3

•

25 years ago

Fixed in my tree

Status: NEW → ASSIGNED

rickg

Assignee

Comment 4

•

25 years ago

Landed fixes. Read code in nsParser.cpp to learn more.

Status: ASSIGNED → RESOLVED

Closed: 25 years ago

Resolution: --- → FIXED

David Baron :dbaron: (⌚️UTC-4, no longer working on Mozilla)

Reporter

Comment 5

•

25 years ago

Reopening. Looking at code in CParserContext.cpp, you're looking for the strings "PublicID" and "SystemID", not the concepts. I might have a chance to write a patch for what I meant in C++ this weekend (hopefully), so we can avoid the difficulties of translating C++ to English and back to C++ again. What I'd like to do is just: 1) check for a proper XML declaration (if so, strict) 2) parse the DOCTYPE based on the SGML spec (and be quirks if it doesn't fit the spec or if there isn't one at all) 3) check for the presence of a SystemID or an internal subset (strict if so) 4) then do the logic on the PublicID
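The four steps above could be sketched roughly as follows. This is not the patch mentioned in the comment; `Mode`, `ParsedDoctype`, `modeFromPublicId`, and `chooseMode` are hypothetical names, and the PublicID table in step 4 is an assumption that only covers the HTML 4.0 and 4.01 strict identifiers. Treating a doctype with neither identifier as quirks is likewise an assumption.

```cpp
#include <optional>
#include <string>

enum class Mode { Quirks, Strict };

// Hypothetical parsed DOCTYPE. An empty optional<ParsedDoctype> stands for
// "no DOCTYPE, or one that does not fit the SGML grammar".
struct ParsedDoctype {
    std::optional<std::string> publicId;
    std::optional<std::string> systemId;
    bool hasInternalSubset = false;
};

// Step 4 placeholder: the real PublicID logic is not given in the thread.
// Assumption: only the HTML 4.0/4.01 strict identifiers select strict mode.
Mode modeFromPublicId(const std::string& publicId) {
    if (publicId == "-//W3C//DTD HTML 4.01//EN" ||
        publicId == "-//W3C//DTD HTML 4.0//EN") {
        return Mode::Strict;
    }
    return Mode::Quirks;
}

Mode chooseMode(bool hasXmlDeclaration,
                const std::optional<ParsedDoctype>& doctype) {
    // 1) A proper XML declaration means strict.
    if (hasXmlDeclaration) return Mode::Strict;
    // 2) Missing or malformed DOCTYPE means quirks.
    if (!doctype) return Mode::Quirks;
    // 3) A SystemID or an internal subset means strict.
    if (doctype->systemId || doctype->hasInternalSubset) return Mode::Strict;
    // 4) Otherwise decide from the PublicID.
    if (doctype->publicId) return modeFromPublicId(*doctype->publicId);
    // Assumption: a doctype with neither identifier is treated as quirks.
    return Mode::Quirks;
}
```

Keeping the parse (step 2) separate from the decision means the check in step 3 tests for the presence of an identifier rather than for any particular string.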

Status: RESOLVED → REOPENED

Resolution: FIXED → ---

Masatoshi Kimura [:emk]

Comment 6

•

25 years ago

I think the current mode determination is too complicated. And I think three (or more) modes are also too complicated. They will only confuse web designers. (Help! Which mode are we in?) We had better follow the MacIE5 strategy: if a document has a system identifier, an XML PI, an XHTML DTD, or the ISO HTML DTD, trigger strict mode. Otherwise, trigger quirks mode. That's simple and reasonable. (I will accept that HTML4 Strict without a system identifier triggers quirks.)

rickg

Assignee

Comment 7

•

25 years ago

Modes in the parsing engine are independent of modes the users will see. To users, there will be 3 modes: STRICT, non-strict and quirks. Independent of that, there's html3, html4, xml and xhtml. That's the world we're in. David -- I'd like to talk to you offline about ID's. I accept that my algorithm is a hack regarding these -- but I haven't done enough research to determine the right thing to do. I suspect you know, and can help me get it right. I'm closing this bug -- and I'll open a new one regarding ID's.

Status: REOPENED → RESOLVED

Closed: 25 years ago → 25 years ago

Resolution: --- → FIXED

Masatoshi Kimura [:emk]

Comment 8

•

25 years ago

Reopening since I can't find a new bug regarding IDs (if there is one, please tell me the bug id). The system identifier check code still doesn't work at all.

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
- both handled in quirks.

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN">
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">
- both handled in strict.

If this behaviour is your intent, the following code is redundant.

    //one last thing: look for a URI that specifies the strict.dtd
    theStartPos+=6;
    theCount=theEnd-theStartPos;
    theSubIndex=theBuffer.Find("STRICT.DTD",PR_TRUE,theStartPos,theCount);
    if(0<theSubIndex) {
      //Since we found it, regardless of what's in the descr-text, kick into strict mode.
      mParseMode=eParseMode_strict;
      mDocType=eHTML4Text;
    }

If this behaviour is not your intent, it should be fixed.

Also, the following code should be removed, since <!DOCTYPE HTML PUBLIC PublicID SystemID> is NOT a real doctype.

    else {
      PRInt32 thePos=theBuffer.Find("HTML",PR_TRUE,1,50);
      if(kNotFound!=thePos) {
        mDocType=eHTML4Text;
        PRInt32 theIDPos=theBuffer.Find("PublicID",thePos);
        if(kNotFound==theIDPos)
          theIDPos=theBuffer.Find("SystemID",thePos);
        mParseMode=(kNotFound==theIDPos) ? eParseMode_quirks : eParseMode_strict;
      }
    }
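To make the objection concrete, here is a small standalone sketch contrasting a search for the placeholder words with a check for an actual second quoted literal. It uses `std::string` rather than Mozilla's string classes, so it only approximates the quoted `Find` calls; all function names are hypothetical, and the literal-counting heuristic assumes a simple PUBLIC doctype without an internal subset.

```cpp
#include <cstddef>
#include <string>

// Stand-in for the quoted check: searches for the placeholder words themselves.
bool matchesPlaceholderWords(const std::string& doctype) {
    return doctype.find("PublicID") != std::string::npos ||
           doctype.find("SystemID") != std::string::npos;
}

// Counts complete quoted literals ("..." or '...') in the declaration.
std::size_t countQuotedLiterals(const std::string& doctype) {
    std::size_t count = 0;
    std::size_t i = 0;
    while (i < doctype.size()) {
        const char c = doctype[i];
        if (c == '"' || c == '\'') {
            const std::size_t close = doctype.find(c, i + 1);
            if (close == std::string::npos) break;  // unterminated literal
            ++count;
            i = close + 1;
        } else {
            ++i;
        }
    }
    return count;
}

// Assumption: in a simple PUBLIC doctype, a second quoted literal is the
// system identifier.
bool publicDoctypeHasSystemId(const std::string& doctype) {
    return doctype.find("PUBLIC") != std::string::npos &&
           countQuotedLiterals(doctype) >= 2;
}
```

For the Transitional doctype with the loose.dtd URI, `matchesPlaceholderWords` finds nothing while `publicDoctypeHasSystemId` reports a system identifier, which is the gap this comment describes.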

Status: RESOLVED → REOPENED

Resolution: FIXED → ---

Masatoshi Kimura [:emk]

Comment 9

•

25 years ago

Future versions of HTML may include only strict doctypes, and their URIs may not include the string "strict.dtd". For example, XHTML 1.1 no longer includes the transitional doctypes, and its URI does not contain the string "strict.dtd". Of course, this is not a problem there, since XHTML documents always trigger strict mode. But the point is that HTML 5.0 (or later) may go the same way. My suggestion (and possibly David's) is that doctypes with a URI always trigger strict mode. If we do not provide a way to use strict mode with the Transitional DTD, Web authors will be encouraged to use a horrible hack: they will use the strict doctypes only to trigger strict mode, while the actual document body is transitional.

Masatoshi Kimura [:emk]

Comment 10

•

25 years ago

The following doctype does not solve the problem, since it is invalid:

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/strict.dtd">

W3C defines only the following forms:

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Frameset//EN" "http://www.w3.org/TR/html4/frameset.dtd">

And we can omit the URI per the SGML spec:

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN">
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Frameset//EN">

But the public identifier and the URI must match. It is strange that valid doctypes trigger quirks while invalid doctypes trigger strict.
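The pairing rule in this comment can be written down as a small lookup. This is only a sketch of the three HTML 4.01 pairs listed above; `KnownDoctype`, `kHtml401Doctypes`, and `idsMatch` are hypothetical names, and an empty `systemId` is used here to mean "URI omitted".

```cpp
#include <array>
#include <string_view>

struct KnownDoctype {
    std::string_view publicId;
    std::string_view systemId;
};

// The three public identifier / URI pairs listed in the comment.
constexpr std::array<KnownDoctype, 3> kHtml401Doctypes{{
    {"-//W3C//DTD HTML 4.01//EN", "http://www.w3.org/TR/html4/strict.dtd"},
    {"-//W3C//DTD HTML 4.01 Transitional//EN", "http://www.w3.org/TR/html4/loose.dtd"},
    {"-//W3C//DTD HTML 4.01 Frameset//EN", "http://www.w3.org/TR/html4/frameset.dtd"},
}};

// True if publicId is one of the listed identifiers and systemId is either
// omitted (empty) or the URI paired with it. Identifiers outside the table
// are reported as non-matching.
bool idsMatch(std::string_view publicId, std::string_view systemId) {
    for (const KnownDoctype& d : kHtml401Doctypes) {
        if (d.publicId == publicId) {
            return systemId.empty() || d.systemId == systemId;
        }
    }
    return false;
}
```

With this table, the Transitional public identifier combined with the strict.dtd URI is rejected, which is the invalid combination the comment objects to.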

Hixie (not reading bugmail)

Comment 11

•

25 years ago

(aside: HTML 5 == XHTML 1. It is almost certain that there will not be any more SGML-based versions of the HTML language.)

rickg

Assignee

Comment 12

•

25 years ago

Let's clear some of this up. We only have 2 options at this time for HTML (or XHTML served as text/html): compatible-mode and strict-mode. Transitional documents really should be handled as a variant of strict, but the strict-mode system isn't ready to do that just yet. So anything that is loose or transitional is handled in compatible mode. The suggestion that we treat all documents with a URI as strict is patently wrong and will not be implemented.

Other DTD notes:

This is treated as strict:
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0//EN" "http://www.w3.org/TR/REC-html40/strict.dtd">

This is treated as compatible (because we don't have transitional):
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">

This is treated as compatible:
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Frameset//EN" "http://www.w3.org/TR/html4/frameset.dtd">

I don't see any other problems in this bug, so I'm closing it. Please open a new bug (as this one is getting difficult to follow) for new problems.

Status: REOPENED → RESOLVED

Closed: 25 years ago → 25 years ago

Resolution: --- → INVALID

David Baron :dbaron: (⌚️UTC-4, no longer working on Mozilla)

Reporter

Comment 13

•

25 years ago

Hmm.. I thought the idea of 3 modes (where the parser would be lenient in the middle one, but layout would be strict) was a good idea, and the intent of this bug was that a systemID would trigger the middle mode. I have this fully implemented in the DOCTYPE handling code that I've written, except whatever deals with the modes doesn't handle returning eDTDMode_Standard (or whatever it is) correctly, so it just acts like quirks. I think we should allow authors a way to trigger strict mode in *layout* without having to use the strict DTD. The three-mode solution, along with fixing this bug, is a good way to do this.
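A minimal sketch of the three-mode idea, assuming the split described in this comment: the parser is lenient in the middle mode while layout is strict. The names are hypothetical and are not the actual enum values in the tree. The rule that a strict DTD selects full strict mode is also an assumption made for the example.

```cpp
// Hypothetical mode names, not the identifiers used in Mozilla.
enum class Mode { Quirks, Standard, Strict };

// The parser stays lenient in both quirks and the middle mode.
constexpr bool parserIsLenient(Mode m) { return m != Mode::Strict; }

// Layout is strict in the middle mode as well as in full strict mode.
constexpr bool layoutIsStrict(Mode m) { return m != Mode::Quirks; }

// Assumption: a strict DTD selects Strict; otherwise a system identifier
// selects the middle mode, and everything else is quirks.
constexpr Mode selectMode(bool usesStrictDtd, bool hasSystemId) {
    if (usesStrictDtd) return Mode::Strict;
    return hasSystemId ? Mode::Standard : Mode::Quirks;
}
```

Under these assumptions, a Transitional doctype with a URI gives `Mode::Standard`, so an author gets strict layout without switching to the strict DTD.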

Summary: system identifier in doctype should trigger strict mode → system identifier in doctype should trigger standard mode

Scott A. Colcord

Comment 14

•

25 years ago

rickg: How does this relate to your fix for bug 29417? There, you indicated that you had a more intelligent DOCTYPE based detection mechanism, but that it was not ready to be enabled by default. Should that bug still be considered FIXED?

Also, in bug 34135, you indicated that you had a mechanism for controlling layout based on a META tag. Is that mechanism ready to become "official"?

Finally, it'd be nice to have a bit more explanation for why treating all documents with a URI as strict (for purposes of layout) is patently wrong. I've not seen a justification for that decision, at least in this bug. From the points made here, it seems to be as clean a solution as could be hoped for without a lot of work.

If this bug does remain INVALID, what is the correct issue for addressing the problem of writing a page conforming to the Transitional DTD which needs to be laid out in non-quirks mode? I suspect there are far more Transitional documents being written than Strict ones (I know this to be true of the companies I've worked with), and that this situation will persist for some time yet. It seems like a workaround is called for, if a real solution is not feasible given the time constraints.

bsharma

Comment 15

•

24 years ago

updated qa contact.

QA Contact: janc → bsharma

Moied

Updated

•

24 years ago

QA Contact: bsharma → moied

Categories

(Core :: DOM: HTML Parser, defect, P3)

People

(Reporter: dbaron, Assigned: rickg)
