Closed Bug 166642 Opened 23 years ago Closed 16 years ago

parsing of SGML prolog is horribly broken

Categories

(Core :: DOM: HTML Parser, defect)

x86
All
defect
Not set
normal

RESOLVED WONTFIX

People

(Reporter: l.savernik, Unassigned)

Details

(Keywords: html4, testcase, Whiteboard: [HTML4-4.2])

Attachments

(2 files)

When an SGML comment containing a tag is used inside the !DOCTYPE declaration, the declaration is closed incorrectly. Consider the following snippet for clarification:

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd" -- tests the features of <div> -->

Mozilla closes the DOCTYPE at the <div>, which is meant to be part of the comment, and displays the closing --> as the contents of the div, which is clearly wrong. By contrast, markup that appears inside any other SGML comment is correctly ignored and not rendered.

Btw, I'm only concerned about strict mode here; the above behaviour could be considered "correct" for quirks mode. For more information see the attached testcase, which is fully compliant. Generally I think Mozilla should be able to handle all compliant documents correctly; this is my understanding of "strict mode".

Setting OS to All as I could reproduce it under Linux as well as under WinXP.
promised attachment
This testcase places a > directly after the first --, making Mozilla believe the DOCTYPE's finished and consequently output the text of the comment. However, the testcase still validates perfectly as HTML 4.01 Strict.

Here is a link for validating the previous attachment, for those who don't believe it: http://validator.w3.org/check?uri=http://bugzilla.mozilla.org/attachment.cgi?id=97797&action=view

Parse tree as from validator.w3.org:

<HTML>
<HEAD>
AHTTP-EQUIV TOKEN CONTENT-TYPE
ACONTENT CDATA text/html; charset=ISO-8859-1
<META>
</META>
<TITLE>
SGML comments Tests
</TITLE>
</HEAD>
<BODY>
<DIV>
However, SGML comments are supported
</DIV>
</BODY>
</HTML>

Parse tree as from the DOM inspector:

html
head
meta
title
#text
#text
body
#text (text of comment in fact)
div (misinterpreted div)
#text ("-->")
div (real div)
#text
#comment
#text
Our parsing of document prolog is pretty much completely non-functional: after consuming the parts of the DOCTYPE, it just churns along until it finds ">". (Parsing of internal subsets is broken, too.)
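To make the difference concrete, here is a minimal Python sketch (not Mozilla's parser code) contrasting a scan that stops at the first ">" with one that skips quoted literals and "-- ... --" comments before looking for the closing ">". The function names and the DOCTYPE constant are hypothetical, chosen only for illustration, and the sketch deliberately ignores internal subsets ("[ ... ]") and other SGML details.

```python
# Illustrative only: hypothetical helpers, not taken from the Mozilla source.

DOCTYPE = (
    '<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" '
    '"http://www.w3.org/TR/html4/strict.dtd" '
    "-- tests the features of <div> -->"
)


def naive_decl_end(text: str, start: int = 0) -> int:
    """Return the index just past the first '>' at or after start, or -1."""
    pos = text.find(">", start)
    return -1 if pos == -1 else pos + 1


def sgml_decl_end(text: str, start: int = 0) -> int:
    """Return the index just past the '>' that closes the declaration.

    Quoted literals and '-- ... --' comments are skipped, so a '>' inside
    either does not end the declaration. Returns -1 if the declaration,
    a literal, or a comment is unterminated.
    """
    i = start
    while i < len(text):
        ch = text[i]
        if ch in "\"'":
            # Skip a quoted literal up to its matching quote.
            close = text.find(ch, i + 1)
            if close == -1:
                return -1
            i = close + 1
        elif text.startswith("--", i):
            # Skip a comment up to its closing '--'.
            close = text.find("--", i + 2)
            if close == -1:
                return -1
            i = close + 2
        elif ch == ">":
            return i + 1
        else:
            i += 1
    return -1


if __name__ == "__main__":
    # The naive scan cuts the declaration off inside the comment.
    print(repr(DOCTYPE[:naive_decl_end(DOCTYPE)]))
    # The comment-aware scan consumes the whole declaration.
    print(repr(DOCTYPE[:sgml_decl_end(DOCTYPE)]))
```

With the snippet from the original report, the naive scan ends the declaration at the ">" of <div>, which matches the behaviour described in this bug, while the comment-aware scan only stops at the final ">".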
Status: UNCONFIRMED → NEW
Ever confirmed: true
Keywords: html4, testcase
Summary: sgml comments inside DOCTYPE not parsed correctly → parsing of SGML prolog is horribly broken
Whiteboard: [HTML4-4.2]
*** Bug 178848 has been marked as a duplicate of this bug. ***
Taking. This is probably related to the view-source bug 98149. The problem is that we don't really parse these SGML tags as SGML. The main problem is that tokens cannot create new tokens in the tokenizer. I'll try to fix both of these bugs at the same time (which may be in a while, depending on some other things).
Assignee: harishd → mrbkap
Assignee: mrbkap → nobody
QA Contact: moied → parser
Tracking HTML5 for parsing. Not SGML.
Status: NEW → RESOLVED
Closed: 16 years ago
Resolution: --- → WONTFIX