Closed
Bug 166642
Opened 23 years ago
Closed 16 years ago
parsing of SGML prolog is horribly broken
Categories
(Core :: DOM: HTML Parser, defect)
Tracking
RESOLVED
WONTFIX
People
(Reporter: l.savernik, Unassigned)
References
Details
(Keywords: html4, testcase, Whiteboard: [HTML4-4.2])
Attachments
(2 files)
When an SGML comment that contains a tag is used inside the !DOCTYPE declaration,
the !DOCTYPE declaration is incorrectly closed.
Consider the following snippet for clarification:
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"
"http://www.w3.org/TR/html4/strict.dtd"
--
tests the features of <div>
-->
will make Mozilla close the DOCTYPE at the <div>, which is meant to be part of the
comment, and display the closing --> as the contents of that div. This is clearly wrong.
By contrast, markup inside any other SGML comment is correctly ignored and not
rendered.
Btw, I'm only concerned about strict mode here; the above behaviour could be
considered "correct" for quirks mode.
For more information, see the attached testcase, which is fully compliant.
Generally, I think Mozilla should be able to handle all compliant documents
correctly; this is my understanding of "strict mode".
Setting OS to All, as I could reproduce it under both Linux and WinXP.
Comment 1•23 years ago (Reporter)
promised attachment
Comment 2•23 years ago (Reporter)
This testcase places a > directly after the first --, making Mozilla believe the
DOCTYPE is finished and consequently output the text of the comment.
However, the testcase still validates perfectly as HTML 4.01 Strict.
Here is a link for validating the previous attachment, for those who don't
believe it:
http://validator.w3.org/check?uri=http://bugzilla.mozilla.org/attachment.cgi?id=97797&action=view
Parse tree from validator.w3.org:
<HTML>
  <HEAD>
    AHTTP-EQUIV TOKEN CONTENT-TYPE
    ACONTENT CDATA text/html; charset=ISO-8859-1
    <META>
    </META>
    <TITLE>
      SGML comments Tests
    </TITLE>
  </HEAD>
  <BODY>
    <DIV>
      However, SGML comments are supported
    </DIV>
  </BODY>
</HTML>
Parse tree from the DOM Inspector:
html
head
meta
title
#text
#text
body
#text (text of comment in fact)
div (misinterpreted div)
#text ("-->")
div (real div)
#text
#comment
#text
Comment 3•23 years ago
Our parsing of the document prolog is pretty much completely non-functional: after
consuming the parts of the DOCTYPE, it just churns along until it finds ">".
(Parsing of internal subsets is broken, too.)
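To illustrate the difference between "churning along until it finds >" and scanning the declaration as SGML, here is a minimal Python sketch. It is not Mozilla's parser code; the function names naive_doctype_end and sgml_doctype_end are hypothetical. The SGML-aware version assumes a simplified model in which quoted literals are skipped and each "--" ... "--" pair inside the declaration is a comment, so a ">" inside either one does not close the DOCTYPE. Internal subsets ("[ ... ]") are deliberately not handled.

```python
# Simplified sketch of finding the ">" that ends a markup declaration such as
# <!DOCTYPE ...>. Hypothetical helper names; not Mozilla's actual tokenizer.


def naive_doctype_end(text: str, start: int) -> int:
    """Behaviour described above: stop at the first '>' after start."""
    return text.index(">", start)


def sgml_doctype_end(text: str, start: int) -> int:
    """Return the index of the '>' that closes the declaration at text[start:].

    `start` must point at "<!". Quoted literals are skipped, and each
    "--" ... "--" pair is treated as a comment, so a '>' inside either one
    does not end the declaration. Internal subsets are not handled.
    """
    if not text.startswith("<!", start):
        raise ValueError("not a markup declaration")
    i = start + 2
    while i < len(text):
        ch = text[i]
        if ch in "\"'":
            # Literal: jump past the matching closing quote.
            end = text.find(ch, i + 1)
            if end == -1:
                raise ValueError("unterminated literal")
            i = end + 1
        elif text.startswith("--", i):
            # Comment: jump past the closing "--".
            end = text.find("--", i + 2)
            if end == -1:
                raise ValueError("unterminated comment")
            i = end + 2
        elif ch == ">":
            return i
        else:
            i += 1
    raise ValueError("unterminated declaration")


if __name__ == "__main__":
    # The snippet from the bug description.
    doc = (
        '<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"\n'
        '"http://www.w3.org/TR/html4/strict.dtd"\n'
        "--\n"
        "tests the features of <div>\n"
        "-->"
    )
    naive_end = naive_doctype_end(doc, 0)
    print(repr(doc[naive_end - 4 : naive_end + 1]))  # prints '<div>'
    print(sgml_doctype_end(doc, 0) == len(doc) - 1)  # prints True
```

Run against the snippet from the description, the naive scan stops at the ">" of <div>, while the SGML-aware scan reaches the final ">" of "-->".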
Updated•23 years ago
Whiteboard: [HTML4-4.2]
Comment 4•22 years ago
*** Bug 178848 has been marked as a duplicate of this bug. ***
Comment 5•21 years ago
Taking. This is probably related to the view-source bug 98149. The problem is
that we don't really parse these SGML declarations as SGML. The underlying issue
is that tokens cannot create new tokens in the tokenizer. I'll try to fix both of
these bugs at the same time (which may take a while, depending on some other things).
Assignee: harishd → mrbkap
Updated•16 years ago
Assignee: mrbkap → nobody
QA Contact: moied → parser
Comment 6•16 years ago
Tracking HTML5 for parsing. Not SGML.
Status: NEW → RESOLVED
Closed: 16 years ago
Resolution: --- → WONTFIX