Closed Bug 102127 Opened 23 years ago Closed 23 years ago

Mozilla fails to display conformant HTML

Categories

(Core :: DOM: HTML Parser, defect)

Sun
Solaris
defect
Not set
normal

VERIFIED INVALID

People

(Reporter: Mitch, Assigned: harishd)

References

Details

(Whiteboard: [technote])

Attachments

(3 files)

Save the following code snippet verbatim to a file:

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0//EN">
<html>
             
<!----------------------------
Another comment               
------------------------------> 
                               
You should see this text, but Mozilla cannot see this ?

</html>

Now try and load it via Mozilla. It fails to display the text you should
see. Netscape 4.7.x works fine. Explorer works fine. Mozilla fails to display.

Please refer to the page

http://www.htmlhelp.com/reference/wilbur/misc/comment.html

for HTML comment syntax which says that <!xxxx> is a VALID comment.

This should be considered a BUG.
Attached file original testcase
Created an attachment with the original test case
(http://bugzilla.mozilla.org/attachment.cgi?id=51207&action=view)
Confirmed with w2k 0.9.4+ 2001092308.
Works with IE 5.01.
This is not valid html.

From http://www.w3.org/TR/html401/intro/sgmltut.html#h-3.2.4 : 
"A common error is to include a string of hyphens ("---") within a comment.
Authors should avoid putting two or more adjacent hyphens inside comments."

Thus, the comment in your example is not a valid comment
->invalid
Status: UNCONFIRMED → RESOLVED
Closed: 23 years ago
Resolution: --- → FIXED
reopening, typo in cc and going back to page changed invalid to fixed
Status: RESOLVED → UNCONFIRMED
Resolution: FIXED → ---
right one this time
Status: UNCONFIRMED → RESOLVED
Closed: 23 years ago
Resolution: --- → INVALID
Sorry i disagree. 

http://www.w3.org/TR/html401/intro/sgmltut.html#h-3.2.4

says 

"Authors should avoid putting two or more adjacent hyphens inside comments."

There is only ONE hyphen in this comment. My testcase is not invalid.


Status: RESOLVED → UNCONFIRMED
Resolution: INVALID → ---
Ah ok, i see you meant the second comment.
Cancel
Status: UNCONFIRMED → RESOLVED
Closed: 23 years ago
Resolution: --- → FIXED
I am reopening for the following reasons.

1. The URL http://www.w3.org/TR/html401/intro/sgmltut.html#h-3.2.4
   states that authors should "avoid" putting 2 or more hyphens in a comment.
   However, it doesn't state that doing so violates the HTML standard.

2. This breaks a lot of web pages out in the world and we cannot fix all
   of them

3. It works with all other browsers, including Netscape 4.x, Internet Explorer,
   and Netscape 6 (which is based on Mozilla Milestone 18). Failing here will
   severely limit the use of Mozilla in the real world

4. It is a regression from Mozilla builds (as per #3 above)

5. It should be a trivial fix.

6. The behavior is inconsistent even if you quote #1 above. I attach another
   code snippet and IT WORKS even if you have 2 or more hyphens in the comment.


Please fix this.
Status: RESOLVED → UNCONFIRMED
Resolution: FIXED → ---
Furthermore when you view the page source to either of the attachments
both render it in green as a comment block thereby implying that it accepts
that this is a comment block, but one renders the HTML properly and the other
doesn't. There seems to be a mismatch between the rendering engine and the
parsing engine.
parser?
Assignee: asa → harishd
Component: Browser-General → Parser
QA Contact: doronr → moied
I'm assuming (maybe incorrectly) that there is a parser which parses the
server-provided HTML and then hands off to the render engine so that each
individual component can do its job - e.g. HTML would be rendered by the
HTML engine, GIF/JPG would be rendered by the imaging engine, etc. Maybe
a different paradigm is used, but I assume the design must be similar, hence
the behavior I am seeing: "Page Source" shows one understanding of the code
it is given, and the browser renders a different one.

<!-- Comment start
---- NOOP
---- NOOP
---- NOOP
---- NOOP
---- NOOP
---- NOOP
-- COMMENT END!! THIS IS NOT REALLY IN A COMMENT
Another comment               
---- NOOP
---- NOOP
---- NOOP
---- NOOP
---- NOOP
---- NOOP
---- NOOP
-- COMMENT START > THIS IS STILL COMMENTED
Please read the spec, or search for bugs where bz explains this.
Status: UNCONFIRMED → RESOLVED
Closed: 23 years ago
Resolution: --- → INVALID
I fail to see your point.

That code works fine. I.e. everything is treated as a comment
between "<!" and ">" as it should and i see "THIS IS STILL COMMENTED"
being displayed. 

What exactly are you saying ?

Basically the facts are:

1. Behavior is inconsistent within the same Mozilla release in how it treats
   the number of hyphens in a comment.
2. A seemingly arbitrary number of hyphens will make or break the
   parsing of comments.
3. It is not consistent between Mozilla releases. It is a regression
   from Mozilla milestone 18 and Mozilla 0.9.x

If the behavior were consistent, we would have a leg to stand on in saying
that the page has incorrect HTML. However, this is not the case. Depending
on an arbitrary number of hyphens, we will render or not! This is ludicrous.

Please tell me why this is invalid code. And point me to the "invalid spec"
you mention ?

Status: RESOLVED → UNCONFIRMED
Resolution: INVALID → ---
OK. Here is the situation:

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0//EN">

puts mozilla in strict mode.  In strict mode we do strict comment parsing.  The
strict syntax for comments is as follows:

"--" starts a comment.  "--" ends a comment.  Both of these only inside an SGML
markup declaration.  "<!" tells the parser to start such a declaration.  ">"
tells the parser to end such a declaration.

The reason HTML tells you not to have "--" inside a comment is that it will
_end_the_comment_.  However, "----" will end a comment and start a new one. 
This is why you see inconsistent behavior...

Your original testcase, with text in [] to denote what's a comment and what's
not:

<!--[comment]--[not comment]--[comment]--[not comment]--[comment]--
 [not comment]--[comment]--[not comment]--[comment]--[not comment]--
 [comment]--[not comment]--[comment]--
 [not comment, but inside SGML markup, so not shown]Another comment--
 [comment]--[not comment]--[comment]--[not comment]--[comment]--
 [not comment]--[comment]--[not comment]--[comment]--[not comment]--
 [comment]--[not comment]--[comment]--[not comment]--
 [comment, including the ">"]> 

So the ">" is _commented_out_.  Thus the SGML markup declaration never ends and
all the rest of your document is treated as SGML markup (and promptly ignored).
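
The toggling described above can be sketched in a few lines of Python. This is
only an illustration of the rule as stated in this comment, not Mozilla's
parser code; the function name find_declaration_end is hypothetical, and the
sketch assumes that every "--" inside the declaration flips the comment state
and that only a ">" seen outside a comment closes the declaration.

```python
from typing import Optional


def find_declaration_end(markup: str) -> Optional[int]:
    """Return the index of the '>' that closes the declaration, or None.

    Hypothetical helper: applies the strict rule that each "--" inside
    "<! ... >" either starts or ends a comment, and that ">" only closes
    the declaration when it is seen outside a comment.
    """
    if not markup.startswith("<!"):
        raise ValueError("expected a markup declaration starting with '<!'")
    in_comment = False
    i = 2  # skip the "<!" that opens the declaration
    while i < len(markup):
        if markup.startswith("--", i):
            in_comment = not in_comment  # "--" starts or ends a comment
            i += 2
        elif markup[i] == ">" and not in_comment:
            return i  # a real declaration close
        else:
            i += 1  # comment text, or ignored text inside the declaration
    return None  # the declaration never ends


# A well-formed comment: the final ">" is outside the comment.
print(find_declaration_end("<!-- hello -->"))  # 13

# The shape of the original testcase: 28 hyphens, text, 30 hyphens, ">".
testcase = "<!" + "-" * 28 + "\nAnother comment\n" + "-" * 30 + ">\nYou should see this text\n"
print(find_declaration_end(testcase))  # None
```

With 28 hyphens the state toggles 14 times and ends outside a comment; the 30
hyphens on the closing line toggle it 15 more times, so the ">" lands inside a
comment and the function finds no end for the declaration.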

Please read that htmlhelp.com page again.  Then count the number of dashes they
have in their examples carefully.  That page presents everything correctly.  In
particular, note the paragraph starting "Not all HTML parsers get this right.". 
We _do_ get it right, and just as "hello" in that example is commented out, the
text in your testcase is commented out.

Now to respond to your numbered list of statements:

1.  The HTML spec is not the normative spec on comments.  All it says is that
    the syntax is SGML comment syntax (that's why you have to start an SGML
    markup section to use comments).  There is unfortunately no free electronic
    version of the SGML spec available.  Again, the htmlhelp.com site you
    mention describes this topic well and correctly.

2.  This only breaks pages claiming a strict doctype and thus strict conformance
    with the HTML spec.  Pages with no doctype or doctypes declaring HTML
    versions 3.2 or lower, 4.0 transitional, or 4.01 transitional without a DTD
    URI get backwards-compatible (read: broken) parsing.

3.  What you're saying is that some bugs in Mozilla got fixed and bugs in other
    browsers are not fixed yet.  Have you tried IE6 on that page, by any chance?

4.  See my answer to #3

5.  This does not need fixing

6.  This is your misunderstanding of the spec.  Please read the htmlhelp site
    again.

Oh, and view page source has a known bug that causes it to do broken comment
parsing even for strict pages.  See bug 91045 (it's marked duplicate of a
wide-ranging parsing cleanup for view source).

I'm marking this invalid.  Please feel free to reopen if you can convince me,
after reading that htmlhelp.com page 3 or 4 times, that this is incorrect.
Status: UNCONFIRMED → RESOLVED
Closed: 23 years ago
Resolution: --- → INVALID
From Goldfarb, "The SGML Handbook" (contains the full text of the SGML standard,
ISO 8879):
comment declaration = mdo, (comment, (s|comment)*)?, mdc
comment = com, SGML character*, com

Explanation: this is the formal expression of SGML comment syntax.
For the purposes of HTML, mdo (markup declaration open) is <! and mdc (markup
declaration close) is > and com is --.  s is "separator characters", i.e.,
whitespace.  The * indicates that the preceding token may occur 0 or more times,
the | that one and only one of the tokens it separates may occur, and the ? that
the preceding token may occur 0 or 1 time.  Hence, <!xxxx> is a legitimate
comment *declaration*, because it *does not actually contain a comment*, which
is optional.  From the prose of ISO 8879, again: "No markup is recognized in a
comment, other than the com delimiter that terminates it."  As bz pointed out,
because the intended mdc character ">" is enclosed within a comment, *it is not
recognized as markup*, and hence the markup declaration is never closed.
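
As a rough check, the two productions quoted above can be transcribed into a
regular expression. This is a sketch under assumptions: the names COMMENT,
COMMENT_DECLARATION and is_comment_declaration are made up for illustration,
"s" is approximated by Python's \s, and "SGML character" by any character. The
body of a comment is written so that it cannot contain "--", which mirrors the
rule that the first com after the opening com terminates the comment.

```python
import re

# comment = com, SGML character*, com        (com is "--")
# The body may not contain "--", so a comment always ends at the first
# com that follows its opening com.
COMMENT = r"--(?:(?!--).)*--"

# comment declaration = mdo, (comment, (s|comment)*)?, mdc
# mdo is "<!", mdc is ">", s is approximated by \s.
COMMENT_DECLARATION = re.compile(
    rf"<!(?:{COMMENT}(?:\s|{COMMENT})*)?>",
    re.DOTALL,
)


def is_comment_declaration(markup: str) -> bool:
    """True if the whole string matches the comment declaration grammar."""
    return COMMENT_DECLARATION.fullmatch(markup) is not None


print(is_comment_declaration("<!>"))                 # True: the comment is optional
print(is_comment_declaration("<!-- hello -->"))      # True
print(is_comment_declaration("<!-- a -- -- b -->"))  # True: two comments, separated by s
print(is_comment_declaration("<!-- a -- b -->"))     # False: "b" is outside any comment
```

The last case is the one the HTML 4.01 warning is about: the "--" in the
middle ends the first comment, so the text after it is no longer comment text.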

Verifying INVALID.
Status: RESOLVED → VERIFIED
Bug 120385 has some good examples too for a technote.
Whiteboard: [technote]
*** Bug 215395 has been marked as a duplicate of this bug. ***
*** Bug 278726 has been marked as a duplicate of this bug. ***
*** Bug 281611 has been marked as a duplicate of this bug. ***
*** Bug 282487 has been marked as a duplicate of this bug. ***
*** Bug 292574 has been marked as a duplicate of this bug. ***
*** Bug 297895 has been marked as a duplicate of this bug. ***
*** Bug 305669 has been marked as a duplicate of this bug. ***
*** Bug 342494 has been marked as a duplicate of this bug. ***
It seems that this bug is the cause of rendering issues with some Ikonboard-based boards: http://gens.consolemul.com/cgi-bin/ikonboard/ikonboard.cgi?act=ST;f=6;t=347;st=30 contains a badly broken thread page. If you look at the HTML for this page, you will see things like this:

---
<!--QuoteBegin--blindpainkiller+Jan. 07 2006,10:03--><table border="0" align="center" width="95%" cellpadding="0" cellspacing="0"><tr><td><b>Quote</b> (blindpainkiller @ Jan. 07 2006,10:03)</td></tr><tr><td id="QUOTE"><!--QuoteEBegin--><font color='#000000'>About the SMS Sonic 2 level select:<br>I remember back in the day when I used a SMS converter on my Megadrive and I couldn&#39;t get this cheat working no matter how much I tried, but on the real console it worked fine. I&#39;ve tried it many times on Fusion too, but can&#39;t get it working <!--emo&:(--><img src="http://gens.consolemul.com/iB_html/non-cgi/emoticons/sad-smiley-056.gif" border="0" valign="absmiddle" alt=':('><!--endemo-->. I think however that I some years ago heard about someone who got it working in Meka, but I&#39;m not sure.</font><!--QuoteEnd--></td></tr></table><!--QuoteEEnd--><br><font color='#000000'>I&#39;ve got it working in Meka and it wasn&#39;t years ago that I reported it, a couple of months ago at best (maybe someone else reported it years ago), and the cheat always worked in Meka as far as I can remember.<br>It didn&#39;t work on a MD with the SMS converter? That&#39;s strange.</font>  <!--Signature--><br><br>--------------<br>
---

The page is rendered fine on IE7, but FF interprets the post text as one big comment, breaking the page. This happens in some newer Ikonboard versions (this one is 3.1.5), but not in old versions. The comment that causes this mess is this one: <!--QuoteBegin--blindpainkiller+Jan. 07 2006,10:03-->. Notice the two extra dashes...
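
For software that generates such marker comments, one possible workaround is
to break up adjacent hyphens before the text goes into the comment, following
the HTML 4.01 advice quoted earlier in this bug. The sketch below is only an
illustration; safe_comment is a hypothetical helper, not part of Ikonboard or
Mozilla, and it changes the comment text (a space is inserted between
adjacent hyphens), so it is unsuitable where the text must survive verbatim.

```python
import re


def safe_comment(body: str) -> str:
    """Wrap body in an HTML comment that contains no inner "--".

    Hypothetical helper: inserts a space after every hyphen that is
    directly followed by another hyphen, then pads the body with spaces
    so it cannot merge with the "<!--" and "-->" delimiters.
    """
    spaced = re.sub(r"-(?=-)", "- ", body)
    return f"<!-- {spaced} -->"


print(safe_comment("QuoteBegin--blindpainkiller+Jan. 07 2006,10:03"))
# <!-- QuoteBegin- -blindpainkiller+Jan. 07 2006,10:03 -->
```

Any tool that later reads these markers back would need to accept the spaced
form, which is a design cost of this approach.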
Any domain name with consecutive hyphens breaks comments. Ex:
     <!-- <a href="http://ex--ample.com/">Commented Out</a> -->
"Commented Out -->" is incorrectly rendered on the page.