Bug 102127 - Mozilla fails to display conformant HTML
Status: VERIFIED INVALID
[technote]
Product: Core
Classification: Components
Component: HTML: Parser
Version: Trunk
Platform: Sun Solaris
Importance: -- normal
Target Milestone: ---
Assigned To: harishd
QA Contact: Moied
: Andrew Overholt [:overholt]
Mentors:
Duplicates: 215395 278726 281611 282487 292574 297895 305669 342494 469101 508003 522977 543064 584000 602679
Depends on:
Blocks:
Reported: 2001-09-28 04:17 PDT by Mitch
Modified: 2013-07-15 05:38 PDT
CC List: 19 users


Attachments
original testcase (221 bytes, text/html)
2001-09-28 04:49 PDT, Jerome Lacoste
New test case with more than 2 hyphens in a comment (163 bytes, text/html)
2001-09-28 06:35 PDT, Mitch
Domain name closes HTML comments. (421 bytes, text/html)
2010-02-20 20:15 PST, Karen

Description Mitch 2001-09-28 04:17:11 PDT
Save the following code snippet verbatim to a file:

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0//EN">
<html>
             
<!----------------------------
Another comment               
------------------------------> 
                               
You should see this text, but Mozilla cannot see this ?

</html>

Now try and load it via Mozilla. It fails to display the text you should
see. Netscape 4.7.x works fine. Explorer works fine. Mozilla fails to display.

Please refer to the page

http://www.htmlhelp.com/reference/wilbur/misc/comment.html

for HTML comment syntax which says that <!xxxx> is a VALID comment.

This should be considered a BUG.
Comment 1 Jerome Lacoste 2001-09-28 04:49:41 PDT
Created attachment 51207
original testcase
Comment 2 Jerome Lacoste 2001-09-28 04:50:57 PDT
Created an attachment with the original test case
(http://bugzilla.mozilla.org/attachment.cgi?id=51207&action=view)
Confirmed with w2k 0.9.4+ 2001092308.
Works with IE 5.01.
Comment 3 Gilles Durys 2001-09-28 06:00:38 PDT
This is not valid html.

From http://www.w3.org/TR/html401/intro/sgmltut.html#h-3.2.4 : 
"A common error is to include a string of hyphens ("---") within a comment.
Authors should avoid putting two or more adjacent hyphens inside comments."

Thus, the comment in your example is not a valid comment
->invalid
Comment 4 Gilles Durys 2001-09-28 06:01:37 PDT
Reopening: I made a typo in the cc field, and going back to the page changed INVALID to FIXED.
Comment 5 Gilles Durys 2001-09-28 06:02:05 PDT
right one this time
Comment 6 Mitch 2001-09-28 06:05:19 PDT
Sorry, I disagree.

http://www.w3.org/TR/html401/intro/sgmltut.html#h-3.2.4

says 

"Authors should avoid putting two or more adjacent hyphens inside comments."

There is only ONE hyphen in this comment. My testcase is not invalid.


Comment 7 Mitch 2001-09-28 06:09:06 PDT
Ah, OK, I see you meant the second comment.
Comment 8 Mitch 2001-09-28 06:09:37 PDT
Cancel
Comment 9 Mitch 2001-09-28 06:34:11 PDT
I am reopening for the following reasons.

1. The URL http://www.w3.org/TR/html401/intro/sgmltut.html#h-3.2.4
   states that authors should "avoid" putting 2 or more hyphens in a comment.
   However, it doesn't state that doing so violates the HTML standard.

2. This breaks a lot of web pages out in the world and we cannot fix all
   of them

3. It works with all other browsers, including Netscape 4.x, Internet Explorer,
   and Netscape 6 (which is based on Mozilla Milestone 18). Failing here will
   severely limit the use of Mozilla in the real world.

4. It is a regression from earlier Mozilla builds (as per #3 above).

5. It should be a trivial fix.

6. The behavior is inconsistent even if you quote #1 above. I attach another
   code snippet and IT WORKS even if you have 2 or more hyphens in the comment.


Please fix this.
Comment 10 Mitch 2001-09-28 06:35:52 PDT
Created attachment 51212
New test case with more than 2 hyphens in a comment
Comment 11 Mitch 2001-09-28 07:09:36 PDT
Furthermore, when you view the page source of either attachment, the block is
rendered in green as a comment, implying that Mozilla accepts it as a comment
block. Yet one attachment renders the HTML properly and the other doesn't.
There seems to be a mismatch between the rendering engine and the
parsing engine.
Comment 12 Doron Rosenberg (IBM) 2001-09-28 07:27:46 PDT
parser?
Comment 13 Mitch 2001-09-28 07:44:38 PDT
I'm assuming (maybe incorrectly) that there is a parser which parses the
server-provided HTML and then hands off to the render engine so each
individual component can do its job, e.g. HTML would be rendered by the
HTML engine, GIF/JPG by the imaging engine, etc. Maybe a different
paradigm is used, but I assume the design must be similar; hence the
behavior I am seeing, with "Page Source" showing one understanding of the
code it is given and the browser rendering another.

Comment 14 timeless 2001-09-28 08:31:02 PDT
<!-- Comment start
---- NOOP
---- NOOP
---- NOOP
---- NOOP
---- NOOP
---- NOOP
-- COMMENT END!! THIS IS NOT REALLY IN A COMMENT
Another comment               
---- NOOP
---- NOOP
---- NOOP
---- NOOP
---- NOOP
---- NOOP
---- NOOP
-- COMMENT START > THIS IS STILL COMMENTED
Please read the spec, or search for bugs where bz explains this.
Comment 15 Mitch 2001-09-28 08:57:54 PDT
I fail to see your point.

That code works fine, i.e. everything between "<!" and ">" is treated
as a comment, as it should be, and I see "THIS IS STILL COMMENTED"
being displayed.

What exactly are you saying ?

Basically the facts are:

1. Behavior is inconsistent within the same Mozilla release in how it
   treats the number of hyphens in a comment.
2. A seemingly arbitrary number of hyphens will make or break the
   parsing of comments.
3. It is not consistent between Mozilla releases. It is a regression
   between Mozilla milestone 18 and Mozilla 0.9.x.

If the behavior is consistent then we have a leg to stand on in order
that we can say that the page has incorrect html. However this is not
the case. Depending on an arbitrary number of hyphens we will render or
not ! This is ludicrous.

Please tell me why this is invalid code. And point me to the "invalid spec"
you mention ?

Comment 16 Boris Zbarsky [:bz] (still a bit busy) 2001-09-28 09:33:37 PDT
OK. Here is the situation:

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0//EN">

puts mozilla in strict mode.  In strict mode we do strict comment parsing.  The
strict syntax for comments is as follows:

"--" starts a comment.  "--" ends a comment.  Both of these only inside an SGML
markup declaration.  "<!" tells the parser to start such a declaration.  ">"
tells the parser to end such a declaration.

The reason HTML tells you not to have "--" inside a comment is that it will
_end_the_comment_.  However, "----" will end a comment and start a new one. 
This is why you see inconsistent behavior...

Your original testcase, with text in [] to denote what's a comment and what's
not:

<!--[comment]--[not comment]--[comment]--[not comment]--[comment]--
 [not comment]--[comment]--[not comment]--[comment]--[not comment]--
 [comment]--[not comment]--[comment]--
 [not comment, but inside SGML markup, so not shown]Another comment--
 [comment]--[not comment]--[comment]--[not comment]--[comment]--
 [not comment]--[comment]--[not comment]--[comment]--[not comment]--
 [comment]--[not comment]--[comment]--[not comment]--
 [comment, including the ">"]> 

So the ">" is _commented_out_.  Thus the SGML markup declaration never ends and
all the rest of your document is treated as SGML markup (and promptly ignored).
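
The toggling described above can be sketched in a few lines of Python. This is only an illustration of the rule as stated in this comment, not Mozilla's parser code; the function name end_of_markup_declaration is made up for the example, and the hyphen counts (28 before "Another comment", 30 after) are taken from the 14 and 15 "--" pairs in the breakdown above.

```python
def end_of_markup_declaration(text: str, start: int = 0) -> int:
    """Return the index just past the '>' that closes the SGML markup
    declaration beginning at text[start], or -1 if it is never closed.

    Rule sketched here: inside "<! ... >", every "--" toggles between
    "in a comment" and "not in a comment", and only a ">" seen while
    NOT in a comment ends the declaration.
    """
    if not text.startswith("<!", start):
        raise ValueError("text[start:] does not begin a markup declaration")
    i = start + 2
    in_comment = False
    while i < len(text):
        if text.startswith("--", i):
            in_comment = not in_comment  # "--" opens or closes a comment
            i += 2
        elif text[i] == ">" and not in_comment:
            return i + 1                 # a real mdc: declaration ends here
        else:
            i += 1                       # comment text or ignored markup
    return -1                            # ran off the end: never closed


# The original testcase, reduced to its comment and the text after it.
testcase = "<!" + "-" * 28 + "\nAnother comment\n" + "-" * 30 + ">\nYou should see this text"
simple = "<!-- an ordinary comment -->visible text"

print(end_of_markup_declaration(testcase))  # -1: the ">" sits inside a comment
print(simple[end_of_markup_declaration(simple):])
```

With an even number of "--" pairs after "Another comment" (for example 28 hyphens instead of 30), the same function finds the closing ">" and the trailing text becomes visible again, which is why the behavior looks arbitrary when only the hyphen count changes.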

Please read that htmlhelp.com page again.  Then count the number of dashes they
have in their examples carefully.  That page presents everything correctly.  In
particular, note the paragraph starting "Not all HTML parsers get this right.". 
We _do_ get it right, and just as "hello" in that example is commented out, the
text in your testcase is commented out.

Now to respond to your numbered list of statements:

1.  The HTML spec is not the normative spec on comments.  All it says is that
    the syntax is SGML comment syntax (that's why you have to start an SGML
    markup section to use comments).  There is unfortunately no free electronic
    version of the SGML spec available.  Again, the htmlhelp.com site you
    mention describes this topic well and correctly.

2.  This only breaks pages claiming a strict doctype and thus strict conformance
    with the HTML spec.  Pages with no doctype, or with doctypes declaring HTML
    versions 3.2 or lower, 4.0 transitional, or 4.01 transitional without a DTD
    URI, get backwards-compatible (read: broken) parsing.

3.  What you're saying is that some bugs in Mozilla got fixed and bugs in other
    browsers are not fixed yet.  Have you tried IE6 on that page, by any chance?

4.  See my answer to #3

5.  This does not need fixing

6.  This is your misunderstanding of the spec.  Please read the htmlhelp site
    again.
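
To make the doctype rule in item 2 concrete, here is a rough Python sketch of that decision exactly as it is worded in this comment. It is an assumption-laden illustration, not Mozilla's real doctype sniffing: the function name, the regular expression, and the choice to treat every doctype not listed as an exception as strict are all made up for the example.

```python
import re
from typing import Optional

# Hypothetical helper: pulls "<version>" and an optional flavour such as
# " Transitional" out of a public identifier like
# "-//W3C//DTD HTML 4.01 Transitional//EN".
_HTML_VERSION = re.compile(r'DTD HTML (\d+(?:\.\d+)?)([^/"]*)//', re.IGNORECASE)


def strict_comment_parsing(doctype: Optional[str]) -> bool:
    """Return True if, per item 2 above, the page gets strict comment parsing."""
    if not doctype:
        return False                      # no doctype: backwards-compatible
    match = _HTML_VERSION.search(doctype)
    if match is None:
        return True                       # assumption: unlisted doctypes are strict
    version = match.group(1)
    transitional = "transitional" in match.group(2).lower()
    # A second quoted string is taken to be the DTD URI (system identifier).
    has_dtd_uri = len(re.findall(r'"[^"]*"', doctype)) >= 2
    if float(version) <= 3.2:
        return False                      # HTML 3.2 or lower
    if transitional and version == "4.0":
        return False                      # 4.0 transitional
    if transitional and version == "4.01" and not has_dtd_uri:
        return False                      # 4.01 transitional without DTD URI
    return True


print(strict_comment_parsing('<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0//EN">'))
```

The doctype from the original testcase falls in none of the listed exceptions, so under this reading it selects strict comment parsing.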

Oh, and view page source has a known bug that causes it to do broken comment
parsing even for strict pages.  See bug 91045 (it's marked duplicate of a
wide-ranging parsing cleanup for view source).

I'm marking this invalid.  Please feel free to reopen if you can convince me,
after reading that htmlhelp.com page 3 or 4 times, that this is incorrect.
Comment 17 Christopher Hoess (gone) 2001-09-28 09:47:03 PDT
From Goldfarb, "The SGML Handbook" (contains the full text of the SGML standard,
ISO 8879):
comment declaration = mdo, (comment, (s|comment)*)?, mdc
comment = com, SGML character*, com

Explanation: this is the formal expression of SGML comment syntax.
For the purposes of HTML, mdo (markup declaration open) is <! and mdc (markup
declaration close) is > and com is --.  s is "separator characters", i.e.,
whitespace.  The * indicates that the preceding token may occur 0 or more times,
the | that one and only one of the tokens it separates may occur, and the ? that
the preceding token may occur 0 or 1 time.  Hence, <!xxxx> is a legitimate
comment *declaration*, because it *does not actually contain a comment*, which
is optional.  From the prose of ISO 8879, again: "No markup is recognized in a
comment, other than the com delimiter that terminates it."  As bz pointed out,
because the intended mdc character ">" is enclosed within a comment, *it is not
recognized as markup*, and hence the declaration is never closed.
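
For readers who prefer code to grammar, the two productions quoted above can be approximated with a regular expression. This is a sketch under the delimiter assignments given in this comment (mdo is "<!", mdc is ">", com is "--", s is whitespace); the names COMMENT, COMMENT_DECLARATION, and is_comment_declaration are invented for the example and come from neither ISO 8879 nor Mozilla.

```python
import re

# comment = com, SGML character*, com
# The body may not contain "--", because the first com seen ends the comment.
COMMENT = r"--(?:(?!--).)*--"

# comment declaration = mdo, (comment, (s|comment)*)?, mdc
COMMENT_DECLARATION = re.compile(
    rf"<!(?:{COMMENT}(?:\s|{COMMENT})*)?>",
    re.DOTALL,  # comment text may span several lines
)


def is_comment_declaration(text: str) -> bool:
    """True if the whole of text is one well-formed comment declaration."""
    return COMMENT_DECLARATION.fullmatch(text) is not None


print(is_comment_declaration("<!-- one -- -- two -->"))   # two comments, one declaration
print(is_comment_declaration("<!-- one -- two -->"))      # " two " falls outside any comment
```

The second call fails because after the first comment ends at the inner "--", the text " two " is neither whitespace nor a new comment, which is the situation in the original testcase.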

Verifying INVALID.
Comment 18 Susie Wyshak 2003-05-08 09:03:56 PDT
Bug 120385 has some good examples too for a technote.
Comment 19 Oliver Klee 2003-08-07 06:16:00 PDT
*** Bug 215395 has been marked as a duplicate of this bug. ***
Comment 20 Erik Fabert 2005-01-17 08:44:26 PST
*** Bug 278726 has been marked as a duplicate of this bug. ***
Comment 21 Matthias Versen [:Matti] 2005-02-09 03:45:17 PST
*** Bug 281611 has been marked as a duplicate of this bug. ***
Comment 22 Josh Birnbaum 2005-02-16 22:14:19 PST
*** Bug 282487 has been marked as a duplicate of this bug. ***
Comment 23 Steve England [:stevee] 2005-05-02 04:22:01 PDT
*** Bug 292574 has been marked as a duplicate of this bug. ***
Comment 24 Elmar Ludwig 2005-06-16 04:55:23 PDT
*** Bug 297895 has been marked as a duplicate of this bug. ***
Comment 25 Jo Hermans 2005-08-23 14:06:03 PDT
*** Bug 305669 has been marked as a duplicate of this bug. ***
Comment 26 Phil Ringnalda (:philor) 2006-06-22 21:19:15 PDT
*** Bug 342494 has been marked as a duplicate of this bug. ***
Comment 27 Tom Maneiro 2007-12-28 10:29:12 PST
It seems that this bug is the cause of rendering issues with some Ikonboard-based boards: http://gens.consolemul.com/cgi-bin/ikonboard/ikonboard.cgi?act=ST;f=6;t=347;st=30 contains a seriously screwed thread page... If you look on the HTML for this page, you will see things like this:

---
<!--QuoteBegin--blindpainkiller+Jan. 07 2006,10:03--><table border="0" align="center" width="95%" cellpadding="0" cellspacing="0"><tr><td><b>Quote</b> (blindpainkiller @ Jan. 07 2006,10:03)</td></tr><tr><td id="QUOTE"><!--QuoteEBegin--><font color='#000000'>About the SMS Sonic 2 level select:<br>I remember back in the day when I used a SMS converter on my Megadrive and I couldn&#39;t get this cheat working no matter how much I tried, but on the real console it worked fine. I&#39;ve tried it many times on Fusion too, but can&#39;t get it working <!--emo&:(--><img src="http://gens.consolemul.com/iB_html/non-cgi/emoticons/sad-smiley-056.gif" border="0" valign="absmiddle" alt=':('><!--endemo-->. I think however that I some years ago heard about someone who got it working in Meka, but I&#39;m not sure.</font><!--QuoteEnd--></td></tr></table><!--QuoteEEnd--><br><font color='#000000'>I&#39;ve got it working in Meka and it wasn&#39;t years ago that I reported it, a couple of months ago at best (maybe someone else reported it years ago), and the cheat always worked in Meka as far as I can remember.<br>It didn&#39;t work on a MD with the SMS converter? That&#39;s strange.</font>  <!--Signature--><br><br>--------------<br>
---

The page is rendered fine on IE7, but FF interprets the post text as a big comment, screwing up the page. This happens in some newer Ikonboard versions (this one is 3.1.5), but not in old versions. The comment that causes this mess is this one: <!--QuoteBegin--blindpainkiller+Jan. 07 2006,10:03-->. Notice the two extra dashes...
Comment 28 r.lagrange 2008-12-11 02:51:37 PST
*** Bug 469101 has been marked as a duplicate of this bug. ***
Comment 29 Mardeg 2009-08-03 04:50:06 PDT
*** Bug 508003 has been marked as a duplicate of this bug. ***
Comment 30 Mardeg 2009-10-18 14:06:19 PDT
*** Bug 522977 has been marked as a duplicate of this bug. ***
Comment 31 Jo Hermans 2010-01-29 12:49:27 PST
*** Bug 543064 has been marked as a duplicate of this bug. ***
Comment 32 Karen 2010-02-20 20:15:26 PST
Created attachment 428003
Domain name closes HTML comments.

Any domain name with consecutive hyphens breaks comments. Ex:
     <!-- <a href="http://ex--ample.com/">Commented Out</a> -->
"Commented Out -->" is incorrectly rendered on the page.
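
Page authors who want to find comments of this kind before a strict parser does could use a small check like the following Python sketch. It is a hypothetical lint helper, not part of any Mozilla tool: it delimits comments the lenient way ("<!--" up to the first "-->") and reports those whose body contains "--" or ends in "-", since those are the ones the strict rule discussed in this bug reads differently.

```python
import re
from typing import List

# Lenient delimiting: everything from "<!--" to the first "-->".
LENIENT_COMMENT = re.compile(r"<!--(.*?)-->", re.DOTALL)


def risky_comments(html: str) -> List[str]:
    """Return the comments whose body has adjacent hyphens."""
    found = []
    for match in LENIENT_COMMENT.finditer(html):
        body = match.group(1)
        # "--" inside the body, or a trailing "-" that forms "--->".
        if "--" in body or body.endswith("-"):
            found.append(match.group(0))
    return found


page = (
    '<!-- <a href="http://ex--ample.com/">Commented Out</a> -->'
    "<p>text</p>"
    "<!--QuoteEnd-->"
)
for comment in risky_comments(page):
    print(comment)
```

Only the first comment in the sample is reported; "<!--QuoteEnd-->" has no adjacent hyphens in its body and is left alone.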
Comment 33 Mardeg 2010-08-03 03:08:38 PDT
*** Bug 584000 has been marked as a duplicate of this bug. ***
Comment 34 Matthias Versen [:Matti] 2010-10-07 18:54:49 PDT
*** Bug 602679 has been marked as a duplicate of this bug. ***
