Bug 214476 - (SGMLComment) Mozilla interprets a -- (two dashes in a row) inside of a comment or an include improperly
Status: RESOLVED FIXED
[fixed by the HTML5 parser]
: html5
Product: Core
Classification: Components
Component: HTML: Parser
: Trunk
: All All
: P2 normal with 4 votes
: ---
Assigned To: Nobody; OK to take it and work on it
:
Mentors:
http://www.w3.org/html/wg/html5/#markup
: 214475 269104 271854 271860 288610 294614 294796 307747 318411 320933 321472 332516 338810 340975 341443 351419 352664 359014 362663 365844 367371 368539 375803 387165 387941 388137 393426 396793 404691 406694 412761 423893 428286 429914 432082 436298 451190 455664 464452 471787 475800 477200 484036 487773 500110 500887 516666 530451 530476 536769 544091 546436 562921 567745 574594 576990 577858 578947 579613 589433 596142 599619 600298 600508 609542 614518 615498 621209 634536 635220 638387 (view as bug list)
Depends on: html5-parsing
Blocks:
 
Reported: 2003-07-30 09:41 PDT by Nicholas A. Wilson
Modified: 2011-03-03 00:53 PST
93 users
dsicore: blocking1.9.1-
dsicore: wanted1.9.1+
See Also:
Crash Signature:
QA Whiteboard:
Iteration: ---
Points: ---
Has Regression Range: ---
Has STR: ---


Attachments
Implement HTML5 comments, v1 (17.16 KB, patch)
2008-06-25 09:14 PDT, Blake Kaplan (:mrbkap)
Implement HTML5 comments, v2 (19.74 KB, patch)
2008-07-03 09:11 PDT, Blake Kaplan (:mrbkap)
Implement HTML5 comments, v2.5 (20.99 KB, patch)
2008-07-04 07:07 PDT, Blake Kaplan (:mrbkap)
Implement HTML5 comments, v2.7 (21.52 KB, patch)
2008-07-15 03:43 PDT, Blake Kaplan (:mrbkap)

Description Nicholas A. Wilson 2003-07-30 09:41:25 PDT
User-Agent:       Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.5a) Gecko/20030626 Mozilla Firebird/0.6
Build Identifier: 

While updating our top-level page, we found that if a double hyphen (--) is
used within an SSI-style comment or an SSI directive, Mozilla misinterprets it and
may not read the closing delimiter properly, leaving part or all of the page undisplayed.
Example:
<!-- V. 3: August 2000 by University Relations; designer--developer: Bennet
George -->
The double dash between designer and developer caused the comment to not close
properly.
For the time being we found that we can avoid it simply by using a single
hyphen. The problem does not occur in Opera, MSIE, or most of the other browsers
we tested with, and does comply with XML standards.

I do not remember if we had spaces on either side of the hyphens or not.

Reproducible: Always

Steps to Reproduce:
1.
2.
3.

Actual Results:  
Browser displayed only a small portion of the page

Expected Results:  
Browser should have interpreted the --> as the end of the comment; it did not.
Comment 1 Sebastian Biallas 2003-07-30 10:36:06 PDT
See http://www.w3.org/TR/html4/intro/sgmltut.html#h-3.2.4
I guess this bug is INVALID.
Comment 2 Bill Mason 2003-07-30 11:41:09 PDT
*** Bug 214475 has been marked as a duplicate of this bug. ***
Comment 3 Bill Mason 2003-07-30 11:43:24 PDT
This is invalid.  No -- are permitted within a comment.
Comment 4 Boris Zbarsky [:bz] 2003-07-30 11:54:09 PDT
In SGML, "--" starts a comment and "--" ends a comment.  HTML just uses SGML
comments, with "<!" signalling the starts of SGML markup and ">" signalling its
end.  Therefore:

<!-- -- --> text -->

Has two comments; one containing the string " " and one containing the string 
"> text ".

Now if Mozilla is put in quirks mode, we do backwards compatible comment parsing
(read "broken comment parsing just like old browsers").  But it sounds like the
site in question put Mozilla in standards mode.
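
The pairing rule described above can be sketched in a few lines of Python. This is only an illustration of the rule as stated in this comment, not Gecko's parser: the function name `sgml_comments` is hypothetical, and the sketch simply skips any text that falls between comments inside the declaration rather than reporting it as an error.

```python
def sgml_comments(text):
    """Collect the comments in the first "<!" ... ">" declaration of `text`.

    Inside the declaration every "--" toggles comment mode, so comments
    are paired off. Returns (comments, end_index); end_index is None
    when the declaration never closes.
    """
    start = text.find("<!")
    if start == -1:
        return [], None
    i = start + 2
    comments = []
    while i < len(text):
        if text.startswith("--", i):
            # "--" opens a comment; the next "--" closes it.
            close = text.find("--", i + 2)
            if close == -1:
                # Unterminated comment: the rest of the input is swallowed.
                return comments, None
            comments.append(text[i + 2:close])
            i = close + 2
        elif text[i] == ">":
            # ">" outside a comment ends the declaration.
            return comments, i + 1
        else:
            # Simplification: ignore stray text between comments.
            i += 1
    return comments, None


print(sgml_comments("<!-- -- --> text -->"))
print(sgml_comments("<!-- designer--developer: Bennet George --> rest of page"))
```

With the reporter's markup the second call never finds a closing ">" outside a comment, which matches the symptom of the rest of the page disappearing.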
Comment 5 Boris Zbarsky [:bz] 2003-07-30 11:55:22 PDT
Bill, note the difference between what you said and how comment parsing actually
works (it's subtle, but important: "--" is the comment delimiter, not just "not
allowed inside a comment").
Comment 6 Bill Mason 2003-07-30 12:07:32 PDT
BZ, maybe before you CC me yet again on bugs that I don't want to be on, you
should take time to take task within the W3C HTML group:

"A common error is to include a string of hyphens ("---") within a comment.
Authors should avoid putting two or more adjacent hyphens inside comments."

Stop CCing me to be pedantic when I'm quoting a spec, or I swear I'll just stop
donating time to Mozilla to triage bugs.  I didn't start writing HTML yesterday.
Comment 7 Oliver Klee 2004-11-11 02:19:02 PST
*** Bug 269104 has been marked as a duplicate of this bug. ***
Comment 8 Martijn Wargers [:mwargers] (not working for Mozilla) 2004-11-26 05:21:01 PST
*** Bug 271854 has been marked as a duplicate of this bug. ***
Comment 9 Bill Mason 2004-11-26 08:07:52 PST
*** Bug 271860 has been marked as a duplicate of this bug. ***
Comment 10 Steve England [:stevee] 2005-04-01 03:29:48 PST
*** Bug 288610 has been marked as a duplicate of this bug. ***
Comment 11 Erik Fabert 2005-05-18 03:09:48 PDT
*** Bug 294614 has been marked as a duplicate of this bug. ***
Comment 12 Phil Ringnalda (:philor) 2005-05-19 11:43:15 PDT
*** Bug 294796 has been marked as a duplicate of this bug. ***
Comment 13 Richard Brodie 2005-09-09 11:16:06 PDT
*** Bug 307747 has been marked as a duplicate of this bug. ***
Comment 14 Reed Loden [:reed] (use needinfo?) 2005-11-30 16:44:45 PST
*** Bug 318411 has been marked as a duplicate of this bug. ***
Comment 15 Erik Fabert 2005-12-20 02:36:43 PST
*** Bug 320933 has been marked as a duplicate of this bug. ***
Comment 16 Jo Hermans 2005-12-25 14:08:02 PST
*** Bug 321472 has been marked as a duplicate of this bug. ***
Comment 17 Jo Hermans 2006-04-02 14:39:43 PDT
*** Bug 332516 has been marked as a duplicate of this bug. ***
Comment 18 Jo Hermans 2006-05-22 06:23:03 PDT
*** Bug 338810 has been marked as a duplicate of this bug. ***
Comment 19 Kevin Brosnan 2006-06-09 08:26:08 PDT
*** Bug 340975 has been marked as a duplicate of this bug. ***
Comment 20 Régis Caspar 2006-06-13 16:29:25 PDT
*** Bug 341443 has been marked as a duplicate of this bug. ***
Comment 21 Richard 2006-06-13 18:21:12 PDT
Invalid or not, I'm not sure I understand why Firefox sometimes will render these rogue comments, sometimes will treat them as commenting out entire sections, etc.  Regardless of the number of "-" between the start and end of a comment, shouldn't they always not render and not affect anything else?  With the number of duplicates, obviously lots of people use this technique to visually block off sections of code.
Comment 22 Boris Zbarsky [:bz] 2006-06-13 19:10:19 PDT
> I'm not sure I understand

Please read the whole bug, esp. comment 4.
Comment 23 g0adragon 2006-08-19 15:49:52 PDT
The following perfectly valid web page is made invalid by having two hyphen characters in a row inside an HTML comment.

Enter the following web page code into the HTML validator under "Validate by Direct Input" at the following address:

http://validator.w3.org/


<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">
<html>
<head>
<title>HTML Comments Display on Web pages</title>
</head>
<body>

<!--

This entire comment -- will show in web browser

-->

<p>This line is supposed to be first visible text on web page.</p>

<p>The page works perfectly if the hyphen is split with even a space or removed completely.</p>

</body>
</html>

Comment 24 Phil Ringnalda (:philor) 2006-09-05 08:45:40 PDT
*** Bug 351419 has been marked as a duplicate of this bug. ***
Comment 25 Phil Ringnalda (:philor) 2006-09-14 21:43:58 PDT
*** Bug 352664 has been marked as a duplicate of this bug. ***
Comment 26 ski 2006-10-05 11:05:20 PDT
So if I'm a web developer and I need to allow -- inside comments, what am I to do? FFox renders this directly even with transitional mode, which strikes me as the wrong thing. Or is there some browser-specific way to force "quirks mode" ?
Comment 27 Boris Zbarsky [:bz] 2006-10-05 11:24:57 PDT
> So if I'm a web developer and I need to allow -- inside comments

You can't do that in HTML, if the HTML spec is actually followed.

> Or is there some browser-specific way to force "quirks mode" ?

This is well-documented at http://developer.mozilla.org/en/docs/Mozilla%27s_DOCTYPE_sniffing

I should note that that page is linked directly off http://developer.mozilla.org/en/docs/Mozilla's_Quirks_Mode, which is the first Google hit for "mozilla quirks".
Comment 28 Jesse Ruderman 2006-11-01 04:25:32 PST
*** Bug 359014 has been marked as a duplicate of this bug. ***
Comment 29 Maik Riechert 2006-12-03 13:39:26 PST
*** Bug 362663 has been marked as a duplicate of this bug. ***
Comment 30 Phil Ringnalda (:philor) 2007-01-03 14:43:02 PST
*** Bug 365844 has been marked as a duplicate of this bug. ***
Comment 31 Jo Hermans 2007-01-18 13:08:02 PST
*** Bug 367371 has been marked as a duplicate of this bug. ***
Comment 32 Phil Ringnalda (:philor) 2007-01-29 01:12:56 PST
*** Bug 368539 has been marked as a duplicate of this bug. ***
Comment 33 Phil Ringnalda (:philor) 2007-03-29 09:13:20 PDT
*** Bug 375803 has been marked as a duplicate of this bug. ***
Comment 34 Jesse Ruderman 2007-07-06 16:23:56 PDT
*** Bug 387165 has been marked as a duplicate of this bug. ***
Comment 35 Phil Ringnalda (:philor) 2007-07-12 15:32:10 PDT
*** Bug 387941 has been marked as a duplicate of this bug. ***
Comment 36 Régis Caspar 2007-07-14 06:17:34 PDT
*** Bug 388137 has been marked as a duplicate of this bug. ***
Comment 37 Phil Ringnalda (:philor) 2007-08-23 11:52:54 PDT
*** Bug 393426 has been marked as a duplicate of this bug. ***
Comment 38 Phil Ringnalda (:philor) 2007-09-19 17:22:29 PDT
*** Bug 396793 has been marked as a duplicate of this bug. ***
Comment 39 Phil Ringnalda (:philor) 2007-11-20 22:37:24 PST
*** Bug 404691 has been marked as a duplicate of this bug. ***
Comment 40 Jesse Ruderman 2007-12-03 19:09:08 PST
*** Bug 406694 has been marked as a duplicate of this bug. ***
Comment 41 Dave Townsend [:mossop] 2008-01-17 04:28:50 PST
*** Bug 412761 has been marked as a duplicate of this bug. ***
Comment 42 Phil Ringnalda (:philor) 2008-03-19 10:10:21 PDT
*** Bug 423893 has been marked as a duplicate of this bug. ***
Comment 43 Matthias Versen [:Matti] 2008-04-10 03:37:48 PDT
*** Bug 428286 has been marked as a duplicate of this bug. ***
Comment 44 Matthias Versen [:Matti] 2008-04-20 04:31:37 PDT
*** Bug 429914 has been marked as a duplicate of this bug. ***
Comment 45 Anne (:annevk) 2008-04-20 06:51:26 PDT
FWIW, per HTML5 this is a bug in Firefox.
Comment 46 Frank Wein [:mcsmurf] 2008-04-20 11:14:36 PDT
But -- is still not allowed inside a comment, right? Just the error handling is different (http://www.whatwg.org/specs/web-apps/current-work/#bogus)?
Comment 47 Anne (:annevk) 2008-04-20 15:58:51 PDT
Right. (Though you would not end up in the "bogus comment state".) And also, that doesn't make it less of a bug :-)
Comment 48 Mike Shaver (:shaver -- probably not reading bugmail closely) 2008-04-24 06:06:29 PDT
Reopening per Anne's comment.
Comment 49 Phil Ringnalda (:philor) 2008-05-03 21:12:56 PDT
*** Bug 432082 has been marked as a duplicate of this bug. ***
Comment 50 Kai Liu 2008-05-29 07:59:06 PDT
*** Bug 436298 has been marked as a duplicate of this bug. ***
Comment 51 Alan O. 2008-06-05 10:33:32 PDT
If this is because -- is not allowed between <!-- and -->, then it sounds like the specification is inadequate. It defies common sense to use something like that for delimiting when you know full well that it's also used for comments and that -- is commonly used in texts. Use a less common sequence of characters for that purpose, ergo <!-- -#- -->. Don't make the job of a web developer more a pain because of ridiculously short-sighted standards.
Comment 52 tarquin 2008-06-23 03:05:57 PDT
SGML comments are ridiculous and problematic. They confuse authors and break pages. HTML 5 recognises this, and no longer requires browsers to parse comments in SGML format. They are now parsed in a way compatible with all browsers except Firefox. Firefox seems to be insisting on hanging on to the SGML comments, even though others realised they are stupid, and protested against their inclusion in Acid 2.

HTML is not SGML, and never has been (even though it was originally supposed to be) - this won't work anywhere, even though it is valid SGML:
http://virtuelvis.com/download/162/evilml.html

SGML comments were removed from Acid 2 because they are stupid. They were removed from HTML 5 because they are stupid. It's time to remove them from Firefox, and stop breaking pages like the one in this report.

For those wondering why some patterns work, and others leave bits on the page:
http://www.howtocreate.co.uk/SGMLComments.html
Comment 53 Dotan Cohen 2008-06-23 04:02:01 PDT
If Tarquin is right and SGML comments were removed from HTML5, then Firefox should not treat documents with a valid HTML5 Doctype and SGML comment delimiters as such. For older documents, or non-valid HTML5 documents, the gotcha-but-correct behaviour should be maintained.
Comment 54 tarquin 2008-06-23 05:35:32 PDT
Note that this is specified in the parsing section of HTML 5 (the language definition tells authors not to include -- inside comments, but the error handling stage of parsing will allow it):
http://www.w3.org/html/wg/html5/#comment3

"U+002D HYPHEN-MINUS (-) 
Parse error. Append a U+002D HYPHEN-MINUS (-) character to the comment token's data. Stay in the comment end state.
...
Anything else 
Parse error. Append two U+002D HYPHEN-MINUS (-) characters and the input character to the comment token's data. Switch to the comment state."
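
A rough Python sketch of the error handling quoted above may make it easier to follow. It covers only the comment, comment end dash and comment end states; the comment start states (which handle `<!-->` and `<!--->`) are left out, EOF handling is simplified, and the function name `html5_comment` and its return shape are made up for this illustration.

```python
def html5_comment(text):
    """Tokenize one comment that begins with "<!--".

    Returns (data, parse_errors, end_index), where end_index is the
    position just past the closing "-->" (or len(text) at EOF).
    """
    if not text.startswith("<!--"):
        raise ValueError("input must start with '<!--'")
    data = []
    errors = 0
    state = "comment"
    i = 4
    while i < len(text):
        ch = text[i]
        i += 1
        if state == "comment":
            if ch == "-":
                state = "comment end dash"
            else:
                data.append(ch)
        elif state == "comment end dash":
            if ch == "-":
                state = "comment end"
            else:
                # A lone dash is ordinary comment data.
                data.append("-" + ch)
                state = "comment"
        else:  # "comment end": two dashes have been seen
            if ch == ">":
                return "".join(data), errors, i
            errors += 1  # parse error, but the comment stays open
            if ch == "-":
                data.append("-")
            else:
                data.append("--" + ch)
                state = "comment"
    # Simplified EOF handling: count one more error, return what we have.
    return "".join(data), errors + 1, i


print(html5_comment("<!-- designer--developer -->visible text"))
```

The inner "--" is recorded as a parse error and kept as comment data, so only "-->" ends the comment.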
Comment 55 Damon Sicore (:damons) 2008-06-23 17:15:12 PDT
Wouldn't hold the release for this, but I think we should get this on the list for 1.9.1. 

blocking1.9.1-
wanted1.9.1+, P2.  

Anyone want to volunteer here?
Comment 56 Blake Kaplan (:mrbkap) 2008-06-24 01:43:58 PDT
I'll take this.
Comment 57 tarquin 2008-06-24 02:10:31 PDT
"For older documents, or non-valid HTML5 documents, the gotcha-but-correct behaviour should be maintained."

This will not help anyone. The broken pages (including the one in this report) will not get fixed. Firefox will remain incompatible with all other browsers. SGML comments were removed from HTML because they are stupid in all cases, not sometimes-stupid. Having comments that nobody understands in HTML 4 standards mode, but not in quirks mode or HTML 5 mode, is beyond confusing.

They should be removed in all modes in order to be compatible with existing Web Pages, other browsers, and author expectations, while providing a consistent response to all doctypes.
Comment 58 Blake Kaplan (:mrbkap) 2008-06-25 09:14:49 PDT
Created attachment 326714 [details] [diff] [review]
Implement HTML5 comments, v1

If we're going to replace our comment parsing, we might as well implement HTML5. This patch is a straightforward implementation of the part of the state machine that consumes comments. I haven't tested it very thoroughly (in particular, I need to ensure that the behavior is consistent across packet boundaries) but as far as I can tell, it follows the spec word for word.
Comment 59 Damon Sicore (:damons) 2008-06-25 09:57:50 PDT
Blake, what test framework do we use to test something like this?
Comment 60 Boris Zbarsky [:bz] 2008-06-25 11:10:23 PDT
We have parser mochitests.
Comment 61 Blake Kaplan (:mrbkap) 2008-06-25 12:12:00 PDT
So, the parser mochitests work, but require a bunch of manual verification. In particular, my patch doesn't affect where we put the comments in the DOM and all of the interesting test cases in our mochitests hit this problem.
Comment 62 Boris Zbarsky [:bz] 2008-06-25 13:18:39 PDT
Doesn't your patch affect where the comment terminates and therefore what Element nodes end up in the document?
Comment 63 Blake Kaplan (:mrbkap) 2008-06-25 13:21:57 PDT
Sorry, yes. I meant that given the testcase |<!-- comment -->|, our resulting DOM looks like:
HTML
  HEAD
    <!-- comment -->
  BODY

where the tests want
<!-- comment -->
HTML
  HEAD
  BODY

and fixing that seems beyond the scope of this bug (unless people say otherwise).
Comment 64 Boris Zbarsky [:bz] 2008-06-25 18:44:21 PDT
Sure.  I was assuming we'd add the tests from this bug and/or duplicates to parser/htmlparser/tests/mochitest/regressions.txt or some such.  At least that's where I've been adding the parser tests... ;)
Comment 65 Blake Kaplan (:mrbkap) 2008-07-03 09:11:41 PDT
Created attachment 327975 [details] [diff] [review]
Implement HTML5 comments, v2

I made the state machine a little less jumpy and started adding tests. Unfortunately, the tests I've added here all fail. I don't understand how we're serializing these comments.
Comment 66 Blake Kaplan (:mrbkap) 2008-07-04 07:07:53 PDT
Created attachment 328128 [details] [diff] [review]
Implement HTML5 comments, v2.5

This adds a bunch of tests, and I'm pretty sure that I've implemented the spec faithfully. This is ready for review.
Comment 67 Blake Kaplan (:mrbkap) 2008-07-15 03:26:20 PDT
One thing I've noticed is in the testcase: |<title>foo <!-- bar| the resulting content model is:

html
  head
    title
      "foo "
    <!--  bar -->

I'll fix that.
Comment 68 Blake Kaplan (:mrbkap) 2008-07-15 03:43:59 PDT
Created attachment 329653 [details] [diff] [review]
Implement HTML5 comments, v2.7

Here's the interesting part of the interdiff:

diff --git a/parser/htmlparser/src/nsHTMLTokens.cpp b/parser/htmlparser/src/nsHTMLTokens.cpp
--- a/parser/htmlparser/src/nsHTMLTokens.cpp
+++ b/parser/htmlparser/src/nsHTMLTokens.cpp
@@ -900,6 +900,10 @@ CTextToken::ConsumeParsedCharacterData(P
 
         consumer.AppendSourceTo(theContent.writable());
         mNewlineCount += consumer.GetNewlineCount();
+
+        // If we successfully consumed a comment, end the title after the
+        // comment.
+        aScanner.CurrentPosition(altEndPos);
         continue;
       }
     }
Comment 69 Phil Ringnalda (:philor) 2008-08-19 07:36:13 PDT
*** Bug 451190 has been marked as a duplicate of this bug. ***
Comment 70 Jo Hermans 2008-09-17 04:01:45 PDT
*** Bug 455664 has been marked as a duplicate of this bug. ***
Comment 71 Phil Ringnalda (:philor) 2008-11-18 22:17:58 PST
*** Bug 464452 has been marked as a duplicate of this bug. ***
Comment 72 Jonas Sicking (:sicking) No longer reading bugmail consistently 2008-12-12 11:59:07 PST
Blake, what is the status of this patch? Is it still good to review?
Comment 73 Blake Kaplan (:mrbkap) 2008-12-12 15:09:59 PST
(In reply to comment #72)
> Blake, what is the status of this patch? Is it still good to review?

Yeah, the only question that needs answering before review is whether we want to do this at all.
Comment 74 Jonas Sicking (:sicking) No longer reading bugmail consistently 2008-12-12 16:24:58 PST
Assuming that this makes us follow the HTML5 algorithm I think we should try it.

The only reason not to do it would be if we think the HTML5 parser is going to land pretty soon anyway...
Comment 75 Matthias Versen [:Matti] 2009-01-01 09:44:57 PST
*** Bug 471787 has been marked as a duplicate of this bug. ***
Comment 76 Blake Kaplan (:mrbkap) 2009-03-18 23:53:08 PDT
*** Bug 484036 has been marked as a duplicate of this bug. ***
Comment 77 Blake Kaplan (:mrbkap) 2009-03-18 23:53:51 PDT
*** Bug 475800 has been marked as a duplicate of this bug. ***
Comment 78 pp 2009-03-19 02:01:10 PDT
I was the one opening bug 484036. This bug unfortunately didn't show up when I searched, I'm sorry for that.
Just to add to this discussion, putting the url of an IDN-domain between those comment tags triggers this error too. I run several forums based off vBulletin and this application puts the forum's url wrapped in a comment in the footer. One of my forums is using an IDN-domain and vBulletin puts that domain name in punycode format in the footer (www.xn--something-xyz.com) and breaks the page. Not allowing certain valid domains within a comment seems a little far-fetched so I hope this problem will get resolved. Other browsers render this correctly.
Comment 79 Boris Zbarsky [:bz] 2009-03-19 05:46:30 PDT
> Not allowing certain valid domains within a comment seems a little far fetched

Er, say what?  The two languages are unrelated!  You wouldn't complain if a url that contains "*/" ended a CSS comment, would you?

This bug should be fixed for compat, but said compat is just a workaround for people sticking things with the comment end delimiter in them inside comments...
Comment 80 pp 2009-03-19 07:29:31 PDT
Not exactly sure what you mean. I was referring to comment #3 where someone claims it's forbidden to have -- enclosed in comment tags. The people inventing punycode and making IDN domains a standard either didn't know this or completely ignored it. In any case it's the users of Firefox that have to pay the price which, after reading this thread, seems to have been forgotten. Standards seem to be more important for Mozilla than user experience. If I have misunderstood this I sincerely apologize.
Comment 81 Boris Zbarsky [:bz] 2009-03-19 07:54:35 PDT
> or completely ignored it

The latter.  Or assumed people would properly escape their stuff inside comments, of course.
Comment 82 Tony Mechelynck [:tonymec] 2009-03-26 14:45:18 PDT
In reply to comment #80: If you want to put */ inside a C comment without ending the comment, you have to alter it somehow. Add a space between the star and slash, maybe. Similarly, you can't put a punycode URL, containing two dashes in the middle of text, inside an HTML4 comment -- you have to alter it somehow. If you want the punycode URL to be human-readable, you can replace two dashes by four (but the opposite conversion will have to be done when copying it to the URL bar); or if you want it to be machine-readable, I think you can replace the two dashes by %2D%2D (correct me, someone, if I'm wrong in thinking that such an "escaped" URL, when copied to the URL bar, will be correctly interpreted, first by having each %2D replaced by a dash, then by punycode interpretation). _Four_ dashes in sequence are allowed within an HTML comment, even in HTML4; and %2D%2D means -- in URL syntax but not in HTML comment syntax.

IIUC, HTML5 comment syntax differs from HTML4 comment syntax; but I don't know the details. The above paragraph is about HTML4 but this bug is (IIUC, now that it has been reopened, assigned, and given the "html5" keyword) about altering Gecko to handle HTML5 comments correctly.
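
The %2D%2D idea can be tried out with Python's standard `urllib.parse.unquote`. This is a sketch of the string transformation only: the helper name `escape_double_dash` is hypothetical, and it says nothing about whether a browser's URL bar accepts a percent-escaped hyphen in a host name, which is the open question above.

```python
from urllib.parse import unquote


def escape_double_dash(url):
    """Replace every "--" pair with "%2D%2D" so that the result can sit
    inside an HTML4 comment without containing the "--" delimiter."""
    return url.replace("--", "%2D%2D")


original = "http://www.xn--something-xyz.com/"
escaped = escape_double_dash(original)
print(escaped)
# Generic percent-decoding restores the original string.
print(unquote(escaped) == original)
```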
Comment 83 Adam Nielsen 2009-03-26 20:28:01 PDT
Your example is not quite correct though.  */ is the terminating marker for a C comment - nobody is wanting to put --> inside a HTML comment.  The problem is putting just part of the comment terminator inside a comment terminates it.  It would be like being unable to use a single * inside a C comment.

The same argument applies to comment #79.  I wouldn't complain if a URL containing */ ended a CSS comment, but I could complain if a URL containing * ended it.  Likewise I'm not complaining that "-->" ends a HTML comment, but rather that "--" does.

I'm not aware of any other language that uses one character sequence to start a comment, but two or more different sequences to terminate that same comment.
Comment 84 Mike Shaver (:shaver -- probably not reading bugmail closely) 2009-03-27 04:44:55 PDT
You can choose whether you're wrong according to HTML4.01 and SGML:

http://htmlhelp.com/reference/wilbur/misc/comment.html

or according to HTML5:

http://dev.w3.org/html5/spec/Overview.html#comments

but unfortunately that's just the way HTML comments work.  You can't put -- inside them.
Comment 85 Boris Zbarsky [:bz] 2009-03-27 05:49:17 PDT
Adam, in SGML and HTML the comment terminator sequence is "--", and can only be used inside an SGML markup declaration.  SGML markup declarations start with "<!" and end with ">".  Here's an example from the HTML4 DTD at http://www.w3.org/TR/REC-html40/sgml/dtd.html :

<!ATTLIST Q
  %attrs;                              -- %coreattrs, %i18n, %events --
  cite        %URI;          #IMPLIED  -- URI for source document or msg --
  >

Here the comments are "%coreattrs, %i18n, %events" and "URI for source document or msg" but the rest of the text is not comment and in fact is the declaration for the allowed attributes on the Q element.

I realize this is a bit more complex than the way comments work in C, but that's life.

Note that I already said all this in this bug in comment 4, almost 6 years ago...

In any case, this bug is now about ignoring HTML4 and implementing the HTML5 definition of comments, which does indeed start with "<!--" and end with "-->", so I'm really not sure what all the discussion is about at this point.
Comment 86 Murray Crowe 2009-04-02 00:47:19 PDT
I've noticed that -- followed by any sequence of characters and a > within a conditional comment really messes things up. Surely a conditional comment needs to be an exception to this rule, as it may contain a script with valid code. I realise conditional comments aren't best practice coding, but they do exist and they do get parsed by browsers.

Here's an example:

<!--[if lt IE 7]>
<script language="javascript">
var i=1;
i--;
var string=">";
alert("this displays as inline text in firefox");
</script>
<![endif]-->
Comment 87 Marius Hudea 2009-04-02 01:19:40 PDT
This is starting to get ridiculous... 

#86 ... perhaps you can use eval and unescape to avoid having -- and > characters in IE only code?
Comment 88 Boris Zbarsky [:bz] 2009-04-02 08:18:04 PDT
Conditional comments are "parsed by browsers" as just comments, with no special treatment, except for IE.  Seriously, if you want to shoot yourself in the foot with HTML you can.

And I still don't understand what the discussion is about, since the plan is to change behavior here...  Can people just shut up and let the bug be until it's fixed?
Comment 89 Daniel.S 2009-04-10 04:50:10 PDT
*** Bug 487773 has been marked as a duplicate of this bug. ***
Comment 90 Lars Gunther 2009-04-20 12:01:17 PDT
#86 Use Conditional compilation instead. Problem solved!
Comment 91 Kevin Brosnan 2009-06-23 21:05:17 PDT
*** Bug 500110 has been marked as a duplicate of this bug. ***
Comment 92 pbyhistorian 2009-06-24 13:01:25 PDT
I submitted duplicate bug 500110 for this, after three failed searches.

Boris' comment "... I already said all this ... almost six years ago" (#85) is distressing.  Six years!  By the time HTML5 is official *and* the browser manufacturers adopt it *and* older browsers like Firefox 3.0.11 disappear from the Internet, I may have retired.

I'm in the camp that uses dashes (and others) in comments to visually break my code into sections.  A solution that looks good and seems to work well is to replace the non-allowed dashes with character 196 (Alt-196 in Windows).
Comment 93 Phil Ringnalda (:philor) 2009-06-26 21:56:56 PDT
*** Bug 500887 has been marked as a duplicate of this bug. ***
Comment 94 Mario Rossi 2009-07-17 06:05:27 PDT
ã
Comment 95 Jo Hermans 2009-09-15 02:47:16 PDT
*** Bug 516666 has been marked as a duplicate of this bug. ***
Comment 96 Jim Michaels 2009-09-15 04:48:35 PDT
but what I am curious about is, are these valid HTML?  Am I to understand that a space is required?
<!-----> (one - in the middle. in firefox, this is not a usable comment because it has an odd number of -'s.)
<!--+--> (one + in the middle, should be same problem as above, but lexically analyzed differently.  I am not sure, but I think in firefox this may not be a usable comment.)
<!--->->--> (->-> in the middle.  in firefox this is likely not a usable comment because it has 3 -'s.)
<!------> (two - in the middle. in firefox, this is a usable comment because it has an even number of -'s.)

What I have noticed in the past about the firefox lexer is that it just simply pairs off -'s, which may be the wrong way to lexically analyze them. 

I was thinking of the following algorithm:
char0=0
char1=0
char2=0
match "<!--" in sequence
char0=get character
while (char0=get character != EOF) {
    if (char0=='-') {
        char1=get character
        if (char1=='-') {
            char2=get character
            if (char2=='>') {
                //found end of comment!
                break;
            }
        }
    }
}

But then I realized that this has the logic error of only recognizing -'s in pairs!
What I really need here is the software equivalent of the digital electronics shift register.  It is similar to a deque, where you push on one side and pop on the other.  If I could examine the contents of the whole deque, that would be wonderful.  Then I could see if the --> was coming down the pipe, and I could keep it always 3 or 4 characters full of the characters that I newly got.
Got it?

What you would need to implement this in C++ is something like the STL, to do this the easiest way.  There is already a deque class.  Unfortunately, I have already stuffed my STL book in a box for moving.  An iterator should provide the necessary means of iterating across the data elements of the deque, and they are easy to make.
Jim Michaels
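
The shift-register idea above can be sketched with Python's `collections.deque`, whose `maxlen` argument drops the oldest element automatically. This illustrates the sliding-window approach only, not Gecko's scanner; `find_comment_end` is a hypothetical name, and the caller is assumed to pass the index just after the opening "<!--".

```python
from collections import deque


def find_comment_end(text, start=0):
    """Return the index just past the first "-->" at or after `start`,
    or -1 if there is none."""
    window = deque(maxlen=3)  # always holds the last three characters
    for i in range(start, len(text)):
        window.append(text[i])
        if "".join(window) == "-->":
            return i + 1
    return -1


page = "<!-- a -- b --> tail"
end = find_comment_end(page, 4)
print(repr(page[end:]))
```

Because the window slides one character at a time, an odd or even run of dashes makes no difference.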
Comment 97 Adam Nielsen 2009-09-15 05:05:18 PDT
Well the way I look at it is the tag name is "!--" in the same way as the tag name might be "img".  Obviously <imgblah> is not valid (you need a space after the tag name), so I would expect <!-----> to be something completely different to a comment.  <!-- - --> would be a comment with an uneven number of dashes.
Comment 98 Boris Zbarsky [:bz] 2009-09-15 05:21:17 PDT
Jim, please read comment 85 (and then comment 4).  Those explain exactly how SGML and HTML4 work here (and yes, they require simply pairing off "--").

Note that a comment containing "--" is invalid HTML4 and it says so right in the HTML4 specification.

HTML5 changes the specified behavior here, which is why this bug is still (or rather again) open.
Comment 99 philippe (part-time) 2009-11-22 14:54:36 PST
*** Bug 530451 has been marked as a duplicate of this bug. ***
Comment 100 Daniel Veditz [:dveditz] 2009-11-23 09:03:29 PST
*** Bug 530476 has been marked as a duplicate of this bug. ***
Comment 101 Jonas Sicking (:sicking) No longer reading bugmail consistently 2009-12-11 16:56:46 PST
Comment on attachment 329653 [details] [diff] [review]
Implement HTML5 comments, v2.7

Hopefully all of this code is going away soon, so no need to muck around with it at this point.

If that plan changes please rerequest review
Comment 102 Reed Loden [:reed] (use needinfo?) 2009-12-25 22:41:34 PST
*** Bug 536769 has been marked as a duplicate of this bug. ***
Comment 103 Reed Loden [:reed] (use needinfo?) 2010-02-03 13:34:38 PST
*** Bug 544091 has been marked as a duplicate of this bug. ***
Comment 104 Robert Longson 2010-02-16 08:57:32 PST
*** Bug 546436 has been marked as a duplicate of this bug. ***
Comment 105 Joseph 2010-03-24 16:10:29 PDT
Not sure about implementation in swallowing, but I think you use the big state machines...  while in comments you could break this into three states:

IN_COMMENTS:
  if ( nextchar == "-" )
  {
     state = IN_COMMENTS_ONEDASH;
     break;
  }
  // else swallow the rest;
IN_COMMENTS_ONEDASH:
  if ( nextchar == "-" )
  {
     state = IN_COMMENTS_TWODASH;
     break;
  }
  else
  {
     state = IN_COMMENTS;
     break;
  }
IN_COMMENTS_TWODASH:
  if ( nextchar == ">" )
  {
     state = DATA; // or return to whatever state we were in
     break;
  }
  else if ( nextchar == "-" )
  {
    state = IN_COMMENTS_TWODASH; // cause now we have two dashes again
  }
  else
  {
    state = IN_COMMENTS;
  }
  // maybe some error here because of dashes and no end to comments


That should work for the state engines and you can keep track of the errors.
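
For anyone who wants to try the three states above, here is a small runnable Python version. It is a sketch of this proposal only, not of the tokenizer that actually shipped; the enum and function names are made up for the illustration, and error tracking is left out.

```python
from enum import Enum, auto


class State(Enum):
    IN_COMMENTS = auto()
    ONE_DASH = auto()
    TWO_DASH = auto()
    DATA = auto()


def step(state, ch):
    """Advance the comment state machine by one character."""
    if state is State.IN_COMMENTS:
        return State.ONE_DASH if ch == "-" else State.IN_COMMENTS
    if state is State.ONE_DASH:
        return State.TWO_DASH if ch == "-" else State.IN_COMMENTS
    if state is State.TWO_DASH:
        if ch == ">":
            return State.DATA  # "-->" seen: the comment is over
        # Another dash still leaves two trailing dashes; anything else resets.
        return State.TWO_DASH if ch == "-" else State.IN_COMMENTS
    return State.DATA


def comment_length(body):
    """Feed the characters that follow "<!--"; return how many were
    consumed up to and including "-->", or None if it never appears."""
    state = State.IN_COMMENTS
    for n, ch in enumerate(body, start=1):
        state = step(state, ch)
        if state is State.DATA:
            return n
    return None


print(comment_length(" a--b -->rest"))
```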
Comment 106 Joseph 2010-03-24 17:11:33 PDT
(In reply to comment #105)

http://www.w3.org/TR/2010/WD-html5-20100304/syntax.html#comments (just counting "<!--" for comment block)

Well, in https://hg.mozilla.org/mozilla-central/file/e9312d05488f/parser/html/javasrc/Tokenizer.java it's already done this way, except that it allows whitespace after the two dashes (not sure if that is what's being used).

It breaks out of the comment on <!--> or <!--->, which seems fine, but it also breaks out on <!-- -- >, which is incorrect in HTML5 if the comment block should end in "-->".

If whitespace (or "!") is encountered in COMMENT_END, it should do what it does in "default" and not go to COMMENT_END_SPACE (or BANG) like it does. Other than that and other tokenizers, it appears it should work.
Comment 107 Jo Hermans 2010-04-30 13:45:19 PDT
*** Bug 562921 has been marked as a duplicate of this bug. ***
Comment 108 James 2010-05-05 13:27:00 PDT
I thought <!-- and --> delimited comments. At least that is how it works with IE. (Yes, I understand that IE doesn't follow all standards.)
Comment 109 Boris Zbarsky [:bz] 2010-05-05 13:34:28 PDT
James, you thought wrong, in the case of HTML4.  See comment 4.

Note that as a result of IE's behavior, and of people thinking it's correct, the standard got changed...
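
To make the difference concrete, here is a rough Python sketch of the two readings (both function names are made up for illustration, and neither is the real parser code). Under HTML4's SGML-derived rules, a comment declaration is "<!", then zero or more comments each delimited by a pair of "--", then ">", with whitespace allowed only between a closing "--" and the ">". The HTML5-style function is deliberately simplified to "scan for the first -->" and ignores edge cases such as "<!-->".

```python
def sgml_comment_decl_end(text, start=0):
    """Strict SGML-style reading; text[start:] must begin with "<!--".

    Returns the index just past the closing ">", or -1 if the
    declaration is malformed or unterminated under this reading.
    """
    i = start + 2  # skip "<!"
    while True:
        if text.startswith("--", i):
            close = text.find("--", i + 2)  # matching "--" of this comment
            if close == -1:
                return -1
            i = close + 2
            while i < len(text) and text[i].isspace():
                i += 1  # whitespace is allowed after a closing "--"
        elif text.startswith(">", i):
            return i + 1
        else:
            return -1  # anything else here is malformed in this sketch


def html5_style_comment_end(text, start=0):
    """Simplified HTML5-style reading: the comment ends at the first "-->"."""
    close = text.find("-->", start + 4)  # skip "<!--"
    return -1 if close == -1 else close + 3
```

For "<!-- a -- b -->" the strict SGML-style reading finds no valid end, which is the kind of input this bug is about, while the HTML5-style reading ends the comment at the "-->". For "<!-- a -- > x -->" the two readings end the comment at different places.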
Comment 110 Jonas Sicking (:sicking) No longer reading bugmail consistently 2010-05-05 14:13:21 PDT
Also note that this bug has been marked FIXED, which means that it is fixed in the code that will become the next version of Firefox.
Comment 111 Jo Hermans 2010-05-24 07:13:38 PDT
*** Bug 567745 has been marked as a duplicate of this bug. ***
Comment 112 Phil Ringnalda (:philor) 2010-06-25 00:05:59 PDT
*** Bug 574594 has been marked as a duplicate of this bug. ***
Comment 113 Jo Hermans 2010-07-05 13:47:52 PDT
*** Bug 576990 has been marked as a duplicate of this bug. ***
Comment 114 Cork 2010-07-10 14:02:09 PDT
*** Bug 577858 has been marked as a duplicate of this bug. ***
Comment 115 Mardeg 2010-07-10 14:30:49 PDT
*** Bug 477200 has been marked as a duplicate of this bug. ***
Comment 116 Matthias Versen [:Matti] 2010-07-15 09:04:55 PDT
*** Bug 578947 has been marked as a duplicate of this bug. ***
Comment 117 Jo Hermans 2010-07-17 02:01:57 PDT
*** Bug 579613 has been marked as a duplicate of this bug. ***
Comment 118 Dave Garrett 2010-08-21 07:22:41 PDT
*** Bug 589433 has been marked as a duplicate of this bug. ***
Comment 119 Cork 2010-09-13 22:21:39 PDT
*** Bug 596142 has been marked as a duplicate of this bug. ***
Comment 120 Henri Sivonen (:hsivonen) 2010-09-25 09:24:07 PDT
*** Bug 599619 has been marked as a duplicate of this bug. ***
Comment 121 Boris Zbarsky [:bz] 2010-09-28 12:59:23 PDT
*** Bug 600298 has been marked as a duplicate of this bug. ***
Comment 122 Mardeg 2010-09-29 04:42:40 PDT
*** Bug 600508 has been marked as a duplicate of this bug. ***
Comment 123 Robert Longson 2010-11-04 03:28:00 PDT
*** Bug 609542 has been marked as a duplicate of this bug. ***
Comment 124 Robert Longson 2010-11-24 03:56:57 PST
*** Bug 614518 has been marked as a duplicate of this bug. ***
Comment 125 Cork 2010-11-30 02:20:38 PST
*** Bug 615498 has been marked as a duplicate of this bug. ***
Comment 126 Robert Longson 2010-12-23 13:07:23 PST
*** Bug 621209 has been marked as a duplicate of this bug. ***
Comment 127 Daniel Veditz [:dveditz] 2011-02-16 22:13:48 PST
*** Bug 634536 has been marked as a duplicate of this bug. ***
Comment 128 Daniel Veditz [:dveditz] 2011-02-16 22:18:35 PST
Firefox 4 is about to ship with an HTML5-compliant parser, which may lead to even more sites being "broken" in Firefox 3.6, and 3.6 will have a significant number of users for quite some time. It may be worth taking Blake's patch, or something like it, on the 1.9.2 branch.
Comment 129 Kevin Brosnan [:kbrosnan] 2011-02-18 06:32:12 PST
*** Bug 635220 has been marked as a duplicate of this bug. ***
Comment 130 Mardeg 2011-03-03 00:53:58 PST
*** Bug 638387 has been marked as a duplicate of this bug. ***
