Bug 214476 (SGMLComment)

Mozilla interprets a -- (two dashes in a row) inside of a comment or an include improperly




16 years ago
8 years ago


(Reporter: bugzilla-emails-bugs, Unassigned)




Firefox Tracking Flags

(Not tracked)


(Whiteboard: [fixed by the HTML5 parser], )


(1 attachment, 3 obsolete attachments)



16 years ago
User-Agent:       Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.5a) Gecko/20030626 Mozilla Firebird/0.6
Build Identifier: 

While updating our top level page, we found out that if a double hyphen (--) is
used within a ssi style comment or ssi that it is misinterpreted by mozilla and
the close may not be read properly, leaving parts or all of the page not displayed.
<!-- V. 3: August 2000 by University Relations; designer--developer: Bennet
George -->
The double dash between designer and developer caused the comment to not close
For the time being we found that we can avoid it simply by using a single
hyphen. The problem does not occur in opera, msie, or most of the other browsers
we tested with and does comply to XML standards.

I do not remember if we had spaces on either side of the hyphens or not.

Reproducible: Always

Steps to Reproduce:

Actual Results:  
Browser displayed only a small portion of the page

Expected Results:  
Browser should have interpreted the --> as the end of the comment, it did not

Comment 2

16 years ago
*** Bug 214475 has been marked as a duplicate of this bug. ***

Comment 3

16 years ago
This is invalid.  No -- are permitted within a comment.
Closed: 16 years ago
Resolution: --- → INVALID
In SGML, "--" starts a comment and "--" ends a comment.  HTML just uses SGML
comments, with "<!" signalling the starts of SGML markup and ">" signalling its
end.  Therefore:

<!-- -- --> text -->

Has two comments; one containing the string " " and one containing the string 
"> text ".

Now if Mozilla is put in quirks mode, we do backwards compatible comment parsing
(read "broken comment parsing just like old browsers").  But it sounds like the
site in question put Mozilla in standards mode.
Bill, not the difference between what you said and how comment parsing actually
works (it's subtle, but important: "--" is the comment delimiter, not just "not
allowed inside a comment").

Comment 6

16 years ago
BZ, maybe before you CC me yet again on bugs that I don't want to be on, you
should take time to take task within the W3C HTML group:

"A common error is to include a string of hyphens ("---") within a comment.
Authors should avoid putting two or more adjacent hyphens inside comments."

Stop CCing me to be pedantic when I'm quoting a spec, or I swear I'll just stop
donating time to Mozilla to triage bugs.  I didn't start writing HTML yesterday.

Comment 7

15 years ago
*** Bug 269104 has been marked as a duplicate of this bug. ***
*** Bug 271854 has been marked as a duplicate of this bug. ***

Comment 9

15 years ago
*** Bug 271860 has been marked as a duplicate of this bug. ***
*** Bug 288610 has been marked as a duplicate of this bug. ***

Comment 11

14 years ago
*** Bug 294614 has been marked as a duplicate of this bug. ***


14 years ago
Alias: SGMLComment
*** Bug 294796 has been marked as a duplicate of this bug. ***

Comment 13

14 years ago
*** Bug 307747 has been marked as a duplicate of this bug. ***
*** Bug 318411 has been marked as a duplicate of this bug. ***

Comment 15

14 years ago
*** Bug 320933 has been marked as a duplicate of this bug. ***

Comment 16

14 years ago
*** Bug 321472 has been marked as a duplicate of this bug. ***

Comment 17

13 years ago
*** Bug 332516 has been marked as a duplicate of this bug. ***

Comment 18

13 years ago
*** Bug 338810 has been marked as a duplicate of this bug. ***

Comment 19

13 years ago
*** Bug 340975 has been marked as a duplicate of this bug. ***

Comment 20

13 years ago
*** Bug 341443 has been marked as a duplicate of this bug. ***

Comment 21

13 years ago
Invalid or not, I'm not sure I understand why Firefox sometimes will render these rogue comments, sometimes will treat them as commenting out entire sections, etc.  Regardless of the number of "-" between the start and end of a comment, shouldn't they always not render and not affect anything else?  With the number of duplicates, obviously lots of people use this technique to visually block off sections of code.
> I'm not sure I understand

Please read the whole bug, esp. comment 4.

Comment 23

13 years ago
The following perfectly valid web page is made invalid by having two hyphen characters in a row inside an HTML comment.

Enter the following web page code into the HTML validator under "Validate by Direct Input" at the following address:


<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">
<title>HTML Comments Display on Web pages</title>


This entire comment -- will show in web browser


<p>This line is supposed to be first visible text on web page.</p>

<p>The page works perfectly if the hyphen is split with even a space or removed completely.</p>


*** Bug 351419 has been marked as a duplicate of this bug. ***
*** Bug 352664 has been marked as a duplicate of this bug. ***

Comment 26

13 years ago
So if I'm a web developer and I need to allow -- inside comments, what am I to do? FFox renders this directly even with transitional mode, which strikes me as the wrong thing. Or is there some browser-specific way to force "quirks mode" ?
> So if I'm a web developer and I need to allow -- inside comments

You can't do that in HTML, if the HTML spec is actually followed.

> Or is there some browser-specific way to force "quirks mode" ?

This is well-documented at http://developer.mozilla.org/en/docs/Mozilla%27s_DOCTYPE_sniffing

I should note that that page is linked directly off http://developer.mozilla.org/en/docs/Mozilla's_Quirks_Mode, which is the first Google hit for "mozilla quirks".

Comment 28

13 years ago
*** Bug 359014 has been marked as a duplicate of this bug. ***

Comment 29

13 years ago
*** Bug 362663 has been marked as a duplicate of this bug. ***
Duplicate of this bug: 365844


13 years ago
Duplicate of this bug: 367371
Duplicate of this bug: 368539
Duplicate of this bug: 375803


12 years ago
Duplicate of this bug: 387165
Duplicate of this bug: 387941


12 years ago
Duplicate of this bug: 388137
Duplicate of this bug: 393426
Duplicate of this bug: 396793
Duplicate of this bug: 404691


12 years ago
Duplicate of this bug: 406694
Duplicate of this bug: 412761
Duplicate of this bug: 423893
Duplicate of this bug: 428286
Duplicate of this bug: 429914

Comment 45

11 years ago
FWIW, per HTML5 this is a bug in Firefox.
But -- is still not allowed inside a comment, right? Just the error handling is different (http://www.whatwg.org/specs/web-apps/current-work/#bogus)?

Comment 47

11 years ago
Right. (Though you would not end up in the "bogus comment state".) And also, that doesn't make it less of a bug :-)
Reopening per Anne's comment.
Severity: major → normal
Resolution: INVALID → ---
Assignee: harishd → nobody
Ever confirmed: true
QA Contact: dsirnapalli → parser
Duplicate of this bug: 432082


11 years ago
Duplicate of this bug: 436298

Comment 51

11 years ago
If this is because -- is not allowed between <!-- and -->, then it sounds like the specification is inadequate. It defies common sense to use something like that for delimiting when you know full well that it's also used for comments and that -- is commonly used in texts. Use a less common sequence of characters for that purpose, ergo <!-- -#- -->. Don't make the job of a web developer more a pain because of ridiculously short-sighted standards.

Comment 52

11 years ago
SGML comments are ridiculous and problematic. They confuse authors and break pages. HTML 5 recognises this, and no longer requires browsers to parse comments in SGML format. They are now parsed in a way compatible with all browsers except Firefox. Firefox seems to be insisting on hanging on to the SGML comments, even though others realised they are stupid, and protested against their inclusion in Acid 2.

HTML is not SGML, and never has been (even though it was originally supposed to be) - this won't work anywhere, even though it is valid SGML:

SGML comments were removed from Acid 2 because they are stupid. They were removed from HTML 5 because they are stupid. It's time to remove them from Firefox, and stop breaking pages like the one in this report.

For those wondering why some patterns work, and others leave bits on the page:

Comment 53

11 years ago
If Tarquin is right and SGML comments were removed from HTML5, then Firefox should not treat documents with a valid HTML5 Doctype and SGML comment deliminators as such. For older documents, or non-valid HTML5 documents, the gotcha-but-correct behaviour should be maintained.
Flags: blocking1.9.1?

Comment 54

11 years ago
Note that this is specified in the parsing section of HTML 5 (the language definition tells authors not to include -- inside comments, but the error handling stage of parsing will allow it):

Parse error. Append a U+002D HYPHEN-MINUS (-) character to the comment token's data. Stay in the comment end state.
Anything else 
Parse error. Append two U+002D HYPHEN-MINUS (-) characters and the input character to the comment token's data. Switch to the comment state."
Wouldn't hold the release for this, but I think we should get this on the list for 1.9.1. 

wanted1.9.1+, P2.  

Anyone want to volunteer here?
Flags: wanted1.9.1+
Flags: blocking1.9.1?
Flags: blocking1.9.1-
Priority: -- → P2
I'll take this.
Assignee: nobody → mrbkap

Comment 57

11 years ago
"For older documents, or non-valid HTML5 documents, the gotcha-but-correct behaviour should be maintained."

This will not help anyone. The broken pages (including the one in this report) will not get fixed. Firefox will remain incompatible with all other browsers. SGML comments were removed from HTML because they are stupid in all cases, not sometimes-stupid. Having comments that nobody understands in HTML 4 standards mode, but not in quirks mode or HTML 5 mode, is beyond confusing.

They should be removed in all modes in order to be compatible with existing Web Pages, other browsers, and author expectations, while providing a consistent response to all doctypes.
Posted patch Implement HTML5 comments, v1 (obsolete) — Splinter Review
If we're going to replace our comment parsing, we might as well implement HTML5. This patch is a straightforward implementation of the part of the state machine that consumes comments. I haven't tested it very thoroughly (in particular, I need to ensure that the behavior is consistent across packet boundaries) but as far as I can tell, it follows the spec word for word.
Blake, what test framework do we use to test something like this?
We have parser mochitests.
So, the parser mochitests work, but require a bunch of manual verification. In particular, my patch doesn't affect where we put the comments in the DOM and all of the interesting test cases in our mochitests hit this problem.
Doesn't your patch affect where the comment terminates and therefore what Element nodes end up in the document?
Sorry, yes. I meant that given the testcase |<!-- comment -->|, our resulting DOM looks like:
    <!-- comment -->

where the tests want
<!-- comment -->

and fixing that seems beyond the scope of this bug (unless people say otherwise).
Sure.  I was assuming we'd add the tests from this bug and/or duplicates to parser/htmlparser/tests/mochitest/regressions.txt or some such.  At least that's where I've been adding the parser tests... ;)
Posted patch Implement HTML5 comments, v2 (obsolete) — Splinter Review
I made the state machine a little less jumpy and started adding tests. Unfortunately, the tests I've added here all fail. I don't understand how we're serializing these comments.
Attachment #326714 - Attachment is obsolete: true
This adds a bunch of tests, and I'm pretty sure that I've implemented the spec faithfully. This is ready for review.
Attachment #327975 - Attachment is obsolete: true
Attachment #328128 - Flags: review?(jonas)
One thing I've noticed is in the testcase: |<title>foo <!-- bar| the resulting content model is:

      "foo "
    <!--  bar -->

I'll fix that.
Here's the interesting part of the interdiff:

diff --git a/parser/htmlparser/src/nsHTMLTokens.cpp b/parser/htmlparser/src/nsHTMLTokens.cpp
--- a/parser/htmlparser/src/nsHTMLTokens.cpp
+++ b/parser/htmlparser/src/nsHTMLTokens.cpp
@@ -900,6 +900,10 @@ CTextToken::ConsumeParsedCharacterData(P
         mNewlineCount += consumer.GetNewlineCount();
+        // If we successfully consumed a comment, end the title after the
+        // comment.
+        aScanner.CurrentPosition(altEndPos);
Attachment #328128 - Attachment is obsolete: true
Attachment #329653 - Flags: review?(jonas)
Attachment #328128 - Flags: review?(jonas)
Duplicate of this bug: 451190


11 years ago
Duplicate of this bug: 455664
Duplicate of this bug: 464452
Blake, what is the status of this patch? Is it still good to review?
(In reply to comment #72)
> Blake, what is the status of this patch? Is it still good to review?

Yeah, the only question that needs answering before review is whether we want to do this at all.
Assuming that this makes us follow the HTML5 algorithm I think we should try it.

The only reason not to do it would be if we think the HTML5 parser is going to land pretty soon anyway...
Duplicate of this bug: 471787
Duplicate of this bug: 484036
Duplicate of this bug: 475800

Comment 78

10 years ago
I was the one opening bug 484036. This bug unfortunately didn't show up when I searched, I'm sorry for that.
Just to add to this discussion, putting the url of an IDN-domain between those comment tags triggers this error too. I run several forums based off vBulletin and this application puts the forum's url wrapped in a comment in the footer. One of my forums is using an IDN-domain and vBulletin puts that domain name in punycode format in the footer (www.xn--something-xyz.com) and breaks the page. Not allowing certain valid domains within a comment seems a little far fetched so I hope this problem will get resolved. Other browsers renders this correctly.
> Not allowing certain valid domains within a comment seems a little far fetched

Er, say what?  The two languages are unrelated!  You wouldn't complain if a url that contains "*/" ended a CSS comment, would you?

This bug should be fixed for compat, but said compat is just a workaround for people sticking things with the comment end delimiter in them inside comments...

Comment 80

10 years ago
Not exactly sure what you mean. I was referring to comment #3 where someone claims it's forbidden to have -- enclosed in comment tags. The people inventing punycode and making IDN domains a standard either didn't know this or completely ignored it. In any case it's the users of Firefox that has to pay the price which, after reading this thread, seems to have been forgotten. Standards seem to be more important for Mozilla than user experience. If I have misunderstood this I sincerely apologize.
> or completely ignored it

The latter.  Or assumed people would properly escape their stuff inside comments, of course.
In reply to comment #80: If you want to put */ inside a C comment without ending the comment, you have to alter it somehow. Add a space between the star and slash, maybe. Similarly, you can't put a punycode URL, containing two dashes in the middle of text, inside an HTML4 comment -- you have to alter it somehow. If you want the punycode URL to be human-readable, you can replace two dashes by four (but the opposite conversion will have to be done when copying it to the URL bar); or if you want it to be machine-readable, I think you can replace the two dashes by %2D%2D (correct me, someone, if I'm wrong in thinking that such an "escaped" URL, when copied to the URL bar, will be correctly interpreted, first by having each %2D replaced by a dash, then by punycode interpretation). _Four_ dashes in sequence are allowed within an HTML comment, even in HTML4; and %2D%2D means -- in URL syntax but not in HTML comment syntax.

IIUC, HTML5 comment syntax differs from HTML4 comment syntax; but I don't know the details. The above paragraph is about HTML4 but this bug is (IIUC, now that it has been reopened, assigned, and given the "html5" keyword) about altering Gecko to handle HTML5 comments correctly.

Comment 83

10 years ago
Your example is not quite correct though.  */ is the terminating marker for a C comment - nobody is wanting to put --> inside a HTML comment.  The problem is putting just part of the comment terminator inside a comment terminates it.  It would be like being unable to use a single * inside a C comment.

The same argument applies to comment #79.  I wouldn't complain if a URL containing */ ended a CSS comment, but I could complain if a URL containing * ended it.  Likewise I'm not complaining that "-->" ends a HTML comment, but rather that "--" does.

I'm not aware of any other language that uses one character sequence to start a comment, but two or more different sequences to terminate that same comment.
You can choose whether you're wrong according to HTML4.01 and SGML:


or according to HTML5:


but unfortunately that's just the way HTML comments work.  You can't put -- inside them.
Adam, in SGML and HTML the comment terminator sequence is "--", and can only be used inside an SGML markup declaration.  SGML markup declaration start with "<!" and end with ">".  Here's an example from the HTML4 DTD at http://www.w3.org/TR/REC-html40/sgml/dtd.html :

  %attrs;                              -- %coreattrs, %i18n, %events --
  cite        %URI;          #IMPLIED  -- URI for source document or msg --

Here the comments are "%coreattrs, %i18n, %events" and "URI for source document or msg" but the rest of the text is not comment and in fact is the declaration for the allowed attributes on the Q element.

I realize this is a bit more complex than the way comments work in C, but that's life.

Note that I already said all this in this bug in comment 4, almost 6 years ago...

In any case, this bug is now about ignoring HTML4 and implementing the HTML5 definition of comments, which does indeed start with "<!--" and end with "-->", so I'm really not sure what all the discussion is about at this point.

Comment 86

10 years ago
I've noticed that -- followed by any sequence of characters and a > within a conditional comment really messes things up. Surely a conditional comment needs to be an exception to this rule, as it may contain a script with valid code. I realise conditional comments aren't best practice coding, but they do exist and they do get parsed by browsers.

Here's an example:

<!--[if lt IE 7]>
<script language="javascript">
var i=1;
var string=">";
alert("this displays as inline text in firefox");

Comment 87

10 years ago
This is starting to get ridiculous... 

#86 ... perhaps you can use eval and unescape to avoid having -- and > characters in IE only code?
Conditional comments are "parsed by browsers" as just comments, with no special treatment, except for IE.  Seriously, if you want to shoot yourself in the foot with HTML you can.

And I still don't nderstand what the discussion is about, since the plan is to change behavior here...  Can people just shut up and let the bug be until it's fixed?


10 years ago
Duplicate of this bug: 487773

Comment 90

10 years ago
#86 Use Conditional compilation instead. Problem solved!


10 years ago
Duplicate of this bug: 500110

Comment 92

10 years ago
I submitted duplicate bug 500110 for this, after three failed searches.

Boris' comment "... I already said all this ... almost six years ago" (#86) is distressing.  Six years!  By the time HTML5 is official *and* the browser manufacturers adopt it *and* older browsers like Firefox 3.0.11 disappear from the Internet, I may have retired.

I'm in the camp that uses dashes (and others) in comments to visually break my code into sections.  A solution that looks good and seems to work well is to replace the non-allowed dashes with character 196 (Alt-196 in Windows).
Duplicate of this bug: 500887

Comment 94

10 years ago


10 years ago
Duplicate of this bug: 516666

Comment 96

10 years ago
but what I am curious about is, are these valid HTML?  Am I to understand that a space is required?
<!-----> (one - in the middle. in firefox, this is not a usable comment because it has an odd number of -'s.)
<!--+--> (one + in the middle, should be same problem as above, but lexically analyzed differently.  I am not sure, but I think in firefox this may not be a usable comment.)
<!--->->--> (->-> in the middle.  in firefox this is likely not be a usable comment because it has 3 -'s.)
<!------> (two - in the middle. in firefox, this is a usable comment because it has an even number of -'s.)

What I have noticed in the past about the firefox lexer is that it just simply pairs off -'s, which may be the wrong way to lexically analyze them. 

I was thinking of the following algorithm:
match "<!--" in sequence
char0=get character
while (char0=get character != EOF) {
    if (char0=='-') {
        char1=get character
        if (char1=='-') {
            char2=get character
            if (char2=='>') {
                //found end of comment!

but then I realized that this is the logic error that only recognizes double -'s!
what I really need here is the software equivelant of the digital electronics shift register.  it is similar to a deque, where you push on one side and pop on the other.  If I could examine the contents of the whole deque, that would be wonderful.  then I could see if the --> was coming down the pipe, and I could keep it always 3 or 4 characters full of characters that I newly got.
got it?  

what you would need to implement this in C++ is something like the STL to do this the easiest way.  there is already a deque class.  unfortunately, I have already stuffed my STL book in a box for moving. an iterator should provide the necessary means of iterating across the data elements of the deque.  and they are easy to make.
Jim Michaels

Comment 97

10 years ago
Well the way I look at it is the tag name is "!--" in the same way as the tag name might be "img".  Obviously <imgblah> is not valid (you need a space after the tag name), so I would expect <!-----> to be something completely different to a comment.  <!-- - --> would be a comment with an uneven number of dashes.
Jim, please read comment 85 (and then comment 4).  Those explain exactly how SGML and HTML4 work here (and yes, they require simply pairing off "--").

Nte that a comment containing "--" is invalid HTML4 and it says so right in the HTML4 specification.

HTML5 changes the specified behavior here, which is why this bug is still (or rather again) open.


10 years ago
Duplicate of this bug: 530451
Duplicate of this bug: 530476
Comment on attachment 329653 [details] [diff] [review]
Implement HTML5 comments, v2.7

Hopefully all of this code is going away soon, so no need to muck around with it at this point.

If that plan changes please rerequest review
Duplicate of this bug: 536769


10 years ago
Depends on: html5-parsing
Duplicate of this bug: 544091
Duplicate of this bug: 546436

Comment 105

9 years ago
Not sure about implementation in swallowing, but I think you use the big state machines...  while in comments you could break this into three states:

  if ( nextchar == "-" )
  // else swallow the rest;
  if ( nextchar == "-" )
     state = IN_COMMENTS;
  if ( nextchar == ">" )
     state = DATA; // or return to whatever state we were in
  else if ( nextchar == "-" )
    state = IN_COMMENTS_TWODASH; // cause now we have two dashes again
    state = IN_COMMENTS;
  // maybe some error here because of dashes and no end to comments

That should work for the state engines and you can keep track of the errors.

Comment 106

9 years ago
(In reply to comment #105)

http://www.w3.org/TR/2010/WD-html5-20100304/syntax.html#comments (just counting "<!--" for comment block)

well, in https://hg.mozilla.org/mozilla-central/file/e9312d05488f/parser/html/javasrc/Tokenizer.java it's already this way, except for the allowing space after the two dashes (not sure if that is whats being used).

it breaks from comments in case of <!--> or <!---> which seems fine, but it breaks when <!-- -- >... which in html5 is incorrect if the comment block should end in "-->".

If white-space (or "!") is encountered such as in COMMENT_END it should do what it does in "default" and not go to COMMENT_END_SPACE (or BANG) like it does.  Other than that and other tokenizers, it appears it should work.
Assignee: mrbkap → nobody


9 years ago
Duplicate of this bug: 562921
Closed: 16 years ago9 years ago
Resolution: --- → FIXED
Whiteboard: [fixed by the HTML5 parser]

Comment 108

9 years ago
I thought <!-- and --> delimited comments. At least that is how it works with ie. (Yes I understand that ie doesn't follow all standards).
James, you thought wrong, in the case of HTML4.  See comment 4.

Note that as a result of IE's behavior and people thinking it's correct the standard got changed...
Also note that this bug has been marked FIXED, which means that this bug is fixed in the code that will become the next version of firefox.


9 years ago
Duplicate of this bug: 567745
Duplicate of this bug: 574594


9 years ago
Duplicate of this bug: 576990


9 years ago
Duplicate of this bug: 577858


9 years ago
Duplicate of this bug: 477200
Duplicate of this bug: 578947


9 years ago
Duplicate of this bug: 579613


9 years ago
Duplicate of this bug: 589433


9 years ago
Duplicate of this bug: 596142
Duplicate of this bug: 599619
Duplicate of this bug: 600298


9 years ago
Duplicate of this bug: 600508
Duplicate of this bug: 609542
Duplicate of this bug: 614518


9 years ago
Duplicate of this bug: 615498
Duplicate of this bug: 621209
Duplicate of this bug: 634536
Given Firefox 4 about to ship with an html5-compliant parser that may lead to even more sites "broken" in Firefox 3.6, which will have a significant number of users for quite some time. It may be worth taking Blake's patch or something like it on the 1.9.2. branch.
Duplicate of this bug: 635220


8 years ago
Duplicate of this bug: 638387
You need to log in before you can comment on or make changes to this bug.