Closed
Bug 214476
(SGMLComment)
Opened 21 years ago
Closed 15 years ago
Mozilla interprets a -- (two dashes in a row) inside of a comment or an include improperly
Categories
(Core :: DOM: HTML Parser, defect, P2)
Core
DOM: HTML Parser
Tracking
()
RESOLVED
FIXED
People
(Reporter: bugzilla-emails-bugs, Unassigned)
References
()
Details
(Keywords: html5, Whiteboard: [fixed by the HTML5 parser])
Attachments
(1 file, 3 obsolete files)
21.52 KB,
patch
|
Details | Diff | Splinter Review |
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.5a) Gecko/20030626 Mozilla Firebird/0.6 Build Identifier: While updating our top level page, we found out that if a double hyphen (--) is used within a ssi style comment or ssi that it is misinterpreted by mozilla and the close may not be read properly, leaving parts or all of the page not displayed. Example: <!-- V. 3: August 2000 by University Relations; designer--developer: Bennet George --> The double dash between designer and developer caused the comment to not close properly. For the time being we found that we can avoid it simply by using a single hyphen. The problem does not occur in opera, msie, or most of the other browsers we tested with and does comply to XML standards. I do not remember if we had spaces on either side of the hyphens or not. Reproducible: Always Steps to Reproduce: 1. 2. 3. Actual Results: Browser displayed only a small portion of the page Expected Results: Browser should have interpreted the --> as the end of the comment, it did not
Comment 1•21 years ago
|
||
See http://www.w3.org/TR/html4/intro/sgmltut.html#h-3.2.4 I guess this bug is INVALID.
Comment 3•21 years ago
|
||
This is invalid. No -- are permitted within a comment.
Status: UNCONFIRMED → RESOLVED
Closed: 21 years ago
Resolution: --- → INVALID
Comment 4•21 years ago
|
||
In SGML, "--" starts a comment and "--" ends a comment. HTML just uses SGML comments, with "<!" signalling the starts of SGML markup and ">" signalling its end. Therefore: <!-- -- --> text --> Has two comments; one containing the string " " and one containing the string "> text ". Now if Mozilla is put in quirks mode, we do backwards compatible comment parsing (read "broken comment parsing just like old browsers"). But it sounds like the site in question put Mozilla in standards mode.
Comment 5•21 years ago
|
||
Bill, not the difference between what you said and how comment parsing actually works (it's subtle, but important: "--" is the comment delimiter, not just "not allowed inside a comment").
Comment 6•21 years ago
|
||
BZ, maybe before you CC me yet again on bugs that I don't want to be on, you should take time to take task within the W3C HTML group: "A common error is to include a string of hyphens ("---") within a comment. Authors should avoid putting two or more adjacent hyphens inside comments." Stop CCing me to be pedantic when I'm quoting a spec, or I swear I'll just stop donating time to Mozilla to triage bugs. I didn't start writing HTML yesterday.
Status: RESOLVED → VERIFIED
Updated•20 years ago
|
Alias: SGMLComment
Comment 21•19 years ago
|
||
Invalid or not, I'm not sure I understand why Firefox sometimes will render these rogue comments, sometimes will treat them as commenting out entire sections, etc. Regardless of the number of "-" between the start and end of a comment, shouldn't they always not render and not affect anything else? With the number of duplicates, obviously lots of people use this technique to visually block off sections of code.
Comment 23•18 years ago
|
||
The following perfectly valid web page is made invalid by having two hyphen characters in a row inside an HTML comment. Enter the following web page code into the HTML validator under "Validate by Direct Input" at the following address: http://validator.w3.org/ <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd"> <html> <head> <title>HTML Comments Display on Web pages</title> </head> <body> <!-- This entire comment -- will show in web browser --> <p>This line is supposed to be first visible text on web page.</p> <p>The page works perfectly if the hyphen is split with even a space or removed completely.</p> </body> </html>
Comment 26•18 years ago
|
||
So if I'm a web developer and I need to allow -- inside comments, what am I to do? FFox renders this directly even with transitional mode, which strikes me as the wrong thing. Or is there some browser-specific way to force "quirks mode" ?
Comment 27•18 years ago
|
||
> So if I'm a web developer and I need to allow -- inside comments You can't do that in HTML, if the HTML spec is actually followed. > Or is there some browser-specific way to force "quirks mode" ? This is well-documented at http://developer.mozilla.org/en/docs/Mozilla%27s_DOCTYPE_sniffing I should note that that page is linked directly off http://developer.mozilla.org/en/docs/Mozilla's_Quirks_Mode, which is the first Google hit for "mozilla quirks".
Comment 46•17 years ago
|
||
But -- is still not allowed inside a comment, right? Just the error handling is different (http://www.whatwg.org/specs/web-apps/current-work/#bogus)?
Comment 47•17 years ago
|
||
Right. (Though you would not end up in the "bogus comment state".) And also, that doesn't make it less of a bug :-)
Comment 48•17 years ago
|
||
Reopening per Anne's comment.
Severity: major → normal
Status: VERIFIED → UNCONFIRMED
Resolution: INVALID → ---
Updated•17 years ago
|
Assignee: harishd → nobody
Status: UNCONFIRMED → NEW
Ever confirmed: true
QA Contact: dsirnapalli → parser
Updated•17 years ago
|
Comment 51•17 years ago
|
||
If this is because -- is not allowed between <!-- and -->, then it sounds like the specification is inadequate. It defies common sense to use something like that for delimiting when you know full well that it's also used for comments and that -- is commonly used in texts. Use a less common sequence of characters for that purpose, ergo <!-- -#- -->. Don't make the job of a web developer more a pain because of ridiculously short-sighted standards.
Comment 52•16 years ago
|
||
SGML comments are ridiculous and problematic. They confuse authors and break pages. HTML 5 recognises this, and no longer requires browsers to parse comments in SGML format. They are now parsed in a way compatible with all browsers except Firefox. Firefox seems to be insisting on hanging on to the SGML comments, even though others realised they are stupid, and protested against their inclusion in Acid 2. HTML is not SGML, and never has been (even though it was originally supposed to be) - this won't work anywhere, even though it is valid SGML: http://virtuelvis.com/download/162/evilml.html SGML comments were removed from Acid 2 because they are stupid. They were removed from HTML 5 because they are stupid. It's time to remove them from Firefox, and stop breaking pages like the one in this report. For those wondering why some patterns work, and others leave bits on the page: http://www.howtocreate.co.uk/SGMLComments.html
Comment 53•16 years ago
|
||
If Tarquin is right and SGML comments were removed from HTML5, then Firefox should not treat documents with a valid HTML5 Doctype and SGML comment deliminators as such. For older documents, or non-valid HTML5 documents, the gotcha-but-correct behaviour should be maintained.
Updated•16 years ago
|
Flags: blocking1.9.1?
Comment 54•16 years ago
|
||
Note that this is specified in the parsing section of HTML 5 (the language definition tells authors not to include -- inside comments, but the error handling stage of parsing will allow it): http://www.w3.org/html/wg/html5/#comment3 "U+002D HYPHEN-MINUS (-) Parse error. Append a U+002D HYPHEN-MINUS (-) character to the comment token's data. Stay in the comment end state. ... Anything else Parse error. Append two U+002D HYPHEN-MINUS (-) characters and the input character to the comment token's data. Switch to the comment state."
Comment 55•16 years ago
|
||
Wouldn't hold the release for this, but I think we should get this on the list for 1.9.1. blocking1.9.1- wanted1.9.1+, P2. Anyone want to volunteer here?
Flags: wanted1.9.1+
Flags: blocking1.9.1?
Flags: blocking1.9.1-
Priority: -- → P2
Comment 57•16 years ago
|
||
"For older documents, or non-valid HTML5 documents, the gotcha-but-correct behaviour should be maintained." This will not help anyone. The broken pages (including the one in this report) will not get fixed. Firefox will remain incompatible with all other browsers. SGML comments were removed from HTML because they are stupid in all cases, not sometimes-stupid. Having comments that nobody understands in HTML 4 standards mode, but not in quirks mode or HTML 5 mode, is beyond confusing. They should be removed in all modes in order to be compatible with existing Web Pages, other browsers, and author expectations, while providing a consistent response to all doctypes.
Updated•16 years ago
|
Status: NEW → ASSIGNED
Comment 58•16 years ago
|
||
If we're going to replace our comment parsing, we might as well implement HTML5. This patch is a straightforward implementation of the part of the state machine that consumes comments. I haven't tested it very thoroughly (in particular, I need to ensure that the behavior is consistent across packet boundaries) but as far as I can tell, it follows the spec word for word.
Comment 61•16 years ago
|
||
So, the parser mochitests work, but require a bunch of manual verification. In particular, my patch doesn't affect where we put the comments in the DOM and all of the interesting test cases in our mochitests hit this problem.
Comment 62•16 years ago
|
||
Doesn't your patch affect where the comment terminates and therefore what Element nodes end up in the document?
Comment 63•16 years ago
|
||
Sorry, yes. I meant that given the testcase |<!-- comment -->|, our resulting DOM looks like: HTML HEAD <!-- comment --> BODY where the tests want <!-- comment --> HTML HEAD BODY and fixing that seems beyond the scope of this bug (unless people say otherwise).
Comment 64•16 years ago
|
||
Sure. I was assuming we'd add the tests from this bug and/or duplicates to parser/htmlparser/tests/mochitest/regressions.txt or some such. At least that's where I've been adding the parser tests... ;)
Comment 65•16 years ago
|
||
I made the state machine a little less jumpy and started adding tests. Unfortunately, the tests I've added here all fail. I don't understand how we're serializing these comments.
Attachment #326714 -
Attachment is obsolete: true
Comment 66•16 years ago
|
||
This adds a bunch of tests, and I'm pretty sure that I've implemented the spec faithfully. This is ready for review.
Attachment #327975 -
Attachment is obsolete: true
Attachment #328128 -
Flags: review?(jonas)
Comment 67•16 years ago
|
||
One thing I've noticed is in the testcase: |<title>foo <!-- bar| the resulting content model is: html head title "foo " <!-- bar --> I'll fix that.
Comment 68•16 years ago
|
||
Here's the interesting part of the interdiff: diff --git a/parser/htmlparser/src/nsHTMLTokens.cpp b/parser/htmlparser/src/nsHTMLTokens.cpp --- a/parser/htmlparser/src/nsHTMLTokens.cpp +++ b/parser/htmlparser/src/nsHTMLTokens.cpp @@ -900,6 +900,10 @@ CTextToken::ConsumeParsedCharacterData(P consumer.AppendSourceTo(theContent.writable()); mNewlineCount += consumer.GetNewlineCount(); + + // If we successfully consumed a comment, end the title after the + // comment. + aScanner.CurrentPosition(altEndPos); continue; } }
Attachment #328128 -
Attachment is obsolete: true
Attachment #329653 -
Flags: review?(jonas)
Attachment #328128 -
Flags: review?(jonas)
Blake, what is the status of this patch? Is it still good to review?
Comment 73•16 years ago
|
||
(In reply to comment #72) > Blake, what is the status of this patch? Is it still good to review? Yeah, the only question that needs answering before review is whether we want to do this at all.
Assuming that this makes us follow the HTML5 algorithm I think we should try it. The only reason not to do it would be if we think the HTML5 parser is going to land pretty soon anyway...
Comment 78•16 years ago
|
||
I was the one opening bug 484036. This bug unfortunately didn't show up when I searched, I'm sorry for that. Just to add to this discussion, putting the url of an IDN-domain between those comment tags triggers this error too. I run several forums based off vBulletin and this application puts the forum's url wrapped in a comment in the footer. One of my forums is using an IDN-domain and vBulletin puts that domain name in punycode format in the footer (www.xn--something-xyz.com) and breaks the page. Not allowing certain valid domains within a comment seems a little far fetched so I hope this problem will get resolved. Other browsers renders this correctly.
Comment 79•16 years ago
|
||
> Not allowing certain valid domains within a comment seems a little far fetched
Er, say what? The two languages are unrelated! You wouldn't complain if a url that contains "*/" ended a CSS comment, would you?
This bug should be fixed for compat, but said compat is just a workaround for people sticking things with the comment end delimiter in them inside comments...
Comment 80•16 years ago
|
||
Not exactly sure what you mean. I was referring to comment #3 where someone claims it's forbidden to have -- enclosed in comment tags. The people inventing punycode and making IDN domains a standard either didn't know this or completely ignored it. In any case it's the users of Firefox that has to pay the price which, after reading this thread, seems to have been forgotten. Standards seem to be more important for Mozilla than user experience. If I have misunderstood this I sincerely apologize.
Comment 81•16 years ago
|
||
> or completely ignored it
The latter. Or assumed people would properly escape their stuff inside comments, of course.
Comment 82•16 years ago
|
||
In reply to comment #80: If you want to put */ inside a C comment without ending the comment, you have to alter it somehow. Add a space between the star and slash, maybe. Similarly, you can't put a punycode URL, containing two dashes in the middle of text, inside an HTML4 comment -- you have to alter it somehow. If you want the punycode URL to be human-readable, you can replace two dashes by four (but the opposite conversion will have to be done when copying it to the URL bar); or if you want it to be machine-readable, I think you can replace the two dashes by %2D%2D (correct me, someone, if I'm wrong in thinking that such an "escaped" URL, when copied to the URL bar, will be correctly interpreted, first by having each %2D replaced by a dash, then by punycode interpretation). _Four_ dashes in sequence are allowed within an HTML comment, even in HTML4; and %2D%2D means -- in URL syntax but not in HTML comment syntax. IIUC, HTML5 comment syntax differs from HTML4 comment syntax; but I don't know the details. The above paragraph is about HTML4 but this bug is (IIUC, now that it has been reopened, assigned, and given the "html5" keyword) about altering Gecko to handle HTML5 comments correctly.
Comment 83•16 years ago
|
||
Your example is not quite correct though. */ is the terminating marker for a C comment - nobody is wanting to put --> inside a HTML comment. The problem is putting just part of the comment terminator inside a comment terminates it. It would be like being unable to use a single * inside a C comment. The same argument applies to comment #79. I wouldn't complain if a URL containing */ ended a CSS comment, but I could complain if a URL containing * ended it. Likewise I'm not complaining that "-->" ends a HTML comment, but rather that "--" does. I'm not aware of any other language that uses one character sequence to start a comment, but two or more different sequences to terminate that same comment.
Comment 84•16 years ago
|
||
You can choose whether you're wrong according to HTML4.01 and SGML: http://htmlhelp.com/reference/wilbur/misc/comment.html or according to HTML5: http://dev.w3.org/html5/spec/Overview.html#comments but unfortunately that's just the way HTML comments work. You can't put -- inside them.
Comment 85•16 years ago
|
||
Adam, in SGML and HTML the comment terminator sequence is "--", and can only be used inside an SGML markup declaration. SGML markup declaration start with "<!" and end with ">". Here's an example from the HTML4 DTD at http://www.w3.org/TR/REC-html40/sgml/dtd.html : <!ATTLIST Q %attrs; -- %coreattrs, %i18n, %events -- cite %URI; #IMPLIED -- URI for source document or msg -- > Here the comments are "%coreattrs, %i18n, %events" and "URI for source document or msg" but the rest of the text is not comment and in fact is the declaration for the allowed attributes on the Q element. I realize this is a bit more complex than the way comments work in C, but that's life. Note that I already said all this in this bug in comment 4, almost 6 years ago... In any case, this bug is now about ignoring HTML4 and implementing the HTML5 definition of comments, which does indeed start with "<!--" and end with "-->", so I'm really not sure what all the discussion is about at this point.
Comment 86•16 years ago
|
||
I've noticed that -- followed by any sequence of characters and a > within a conditional comment really messes things up. Surely a conditional comment needs to be an exception to this rule, as it may contain a script with valid code. I realise conditional comments aren't best practice coding, but they do exist and they do get parsed by browsers. Here's an example: <!--[if lt IE 7]> <script language="javascript"> var i=1; i--; var string=">"; alert("this displays as inline text in firefox"); </script> <![endif]-->
Comment 87•16 years ago
|
||
This is starting to get ridiculous... #86 ... perhaps you can use eval and unescape to avoid having -- and > characters in IE only code?
Comment 88•16 years ago
|
||
Conditional comments are "parsed by browsers" as just comments, with no special treatment, except for IE. Seriously, if you want to shoot yourself in the foot with HTML you can. And I still don't nderstand what the discussion is about, since the plan is to change behavior here... Can people just shut up and let the bug be until it's fixed?
Comment 92•15 years ago
|
||
I submitted duplicate bug 500110 for this, after three failed searches. Boris' comment "... I already said all this ... almost six years ago" (#86) is distressing. Six years! By the time HTML5 is official *and* the browser manufacturers adopt it *and* older browsers like Firefox 3.0.11 disappear from the Internet, I may have retired. I'm in the camp that uses dashes (and others) in comments to visually break my code into sections. A solution that looks good and seems to work well is to replace the non-allowed dashes with character 196 (Alt-196 in Windows).
Comment 96•15 years ago
|
||
but what I am curious about is, are these valid HTML? Am I to understand that a space is required? <!-----> (one - in the middle. in firefox, this is not a usable comment because it has an odd number of -'s.) <!--+--> (one + in the middle, should be same problem as above, but lexically analyzed differently. I am not sure, but I think in firefox this may not be a usable comment.) <!--->->--> (->-> in the middle. in firefox this is likely not be a usable comment because it has 3 -'s.) <!------> (two - in the middle. in firefox, this is a usable comment because it has an even number of -'s.) What I have noticed in the past about the firefox lexer is that it just simply pairs off -'s, which may be the wrong way to lexically analyze them. I was thinking of the following algorithm: char0=0 char1=0 char2=0 match "<!--" in sequence char0=get character while (char0=get character != EOF) { if (char0=='-') { char1=get character if (char1=='-') { char2=get character if (char2=='>') { //found end of comment! break; } } } } but then I realized that this is the logic error that only recognizes double -'s! what I really need here is the software equivelant of the digital electronics shift register. it is similar to a deque, where you push on one side and pop on the other. If I could examine the contents of the whole deque, that would be wonderful. then I could see if the --> was coming down the pipe, and I could keep it always 3 or 4 characters full of characters that I newly got. got it? what you would need to implement this in C++ is something like the STL to do this the easiest way. there is already a deque class. unfortunately, I have already stuffed my STL book in a box for moving. an iterator should provide the necessary means of iterating across the data elements of the deque. and they are easy to make. Jim Michaels
Comment 97•15 years ago
|
||
Well the way I look at it is the tag name is "!--" in the same way as the tag name might be "img". Obviously <imgblah> is not valid (you need a space after the tag name), so I would expect <!-----> to be something completely different to a comment. <!-- - --> would be a comment with an uneven number of dashes.
Comment 98•15 years ago
|
||
Jim, please read comment 85 (and then comment 4). Those explain exactly how SGML and HTML4 work here (and yes, they require simply pairing off "--"). Nte that a comment containing "--" is invalid HTML4 and it says so right in the HTML4 specification. HTML5 changes the specified behavior here, which is why this bug is still (or rather again) open.
Attachment #329653 -
Flags: review?(jonas)
Comment on attachment 329653 [details] [diff] [review] Implement HTML5 comments, v2.7 Hopefully all of this code is going away soon, so no need to muck around with it at this point. If that plan changes please rerequest review
Updated•15 years ago
|
Depends on: html5-parsing
Comment 105•15 years ago
|
||
Not sure about implementation in swallowing, but I think you use the big state machines... while in comments you could break this into three states: IN_COMMENTS: if ( nextchar == "-" ) { state = IN_COMMENTS_ONEDASH; break; } // else swallow the rest; IN_COMMENTS_ONEDASH: if ( nextchar == "-" ) { state = IN_COMMENTS_TWODASH; break; } else { state = IN_COMMENTS; break } IN_COMMENTS_TWODASH: if ( nextchar == ">" ) { state = DATA; // or return to whatever state we were in break; } else if ( nextchar == "-" ) { state = IN_COMMENTS_TWODASH; // cause now we have two dashes again } else { state = IN_COMMENTS; } // maybe some error here because of dashes and no end to comments That should work for the state engines and you can keep track of the errors.
Comment 106•15 years ago
|
||
(In reply to comment #105) http://www.w3.org/TR/2010/WD-html5-20100304/syntax.html#comments (just counting "<!--" for comment block) well, in https://hg.mozilla.org/mozilla-central/file/e9312d05488f/parser/html/javasrc/Tokenizer.java it's already this way, except for the allowing space after the two dashes (not sure if that is whats being used). it breaks from comments in case of <!--> or <!---> which seems fine, but it breaks when <!-- -- >... which in html5 is incorrect if the comment block should end in "-->". If white-space (or "!") is encountered such as in COMMENT_END it should do what it does in "default" and not go to COMMENT_END_SPACE (or BANG) like it does. Other than that and other tokenizers, it appears it should work.
Updated•15 years ago
|
Assignee: mrbkap → nobody
Updated•15 years ago
|
Status: ASSIGNED → RESOLVED
Closed: 21 years ago → 15 years ago
Resolution: --- → FIXED
Whiteboard: [fixed by the HTML5 parser]
Comment 108•15 years ago
|
||
I thought <!-- and --> delimited comments. At least that is how it works with ie. (Yes I understand that ie doesn't follow all standards).
Comment 109•15 years ago
|
||
James, you thought wrong, in the case of HTML4. See comment 4. Note that as a result of IE's behavior and people thinking it's correct the standard got changed...
Also note that this bug has been marked FIXED, which means that this bug is fixed in the code that will become the next version of firefox.
Comment 128•14 years ago
|
||
Given Firefox 4 about to ship with an html5-compliant parser that may lead to even more sites "broken" in Firefox 3.6, which will have a significant number of users for quite some time. It may be worth taking Blake's patch or something like it on the 1.9.2. branch.
You need to log in
before you can comment on or make changes to this bug.
Description
•