Closed
Bug 130045
Opened 22 years ago
Closed 22 years ago
<![ closes on ]> instead of > (<![]-> causes Mozilla to ignore the rest of the page)
Categories
(Core :: DOM: HTML Parser, defect, P2)
Core
DOM: HTML Parser
Tracking
()
VERIFIED
FIXED
mozilla1.0
People
(Reporter: adam, Assigned: hjtoi-bugzilla)
References
()
Details
(Keywords: compat)
Attachments
(3 files, 1 obsolete file)
516 bytes,
text/html
|
Details | |
477 bytes,
text/html
|
Details | |
1.09 KB,
patch
|
harishd
:
review+
vidur
:
superreview+
asa
:
approval+
|
Details | Diff | Splinter Review |
From Bugzilla Helper: User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:0.9.7) Gecko/20011221 BuildID: 2001122108 <![endif]--> in an HTML file causes the rest of the file to be ignored. If you do "View Source", it's as if the rest of the page wasn't there. This is a serious problem because Windows IE considers this some sort of logical construct, and there are tools which generate this code. It's not actually the word "endif" that Mozilla is objecting to... <![]-> also has the same effect. Yes, this is broken HTML, but Mozilla should be able to ignore it just like it ignores all the other broken HTML out there. Please note the example URL is a real page, and hence may be fixed before you see it. Reproducible: Always Steps to Reproduce: There isn't an attachment box on this simple bug page, so I'll just have to cut n' paste in my test code. This is based on real code, so it's not a completely minimal example. Before. <!--[if IE]> <script><!-- document.write("During. "); // --></script> <![endif]--> After. Actual Results: Mozilla displays: Before. Expected Results: Netscape 4.7 (Mac) displays: Before. After. Konqueror displays: Before. After. Opera (Linux) displays: Before. After. IE 5 (Mac) displays: Before. After. IE 6 (Windows) displays: Before. During. After. lynx displays: Before. After.
Comment 2•22 years ago
|
||
There's bug 91561 which is resolved invalid.. I guess this is invalid too, but wonder why "view source" doenst work is the testcase.
Comment 3•22 years ago
|
||
Confirming. The page is working fine now. In the small testcase the problem is the "-->" at the end of the <script> which ends the comment that that <!--[if IE]> started. So the <![endif]--> starts a new comment, as it should. Quirks mode comment parsing sucks....
Status: UNCONFIRMED → NEW
Ever confirmed: true
This version includes the HTML declaration to turn off quirks mode. It also uses the 5-character <![]-> tag that shouldn't be mistaken for the start of the comment. Mozilla still ignores the rest of the file.
Comment 5•22 years ago
|
||
As a note, removing the "[]" makes things fine. Is this some sort of odd interaction of this SGML with CDATA parsing?
Summary: <![endif]--> causes Mozilla to ignore the rest of the page → <![]-> causes Mozilla to ignore the rest of the page
Comment 6•22 years ago
|
||
On a quick look at the code, it seems that Mozilla currently eats anything that looks like a marked section: <![ ... ]]>. cf bug 10512
Comment 7•22 years ago
|
||
HTML 2.0 says that:
A comment declaration consists of `<!' followed by zero or more comments
followed by `>'. Each comment starts with `--' and includes all text up to and
including the next occurrence of `--'. In a comment declaration, white space is
allowed after each comment, but not before the first comment. The entire comment
declaration is ignored.
<!keyword ...> is something else that should be processed; ex: <!DOCTYPE ...>
And since HTML 3.2:
<![keyword[ ... ]]> is a marked section
HTML 4.01 says that:
White space is not permitted between the markup declaration open delimiter("<!")
and the comment open delimiter ("--"), but is permitted between the comment
close delimiter ("--") and the markup declaration close delimiter (">"). A
common error is to include a string of hyphens ("---") within a comment. Authors
should avoid putting two or more adjacent hyphens inside comments.
As far as I can tell, <![...]> cannot be a marked section because it doesn't
follow the form <![keyword[ ... ]]> since I can't see how either the ] or >
characters could be a valid keyword.
Conclusion: <![...]> is not a valid <!keyword ...> and also cannot be a marked
section, as shown above. Since HTML 4.01 says <![...]> is invalid, but doesn't
say what to do about it, I propose to fall back on the HTML 2.0 standard.
Therefore, <![...]> should not be processed as anything more than a comment
declaration (and therefore be ignored) that terminiates at the first >.
<![ should not be enough for Mozilla to assume a marked section. <![...[ should
be the indicator. At any rate, all markup (tags) should be closed at the first
> found, including the <! markup. This handling would solve both problems with
the original testcase.
Keywords: mozilla1.0
Comment 9•22 years ago
|
||
Adding stuff to summary which explains the overall parser problem.
Keywords: compat
Summary: <![]-> causes Mozilla to ignore the rest of the page → Comments (<!) do not necessarily close on > (<![]> causes Mozilla to ignore the rest of the page)
Comment 10•22 years ago
|
||
"<!" does not mean "comment". It means "SGML markup". Inside that "--" is a comment start and "--" is a comment end. As long as HTML is claiming to be an SGML application, it has no say over how anything following "<!" is parsed -- that is the province of the SGML specification.
Comment 11•22 years ago
|
||
Yes, I should have said comment declaration as indicated in HTML 2.0. Anyway, even in SGML, shouldn't <... be immediately closed upon the occurrence of > ? <! included.
Summary: Comments (<!) do not necessarily close on > (<![]> causes Mozilla to ignore the rest of the page) → <! does not necessarily close on > (<![]> causes Mozilla to ignore the rest of the page)
Reporter | ||
Comment 12•22 years ago
|
||
The summary is not 100% correct... <![]> doesn't cause any problems for me. <![]-> is the minimal one I've found. From my testing, it appears that Mozilla considers "<![" to be a construct that can only be closed by "]>". The View Source window doesn't display unclosed tags.
Comment 13•22 years ago
|
||
Sorry, your comment about the "5-character tag" threw me off. I'd put something along the lines of a Perl regexp, but I'm not sure if everybody would get the meaning. ie <![.*].+>
Summary: <! does not necessarily close on > (<![]> causes Mozilla to ignore the rest of the page) → <! does not necessarily close on > (<![]-> causes Mozilla to ignore the rest of the page)
Comment 14•22 years ago
|
||
Actually, it would be more of m/<!\[[^\]]*\][^\]]+>/ but thats really confusing. As it stands now, Mozilla appears to do this: 1. <!-- closes at the next --> if one exists anywhere after it. Otherwise it closes at the next > 2. <![ closes only at the next ]> 3. <!(anything else) closes at the next > However, why should there be three cases? It seems that there should only be the third. The constructs are all the same tag: <! it should not matter what comes directly after it as to what gets rendered. Unlike all other HTML tags, this tag is unique in that it should be executed after <!(no space) instead of <keyword(space) AFAICT.
Comment 15•22 years ago
|
||
Okay, that was wrong. I just realized that <!-- is used to comment other HTML (duh), which of course will contain a > So we really do have two cases: <!-- and <! We just need to get rid of <![
Comment 16•22 years ago
|
||
Well, really it's just one case with one sub-case. <! closes on > BUT -- disables everything (including >) until the next -- I could see the potential for [ "disables" everything until the next ] but I'm not convinced that that is correct. Either way, there is still here problem here. We have either too many or the wrong cases.
Summary: <! does not necessarily close on > (<![]-> causes Mozilla to ignore the rest of the page) → <![ closes on ]> instead of > (<![]-> causes Mozilla to ignore the rest of the page)
Comment 17•22 years ago
|
||
Adding Heikki since he worked on bug 110545.
Assignee | ||
Comment 18•22 years ago
|
||
With this patch I see roughly this: <![endif]--> test case Before. After. Adam Rice Last modified: Mon Mar 11 12:51:26 GMT 2002 We need a bit of testing but I think we could fix this quite safely. Harish?
Assignee | ||
Comment 19•22 years ago
|
||
This patch can handle stuff like <![ foobar [ hello ]] -- agjdgfjh >. Stuff that goes into node: ^-----------------^ This code will be triggered ONLY if we see: <![
Attachment #75508 -
Attachment is obsolete: true
Comment 20•22 years ago
|
||
Comment on attachment 75602 [details] [diff] [review] Better and safer fix r=harishd
Attachment #75602 -
Flags: review+
Assignee | ||
Comment 21•22 years ago
|
||
Taking since I have a (partial) fix. Stuff that starts with "<!--" is going to be handled by the normal comment handling code, and it will close when it sees the first "->", or if it cannot find that then on the first ">". This means that even with the fix here we still treat <!--[if IE]> <script><!-- document.write("During. "); // --> as single comment, and I don't think we should try to fix that problem at this point. I'll open a new bug to make comments close with "]>".With this patch we get MOST of the page right. Adamn, everyone: which tools produce these kinds of things? We need to evangelize them.
Assignee: harishd → heikki
Priority: -- → P2
Whiteboard: [fixinhand]
Target Milestone: --- → mozilla1.0
Comment 22•22 years ago
|
||
> which tools produce these kinds of things? Microsoft Word, mostly ("Save as HTML"). Newer versions of FrontPage may, but I'm not sure. See http://msdn.microsoft.com/workshop/author/dhtml/overview/ccomment_ovw.asp for Microsoft's description of this "feature" of IE....
Comment 23•22 years ago
|
||
Aren't we going to mess up pages that legitimately have a > inside their marked sections? See http://www.w3.org/TR/html401/appendix/notes.html#h-B.3.5 for information about SGML marked sections.
Assignee | ||
Comment 24•22 years ago
|
||
No, because we look for ']' to close the CDATA section, and then just skip everything until we find '>' from there. So you can still have: <![foo>bar]>
Status: NEW → ASSIGNED
Comment 25•22 years ago
|
||
Comment on attachment 75602 [details] [diff] [review] Better and safer fix There's still a legitimate case for a marked section that will be broken, but Heikki convinced me that it is a rare and unlikely case in HTML content. sr=vidur
Attachment #75602 -
Flags: superreview+
Comment 26•22 years ago
|
||
Comment on attachment 75602 [details] [diff] [review] Better and safer fix a=asa (on behalf of drivers) for checkin to the 1.0 trunk
Attachment #75602 -
Flags: approval+
Assignee | ||
Comment 27•22 years ago
|
||
Forgot to close, this was fixed: 03/22/2002 14:58 heikki%netscape.com mozilla/htmlparser/src/nsHTMLTokens.cpp 3.214
Status: ASSIGNED → RESOLVED
Closed: 22 years ago
Resolution: --- → FIXED
Whiteboard: [fixinhand]
Comment 28•22 years ago
|
||
> which tools produce these kinds of things? FYI: http://msdn.microsoft.com/library/default.asp?url=/workshop/author/dhtml/overview/ccomment_ovw.asp
Comment 29•22 years ago
|
||
verified URL and testcase, fix checked in CVS (rev 3.214)
Status: RESOLVED → VERIFIED
You need to log in
before you can comment on or make changes to this bug.
Description
•