Closed Bug 130045 Opened 22 years ago Closed 22 years ago

<![ closes on ]> instead of > (<![]-> causes Mozilla to ignore the rest of the page)

Categories

(Core :: DOM: HTML Parser, defect, P2)

defect

Tracking

()

VERIFIED FIXED
mozilla1.0

People

(Reporter: adam, Assigned: hjtoi-bugzilla)

References

()

Details

(Keywords: compat)

Attachments

(3 files, 1 obsolete file)

From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:0.9.7) Gecko/20011221
BuildID:    2001122108

<![endif]--> in an HTML file causes the rest of the file to be ignored. If you
do "View Source", it's as if the rest of the page wasn't there. This is a
serious problem because Windows IE considers this some sort of logical
construct, and there are tools which generate this code. It's not actually the
word "endif" that Mozilla is objecting to... <![]-> also has the same effect.
Yes, this is broken HTML, but Mozilla should be able to ignore it just like it
ignores all the other broken HTML out there.

Please note the example URL is a real page, and hence may be fixed before you
see it.

Reproducible: Always
Steps to Reproduce:
There isn't an attachment box on this simple bug page, so I'll just have to cut
n' paste in my test code. This is based on real code, so it's not a completely
minimal example.

Before.
<!--[if IE]>
<script><!--
document.write("During. ");
// --></script>
<![endif]-->
After.


Actual Results:  Mozilla displays: Before.


Expected Results:  Netscape 4.7 (Mac) displays: Before. After.
Konqueror displays: Before. After.
Opera (Linux) displays: Before. After.
IE 5 (Mac) displays: Before. After.
IE 6 (Windows) displays: Before. During. After.
lynx displays: Before. After.
Attached file Test case.
There's bug 91561 which is resolved invalid..
I guess this is invalid too, but wonder why "view source" doenst work is the
testcase.
Confirming.  The page is working fine now.   In the small testcase the problem
is the "-->" at the end of the <script> which ends the comment that that <!--[if
IE]>  started.  So the <![endif]--> starts a new comment, as it should.

Quirks mode comment parsing sucks....
Status: UNCONFIRMED → NEW
Ever confirmed: true
This version includes the HTML declaration to turn off quirks mode. It also
uses the 5-character <![]-> tag that shouldn't be mistaken for the start of the
comment. Mozilla still ignores the rest of the file.
As a note, removing the "[]" makes things fine.  Is this some sort of odd
interaction of this SGML with CDATA parsing?
Summary: <![endif]--> causes Mozilla to ignore the rest of the page → <![]-> causes Mozilla to ignore the rest of the page
On a quick look at the code, it seems that Mozilla currently eats anything that 
looks like a marked section: <![  ...  ]]>. cf bug 10512
HTML 2.0 says that:
A comment declaration consists of `<!' followed by zero or more comments
followed by `>'. Each comment starts with `--' and includes all text up to and
including the next occurrence of `--'. In a comment declaration, white space is
allowed after each comment, but not before the first comment. The entire comment
declaration is ignored.
<!keyword ...> is something else that should be processed; ex: <!DOCTYPE ...>

And since HTML 3.2:
<![keyword[ ... ]]> is a marked section

HTML 4.01 says that:
White space is not permitted between the markup declaration open delimiter("<!")
and the comment open delimiter ("--"), but is permitted between the comment
close delimiter ("--") and the markup declaration close delimiter (">"). A
common error is to include a string of hyphens ("---") within a comment. Authors
should avoid putting two or more adjacent hyphens inside comments.
As far as I can tell, <![...]> cannot be a marked section because it doesn't
follow the form <![keyword[ ... ]]> since I can't see how either the ] or >
characters could be a valid keyword.

Conclusion: <![...]> is not a valid <!keyword ...> and also cannot be a marked
section, as shown above.  Since HTML 4.01 says <![...]> is invalid, but doesn't
say what to do about it, I propose to fall back on the HTML 2.0 standard.
Therefore, <![...]> should not be processed as anything more than a comment
declaration (and therefore be ignored) that terminiates at the first >.
<![ should not be enough for Mozilla to assume a marked section.  <![...[ should
be the indicator.  At any rate, all markup (tags) should be closed at the first
> found, including the <! markup.  This handling would solve both problems with
the original testcase.
Keywords: mozilla1.0
*** Bug 91561 has been marked as a duplicate of this bug. ***
Adding stuff to summary which explains the overall parser problem.
Keywords: compat
Summary: <![]-> causes Mozilla to ignore the rest of the page → Comments (<!) do not necessarily close on > (<![]> causes Mozilla to ignore the rest of the page)
"<!" does not mean "comment".  It means "SGML markup".  Inside that "--" is a
comment start and "--" is a comment end.

As long as HTML is claiming to be an SGML application, it has no say over how
anything following "<!" is parsed -- that is the province of the SGML specification.
Yes, I should have said comment declaration as indicated in HTML 2.0.

Anyway, even in SGML, shouldn't <... be immediately closed upon the occurrence
of >    ?
<! included.
Summary: Comments (<!) do not necessarily close on > (<![]> causes Mozilla to ignore the rest of the page) → <! does not necessarily close on > (<![]> causes Mozilla to ignore the rest of the page)
The summary is not 100% correct... <![]> doesn't cause any problems for me.
<![]-> is the minimal one I've found.

From my testing, it appears that Mozilla considers "<![" to be a construct that
can only be closed by  "]>". The View Source window doesn't display unclosed tags.
Sorry, your comment about the "5-character tag" threw me off.
I'd put something along the lines of a Perl regexp, but I'm not sure if
everybody would get the meaning.
ie <![.*].+>
Summary: <! does not necessarily close on > (<![]> causes Mozilla to ignore the rest of the page) → <! does not necessarily close on > (<![]-> causes Mozilla to ignore the rest of the page)
Actually, it would be more of m/<!\[[^\]]*\][^\]]+>/ but thats really confusing.

As it stands now, Mozilla appears to do this:
1. <!-- closes at the next --> if one exists anywhere after it.  Otherwise it
closes at the next >
2. <![ closes only at the next ]>
3. <!(anything else) closes at the next >

However, why should there be three cases?  It seems that there should only be
the third.  The constructs are all the same tag: <! it should not matter what
comes directly after it as to what gets rendered.  Unlike all other HTML tags,
this tag is unique in that it should be executed after <!(no space) instead of
<keyword(space) AFAICT.
Okay, that was wrong.  I just realized that <!-- is used to comment other HTML
(duh), which of course will contain a >
So we really do have two cases:
<!-- and <!
We just need to get rid of <![
Well, really it's just one case with one sub-case.
<! closes on >
BUT
-- disables everything (including >) until the next --

I could see the potential for
[ "disables" everything until the next ]
but I'm not convinced that that is correct.

Either way, there is still here problem here.  We have either too many or the
wrong cases.
Summary: <! does not necessarily close on > (<![]-> causes Mozilla to ignore the rest of the page) → <![ closes on ]> instead of > (<![]-> causes Mozilla to ignore the rest of the page)
Adding Heikki since he worked on bug 110545.
Attached patch Possible fix (obsolete) — Splinter Review
With this patch I see roughly this:

<![endif]--> test case
Before. After.
Adam Rice
Last modified: Mon Mar 11 12:51:26 GMT 2002

We need a bit of testing but I think we could fix this quite safely. Harish?
This patch can handle stuff like <![ foobar [ hello ]] -- agjdgfjh >.
Stuff that goes into node:	   ^-----------------^

This code will be triggered ONLY if we see: <![
Attachment #75508 - Attachment is obsolete: true
Comment on attachment 75602 [details] [diff] [review]
Better and safer fix

r=harishd
Attachment #75602 - Flags: review+
Taking since I have a (partial) fix. Stuff that starts with "<!--" is going to
be handled by the normal comment handling code, and it will close when it sees
the first "->", or if it cannot find that then on the first ">". This means that
even with the fix here we still treat 

<!--[if IE]>
<script><!--
document.write("During. ");
// -->

as single comment, and I don't think we should try to fix that problem at this
point. I'll open a new bug to make comments close with "]>".With this patch we
get MOST of the page right.

Adamn, everyone: which tools produce these kinds of things? We need to
evangelize them.
Assignee: harishd → heikki
Priority: -- → P2
Whiteboard: [fixinhand]
Target Milestone: --- → mozilla1.0
> which tools produce these kinds of things?

Microsoft Word, mostly ("Save as HTML").  Newer versions of FrontPage may, but
I'm not sure.

See http://msdn.microsoft.com/workshop/author/dhtml/overview/ccomment_ovw.asp
for Microsoft's description of this "feature" of IE....
Aren't we going to mess up pages that legitimately have a > inside their marked
sections? See http://www.w3.org/TR/html401/appendix/notes.html#h-B.3.5 for
information about SGML marked sections.
No, because we look for ']' to close the CDATA section, and then just skip
everything until we find '>' from there. So you can still have:

<![foo>bar]>
Status: NEW → ASSIGNED
Comment on attachment 75602 [details] [diff] [review]
Better and safer fix

There's still a legitimate case for a marked section that will be broken, but
Heikki convinced me that it is a rare and unlikely case in HTML content.
sr=vidur
Attachment #75602 - Flags: superreview+
Comment on attachment 75602 [details] [diff] [review]
Better and safer fix

a=asa (on behalf of drivers) for checkin to the 1.0 trunk
Attachment #75602 - Flags: approval+
Forgot to close, this was fixed:

03/22/2002 14:58 heikki%netscape.com mozilla/htmlparser/src/nsHTMLTokens.cpp 3.214
Status: ASSIGNED → RESOLVED
Closed: 22 years ago
Resolution: --- → FIXED
Whiteboard: [fixinhand]
verified URL and testcase, fix checked in CVS (rev 3.214)
Status: RESOLVED → VERIFIED
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: