Closed Bug 327796 Opened 18 years ago Closed 18 years ago

Scripts in text/html documents closed with XML-style /> empty-tag markers that have src attribute should be allowed to run

Categories

(Core :: DOM: HTML Parser, defect)

PowerPC
macOS
defect
Not set
normal

Tracking

()

VERIFIED WONTFIX

People

(Reporter: jruderman, Assigned: mrbkap)

References

Details

(Keywords: regression, testcase)

Attachments

(1 file)

The fix for bug 305873 broke my Thumbs extension.  The extension has an XHTML-as-HTML page that includes a lot of scripts at the bottom, like this:

<script type="application/x-javascript" src="readpref.js" />
<script type="application/x-javascript" src="native.js" />
<script type="application/x-javascript" src="referrerFixer.js" />
<script type="application/x-javascript" src="thumbs.js" />

I don't see a reason for the rule that ignores unclosed scripts to apply to scripts with src, as long as the open tag (at least the src attribute) is complete.  I'll fix Thumbs to work around this change (by making up my mind as to whether the page is HTML or XHTML), but I'm filing this bug because Gecko might be breaking other similar pages and doesn't have to.
Attached file testcase
The following makes me think it could be hard to fix this bug safely:

data:text/html,<hr color="red
I don't see how we could possibly fix this without reverting bug 305873. The first script element is never closed. So nothing that comes after it should be considered markup (it's all contents of the script). And since the element itself is malformed we don't want to execute it either.
Couldn't we use the new behavior for <script> without src, and the old behavior or something like it for <script> with src?
ugh, i don't like the idea of having attribute values deciding how we interpret element boundaries
What we *could* do, if we really really wanted to make this work, is to support that ending / a la xml and treat it as an empty tag.

I'm not real happy with that though.
Why do we want to do this at all?

IMHO an unclosed <script> tag should never execute, whether src="" or not. This matches what IE does.

Executing an unclosed <script> tag can be a security problem (because it can make the site's behaviour change in particular ways when DOS attacked).
I'm all for marking this one INVALID. I guess it's just a matter of how much of the web will break. FWIW safari added 'support' for the <script/> syntax for "compat with firefox".
I didn't realize that our current behavior matched IE's.  I'm fine with WONTFIX in that case.
making it so
Status: NEW → RESOLVED
Closed: 18 years ago
Resolution: --- → WONTFIX
Ahem? I don't understand the reasoning given in this thread. You often refer to "unclosed" tags, but an XML tag with a / at the end is not an unclosed tag! I don't understand why you all seem to think that something following a *slashed* tag should be parsed as the *contents* of that tag, when in fact a slashed tag is by definition one without contents?

If the slash at the end of the tag was missing, then I would agree, but this way, it seems to me that your XHTML parser is seriously broken. Can someone enlighten me here?
In bug 420873, matti wrote:

> you have to close the <script> and closing it with <script     /> is not valid.

Why? I disagree; at least according to my understanding of XML, <script /> is supposed to be exactly equivalent to <script></script>.
> but an XML tag with a / at the end is not an unclosed tag!

In XML.  In HTML, it's an unclosed tag.

> it seems to me that your XHTML parser is seriously broken

Your testcase in bug 420873 is not using the XML parser; it's using the HTML parser.  It's also invalid XHTML because it uses the text/html MIME type without complying with Appendix C of the XHTML 1.0 specification.  As it happens, if it did comply there would be no issue.

And yes, I know your testcase passed the W3C validator.  Sadly, the validators are not very good.  They tend to only detect errors that violate the DTD, not errors that violate normative specification prose.
Isn't Jesse's original post about an XHTML page, though?  Are we sure we're doing this write for XHTML pages?  (which -should- accept the XML-style closing-mark, right?)  Also, would it be terribly dangerous or difficult to favor Postel's law in this case, and allow XML-closed <script src= ... /> tags, even for unspecified or HTML DOCTYPEs?
(In reply to comment #16)
> Are we sure we're doing this write for XHTML pages?

right, not write.  My English is bitrotting lately.
Also, if firefox 2 accepted this style (as indicated by bug 420873, comment 0), and firefox 3 doesn't, aren't we introducing a regression of sorts, even if in fact we were fixing a bug?
> Isn't Jesse's original post about an XHTML page, though?

No.  It's about an "XHTML-as-HTML" page.

> Also, would it be terribly dangerous or difficult

Possibly both, yes.  See Hixie's and sicking's comments in this bug.

> aren't we introducing a regression of sorts

Yes.  We're fixing a security bug in the process, and restoring IE compat.  The tradeoff is well worth it.
> Your testcase in bug 420873 is not using the XML parser; it's using the HTML
> parser.

That would be a problem -- I recommend that if the DOCTYPE clearly declares the document as XHTML, the XML parser should definitely take over.

> It's also invalid XHTML because it uses the text/html MIME type
> without complying with Appendix C of the XHTML 1.0 specification.

This is incorrect. Appendix C is explicitly marked as informative.

> Possibly both, yes.  See Hixie's and sicking's comments in this bug.

Neither of those comments addresses the special case of a tag with the XML-style "/>" closing syntax. The security problems mentioned do not exist in this case because they only apply to JavaScript code inside the tag, while the XML-style closed tag does not have any contents inside.

> We're fixing a security bug in the process, and restoring IE compat.
> The tradeoff is well worth it.

I do not agree with the last statement. You are not "restoring IE compat", you are merely copying an IE bug. Fine if an unclosed tag shouldn't execute - I agree. Not fine if a closed tag is treated as if it was unclosed and contained the entire rest of the XHTML source file. That's a bug.
> That would be a problem -- I recommend that if the DOCTYPE clearly declares

Most text/html sites that have an XHTML doctype are not well-formed XML.  So this is a non-starter.  Not to mention that this would violate the HTML4 and XHTML 1.0 specs.

> This is incorrect. Appendix C is explicitly marked as informative.

While true (because not all XHTML documents need to follow Appendix C), section 5.1 of XHTML 1.0 is normative.  I suggest you read it.  It clearly says that you may only label XHTML as text/html if it follows Appendix C.

> while the XML-style closed tag 

It's not closed.  That's the whole point.

> Fine if an unclosed tag shouldn't execute

That's the IE compat.  That's not what we used to do: we used to execute unclosed tags (like the one in your example).
And I'm not sure why we're still having discussions like this.  XHTML-as-text/html has been well-documented for years in various tutorials, bug reports, mailing lists, etc.  There's no mystery to how it works if someone actually bothers to look it up instead of just assuming that it's somehow magically XML in spite of the MIME type, which clearly says it's not.
> > while the XML-style closed tag 
> It's not closed.  That's the whole point.

Is this according to something in the HTML 4 spec?

> if someone actually bothers to look it up instead of just assuming

Someone who makes an assumption will not look it up. Not because they're lazy or stupid, but because it doesn't occur to them that the assumption could be wrong. If many people make the assumption, then the technology is unintuitive and, in the ideal case, should be enhanced to accommodate the intuition. Computers should serve humans, not the other way around.
> Is this according to something in the HTML 4 spec?

Yes.  In particular, the definition of <script> as not being an empty element.

> then the technology is unintuitive

Yep.  XHTML is a major mess that way.  I suggest having a look at http://hixie.ch/advocacy/xhtml for some of the issues caused by the many unintuitive behaviors of XHTML-as-text/html.

The way to change this is by getting the specs changed, though, not by having UAs violate the specs piecemeal.
Marking this VERIFIED. We do what the spec says, what IE does, what is most secure, and what makes most sense compared to all other *HTML* parsing. There really is no debate here.

If you HTML parsing rules changed, or XHRML vs. HTML parsing detection changed, feel free to bring it up with the HTML or WHATWG working group to get the HTML5 spec changed.

Until the spec is changed I see no reason to go back on this.
Status: RESOLVED → VERIFIED
Sorry, the middle paragraph had one too many mistakes to be legible, it should say:

If you want HTML parsing rules changed, or XHTML vs. HTML parsing detection changed, feel free to bring it up with the HTML or WHATWG working groups to get the HTML5 spec changed.
I make the "computers should serve people" argument, but only where there is a relatively simple and unambiguous algorithm for tweaking the system. Ignoring a missing </script> in XHTML-as-text/html is neither simple nor unambiguous (the error could be a security exploit in action), and hazardous besides.

XML is user-hostile, but asking users to close <script> tags, which I created in 1995, is not asking too much. <script>'s content model is CDATA in HTML, which means only </script> (in the original implementation, at any rate) closes the container. Guessing that premature end of document means silent auto-close is a bad idea.

/be
Though I've come to agree with bz and others in this bug, no one is asking that <script src=> by itself without any hint of a close-marker should be executed.  The idea here was to allow
<script src=... /> to execute the code referenced by the src= attribute.  I have since become convinced that that is difficult enough to be not worthwhile for Fx3.  If it were easy, low-hanging fruit, I might still advocate for it.
Summary: Unclosed scripts with src attribute should be allowed to run → scripts in HTML documents closed with XML-style /> empty-tag markers that have src attribute should be allowed to run
And for what it's worth, yes we are sure we are doing this correctly for XHTML.
crowder: comment 28 is confusing, but possibly because you didn't know that in HTML error correcting mode, the /> at the end of that <script src=.../> tag does not matter *in text/html*. Loading XHTML as application/xhtml+xml is, of course, different.

/be
Yeah, I realize that the /> marker doesn't matter in text/html, I think what I was proposing was to have it be accepted quirkily, OR, to have the src= attribute always be executed, even if the tag's contents themselves aren't.  It seems that script tags with src attributes ignore their contents anyway.  The problem is (not for your illumination, Brendan, but for others curious about this bug, as I was), as was pointed out to me on IRC:

<script src="foo"/>
<script src="bar"/>
<script src="baz"/>

Causes the following:  we begin parsing <script src="foo"> and then consuming text looking for an end tag, which means the "bar" and "baz" including tags will be gobbled and ignored (because we ignore script contents always, in the presence of src="").  It still seems to me that we could special-case the script tag to always be closable by />, but since that is non-compliant with the standard (and the behavior of IE), I think resistance to it is reasonable.  I'm cool with the current behavior.

That said, I can see why other posters think they've provided enough information for our parser to consider the tag closed (even if it were a quirk)
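The gobbling described above can be sketched in a few lines of Python. This is only an illustration of the rule as stated in this bug (a start tag opens the script, only a literal </script> closes it, and an unclosed script never runs); it is not Gecko's parser, and the function name and regular expressions are made up for the example.

```python
import re

# Hypothetical, simplified patterns; a real HTML tokenizer is far more involved.
START_TAG = re.compile(r"<script\b([^>]*)>", re.IGNORECASE)
END_TAG = re.compile(r"</script\s*>", re.IGNORECASE)
SRC_ATTR = re.compile(r"""\bsrc\s*=\s*["']([^"']*)["']""", re.IGNORECASE)


def runnable_script_sources(markup):
    """Return the src values of script elements that are actually closed."""
    sources = []
    pos = 0
    while True:
        start = START_TAG.search(markup, pos)
        if start is None:
            break
        # The trailing "/" in "/>" is ignored: the element is still open.
        end = END_TAG.search(markup, start.end())
        if end is None:
            # Never closed: the rest of the document is script content,
            # and an unclosed script is not executed.
            break
        src = SRC_ATTR.search(start.group(1))
        if src:
            sources.append(src.group(1))
        # Everything between the start tag and </script> was script content,
        # so any "<script ...>" text inside it was swallowed.
        pos = end.end()
    return sources
```

With the three-line example above, this returns an empty list: the first tag opens a script that is never closed. If a later script in the page does have a proper </script>, the first script is closed there and the tags in between are swallowed as its content.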
Summary: scripts in HTML documents closed with XML-style /> empty-tag markers that have src attribute should be allowed to run → Scripts in text/html documents closed with XML-style /> empty-tag markers that have src attribute should be allowed to run
(In reply to comment #15)
> > but an XML tag with a / at the end is not an unclosed tag!
> 
> In XML.  In HTML, it's an unclosed tag.
> 
> > it seems to me that your XHTML parser is seriously broken
> 
> Your testcase in bug 420873 is not using the XML parser; it's using the HTML
> parser.  It's also invalid XHTML because it uses the text/html MIME type
> without complying with Appendix C of the XHTML 1.0 specification.  As it
> happens, if it did comply there would be no issue.
> 
> And yes, I know your testcase passed the W3C validator.  Sadly, the validators
> are not very good.  They tend to only detect errors that violate the DTD, not
> errors that violate normative specification prose.
> 

The /> marker is perfectly valid for XHTML.  The reference in the spec to Appendix C is to make your pages parsable by html-only browsers as well.

Appendix C specifically states that it does not define how html aware browsers should parse html.  Claiming that it tells you that you should ignore valid xhtml when presented with it is just plain wrong.

It is unfortunate that it is not easy to fix, but that doesn't change the fact that an xhtml aware browser should not be doing what you are currently doing.
> The reference in the spec to Appendix C is to make your pages parsable by
> html-only browsers as well.

The XHTML spec says that XHTML may be served as text/html only if it complies with Appendix C.

> Appendix C specifically states that it does not define how html aware browsers
> should parse html.

Indeed.

> Claiming that it tells you that you should ignore valid xhtml

It does no such thing, of course.  The text/html MIME type registration says that the content is HTML.  The XHTML spec says you're allowed to claim your XHTML is HTML if you comply with Appendix C.  That's for your own good, since an HTML parser will be used if you claim you're sending HTML.

Again.  You tell us you're sending HTML, not XHTML.  Your doing so is a violation of the XHTML specification, since your content is XHTML that doesn't comply with Appendix C, but all that is really quite irrelevant to us.  We look at the MIME type you send and treat your content as HTML.  We are NOT going to start sniffing MIME types here: that way lies madness and a _lot_ of security issues.

If you happened to follow the XHTML specification, you wouldn't have a problem. But you don't, so you run into issues.  Please stop blaming that problem on someone else.
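For anyone who wants to check their own pages, here is a rough Python sketch of the one Appendix C guideline that matters in this bug: don't use the minimized form (<foo />) for elements whose content model is not EMPTY. The helper name and the regular expression are invented for this example, and the regex will be confused by attribute values that contain ">" or "<", so treat it as a quick lint and not as a validator.

```python
import re

# The elements HTML 4.01 declares EMPTY; only these may be written as <tag />
# in XHTML that is going to be served as text/html.
HTML4_EMPTY = {
    "area", "base", "basefont", "br", "col", "frame", "hr",
    "img", "input", "isindex", "link", "meta", "param",
}

# Matches a tag written in minimized form, e.g. <br /> or <script src="x" />.
MINIMIZED_TAG = re.compile(r"<([A-Za-z][A-Za-z0-9]*)\b[^<>]*/>")


def minimized_non_empty_elements(markup):
    """List tag names written as <tag ... /> that HTML does not treat as empty."""
    return [
        match.group(1).lower()
        for match in MINIMIZED_TAG.finditer(markup)
        if match.group(1).lower() not in HTML4_EMPTY
    ]
```

Run against the markup from comment 0, this reports "script" once per tag; rewriting each one as <script ... ></script> makes the list empty.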
(In reply to comment #34)
> 
> It does no such thing, of course.  The text/html MIME type registration says
> that the content is HTML.  The XHTML spec says you're allowed to claim your
> XHTML is HTML if you comply with Appendix C.  That's for your own good, since
> an HTML parser will be used if you claim you're sending HTML.
> 
> Again.  You tell us you're sending HTML, not XHTML.  Your doing so is a
> violation of the XHTML specification, since your content is XHTML that doesn't
> comply with Appendix C, but all that is really quite irrelevant to us.  We look
> at the MIME type you send and treat your content as HTML.  We are NOT going to
> start sniffing MIME types here: that way lies madness and a _lot_ of security
> issues.
> 
> If you happened to follow the XHTML specification, you wouldn't have a problem.
> But you don't, so you run into issues.  Please stop blaming that problem on
> someone else.
> 

I certainly did fix my code to be valid html.  However, "text/html" is a valid mime type for xhtml.  Failing to parse it as such is a bug and a violation of the xhtml spec.

The xhtml 1.0 spec does not list a single required mime type.  It gives two options in section 5 and tells you to use text/html with the recommendations in appendix C for backwards compatibility.  It uses the word "may", which as defined in section 2 means optional (except in section 3), and should not be read as "may only"; the term "must" or "shall" would be used in that case.  In either case appendix C is informative, not normative.  Failure to follow the guidelines in appendix C results in invalid html.  It says nothing about xhtml, so browsers supporting xhtml should be unaffected.

The requirements for xhtml are defined in section 3 of the spec.  For a browser to properly parse xhtml, it must look for these conditions in either of the two valid mime-types.

As long as a document follows every requirement in the normative sections, it is valid XHTML.  Even if it violates every suggestion in the informative sections, an xhtml compliant browser must render it properly.
> However, "text/html" is a valid mime type for xhtml.

Only if the XHTML complies with Appendix C.  The precise text is:

  XHTML Documents which follow the guidelines set forth in Appendix C, "HTML
  Compatibility Guidelines" may be labeled with the Internet Media Type
  "text/html" [RFC2854],

Note that that RFC defines the text/html type.  The informative note linked to (http://www.w3.org/TR/2002/NOTE-xhtml-media-types-20020801/) clarifies things in terms of the XHTML WG's thinking.  It's informative, but it spells out what the WG had in mind:

  The use of 'text/html' for XHTML SHOULD be limited for the purpose of
  rendering on existing HTML user agents, and SHOULD be limited to [XHTML1]
  documents which follow the HTML Compatibility Guidelines. In particular,
  'text/html' is NOT suitable for XHTML Family document types that adds
  elements and attributes from foreign namespaces, such as XHTML+MathML
  [XHTML+MathML].

  XHTML documents served as 'text/html' will not be processed as XML [XML10],
  e.g. well-formedness errors may not be detected by user agents. Also be aware
  that HTML rules will be applied for DOM and style sheets (see C.11 and C13 of
  [XHTML1] respectively).

> As long as a document follows every requirement in the normative sections, it
> is valid XHTML.

It is, but it's not being sent as XHTML.  It's being sent as HTML, and hence treated as such, per the normative requirements of RFC 2854.
Simply not displaying everything that follows the /> is very bad.
Many, many websites won't be displayed!

Is there any chance to give the user an "error" message?
> > As long as a document follows every requirement in the normative sections, it
> > is valid XHTML.
> 
> It is, but it's not being sent as XHTML.  It's being sent as HTML, and hence
> treated as such, per the normative requirements of RFC 2854.
> 

RFC 2854 is not broken into normative and informative sections since it is an rfc rather than a standard recommendation or definition.  However section 5 says that you should look at the doctype when deciding to parse it as xhtml.  It also unhelpfully comments that doctype declarations are often wrong.

So the issue is really if you should break because someone tells you in the doctype that it is xhtml 1.0 and it is really html 4.0 or if you should break if someone tells you it is xhtml 1.0 in the doctype and it is strict xhtml 1.0 rather than html 4.0.

It appears that firefox does not support xhtml as text/html no matter what the rfc or specs state.

Correctness would indicate that you should try the doctype stated before falling back to html 4.0 if it isn't valid.  Of course, I understand that speed would be horrible if you rendered it twice for every incorrect doctype page out there.

However it would be expected that we don't have regressions between versions like this bug, no matter what bugs might exist in other browsers.
> It also unhelpfully comments that doctype declarations are often wrong.

Indeed.  Most text/html content sent with an XHTML doctype is not well-formed XML.

> It appears that firefox does not support xhtml as text/html no matter what the
> rfc or specs state.

text/html is treated as HTML.  Period.  By all browsers.  That's what I've been saying all along.  That's the right thing to do per the specs, and the only thing to do that's compatible with actual web content.
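To make the rule concrete, here is a minimal Python sketch of choosing a parser from the declared MIME type alone, with no doctype sniffing. The function name, the return values and the exact list of XML types are assumptions made for illustration; this is not Gecko's dispatch code.

```python
def parser_for(content_type):
    """Pick a parser from the Content-Type header value alone.

    The doctype is deliberately never consulted.
    """
    # Drop parameters such as "; charset=UTF-8" and normalise case.
    mime = content_type.split(";", 1)[0].strip().lower()
    if mime == "text/html":
        return "html"
    # Assumed set of XML types for this sketch.
    if mime in ("application/xhtml+xml", "application/xml", "text/xml"):
        return "xml"
    return "other"
```

Under this rule a page with an XHTML 1.0 doctype that arrives as text/html still goes to the HTML parser, which is why the /> on a script tag has no special meaning there.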
Tim, comment 24 already cited

http://hixie.ch/advocacy/xhtml

It's worth a read. What IE did set a de-facto standard and predictably poisoned the well for XHTML via text/html. There are hard lessons here for standardization efforts. From the w3c discussions over the last decade, especially since the rise of the WHAT-WG, people have learned.

/be
I just the duplicate noted above.  What's I don't see mentioned here though is that this test/html vs text/xhtml+xml distinction is evidently not being made on the Linux version (I was running Windows XP). The developer sitting next to me has no problem with javascript loading in the examples I gave. I believe she's under Ubuntu Gibbon in VMWare.
Let's regenerate that:

I just entered the duplicate bug noted above.  What I don't see mentioned in the comments is that this text/html vs text/xhtml+xml distinction is evidently not getting made in the Linux builds (I was running under Windows XP). The developer sitting next to me has no problem with javascript loading in the examples I gave in my bug (<script .../>). I believe she's running under Ubuntu Gibbon in VMWare.

Also, it was stated in comment #41 that all browsers treat text/html as HTML, but yet Opera DOES load the javascript specified in these <script ... src=.../> elements.
Jeff, there is no behavior difference with this code between Windows and Linux, as long as the MIME type is the same.  If you're loading from local disk, of course, the MIME type might be different depending on your OS configuration.

As for Opera, what it does is actually a security bug in Opera.
Hi,

What you're saying makes sense, but that's not what appears to be happening. I have one box running Mozilla 5.0 under Ubuntu and another under Windows XP [I notice now that the Firefox versions ARE DIFFERENT as well, so that may be the issue]. When a specific page is loaded (index.html) that page is rendered correctly (or incorrectly, depending on POV) under Linux but not under Windows (scripts execute in former, not latter).

If I look at "tools/page info" in Firefox, both pages report Type as text/html.

Windows info:
Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.0.3) Gecko/2008092417 Firefox/3.0.3

Ubuntu info:
Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.8.1.14) Gecko/20080418 Ubuntu/7.10 (gutsy) Firefox/2.0.0.14
Jeff, that's happening because your Ubuntu machine has an ancient version of Firefox.  Firefox 2 doesn't have the security fix mentioned in comment 0; Firefox 3 has the fix.
Boris, how is it a security bug? If it really is, we should fix it.
Anne, you read bug 305873, right?  The one cited in comment 0?  Last I checked, Opera had that bug.  If you've changed the behavior since then, great.
We don't execute at unexpected EOF (afaict). We do however close <script> when it has both a trailing slash and a src="" attribute.
Well that's just weird and inconsistent, no other HTML element supports XML empty-element syntax as of yet.

But no, it's not really a security issue that I can think of. Unless there are sites that do content filtering using methods like

s/<script.*</script>?//
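As a rough illustration of that concern, here is a Python rendering of a filter in the spirit of the substitution above. The function name is made up, and the non-greedy match is an assumption of this sketch, not something the comment specifies.

```python
import re

# Naive filter: remove anything that looks like <script ... </script>.
NAIVE_SCRIPT = re.compile(r"<script.*?</script>", re.IGNORECASE | re.DOTALL)


def naive_strip_scripts(markup):
    """Strip script elements the way a simplistic content filter might."""
    return NAIVE_SCRIPT.sub("", markup)
```

A properly closed script is removed, but a lone <script src="evil.js"/> has no </script> for the pattern to match, so it passes through untouched. Such a filter is only safe if the browser also refuses to treat /> as closing the script.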