Bug 531327 - <script>/<link> tags with src=""/href="" cause request to server
Status: NEW
Whiteboard: [evang-wanted]
Keywords: html5
Product: Core
Classification: Components
Component: DOM: Core & HTML
Version: unspecified
Platform: All All
Importance: -- normal with 4 votes
Target Milestone: ---
Assigned To: Nobody; OK to take it and work on it
Duplicates: 489115 506286
Blocks: 374023 387078 506286

Reported: 2009-11-26 17:46 PST by Nicholas C. Zakas
Modified: 2016-07-13 08:01 PDT


Attachments
What I had sitting in my tree (4.21 KB, patch)
2010-04-28 14:28 PDT, Jonas Sicking (:sicking) No longer reading bugmail consistently

Description Nicholas C. Zakas 2009-11-26 17:46:50 PST
User-Agent:       Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.0.15) Gecko/2009101601 Firefox/3.0.15 (.NET CLR 3.5.30729)
Build Identifier: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.0.15) Gecko/2009101601 Firefox/3.0.15 (.NET CLR 3.5.30729)

Firefox makes another request to the page when encountering:
<img src="">
<script src="">
<link href="">

For example, if the page is hosted at http://www.example.com/dir/mypage.htm, each of these causes another request to be made to http://www.example.com/dir/mypage.htm. This seems like a serious bug that has the potential to do a lot of damage to servers.
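
The resolution step behind this can be illustrated with a small sketch. It uses Python's standard `urllib.parse.urljoin` as a stand-in for the browser's resolver (an assumption for illustration only; browsers use their own URL code), together with the example URL above:

```python
from urllib.parse import urljoin

# The page URL from the report, used as the base for relative references.
base = "http://www.example.com/dir/mypage.htm"

# An empty reference resolves to the base document itself, which is why
# src="" or href="" ends up pointing back at the page containing the tag.
resolved = urljoin(base, "")
print(resolved)
```

The resolved URL equals the page URL, so a browser that fetches it requests the page a second time.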

Example page:

<html>
<head>
    <title>Bad Image</title>
</head>
<body>
<img src="">
</body>
</html>

Reproducible: Always

Steps to Reproduce:
1. Create a page with one of the offending patterns.
2. Load the page while watching HTTP traffic.
Actual Results:  
Observe that there are multiple requests to the same page.

Expected Results:  
There should be only one request to the specified page.

Opera doesn't fire a request under any of these circumstances and Internet Explorer only fires a request for <img src="">, but not for the others. Chrome and Safari behave the same as Firefox.
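
For anyone who wants to check their own pages for these patterns, here is a minimal sketch of a scanner built on Python's standard `html.parser`. The names `EmptyUrlFinder`, `find_empty_urls`, and `URL_ATTRS` are hypothetical helpers written for this illustration, and the scan only sees static markup, not tags generated by script at runtime:

```python
from html.parser import HTMLParser

# Attribute that carries the URL for each element named in this report.
URL_ATTRS = {"img": "src", "script": "src", "link": "href"}


class EmptyUrlFinder(HTMLParser):
    """Collects (tag, attribute) pairs whose URL attribute is present but empty."""

    def __init__(self):
        super().__init__()
        self.hits = []

    def handle_starttag(self, tag, attrs):
        wanted = URL_ATTRS.get(tag)
        if wanted is None:
            return
        for name, value in attrs:
            # html.parser reports src="" as "" and a bare `src` as None;
            # both are treated as an empty value here.
            if name == wanted and not value:
                self.hits.append((tag, name))


def find_empty_urls(markup):
    """Return the offending (tag, attribute) pairs in document order."""
    finder = EmptyUrlFinder()
    finder.feed(markup)
    finder.close()
    return finder.hits


page = '<html><body><img src=""><script src=""></script><link href="a.css"></body></html>'
print(find_empty_urls(page))
```

Only the `<img>` and `<script>` tags are reported for this sample page, since the `<link>` has a non-empty `href`.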
Comment 1 Tim (fmdeveloper) 2009-11-27 01:42:10 PST
This looks to be a duplicate of Bug 297071 (see comment 55), but this bug has a reduced test case.
Comment 2 Boris Zbarsky [:bz] 2009-11-30 23:02:33 PST
Generally speaking, this is the right behavior.  We've special-cased <img src=""> to not do this because it's sadly somewhat common on the web, but we've had no reports about this behavior with the other elements over the years, so would prefer to actually follow the standards on those.
Comment 3 Nicholas C. Zakas 2009-12-02 08:15:38 PST
Glad to hear that <img src=""> has been fixed, I guess my Firefox is behind a bit.

I'd like to better understand the reasoning behind leaving the others unfixed. If you now have <img src=""> and <iframe src=""> behaving the same way (not sending a request), why not add <link href=""> and <script src=""> to that list and avoid any future problems?

I understand that resolving "" to the containing page is correct per relative URL resolution rules, but I can't understand why anyone could consider <link href=""> or <script src=""> as valid and intentional markup. There is no way that either of these cases can possibly work, and if people really want to do this (I can't imagine why), then they can always do <script src="?">. 

My point is, the cost of fixing these remaining holes is small versus the cost of doubling the page views of a particular site due to leaving them unfixed. Right now, IE only has an issue with <img src=""> while Opera doesn't have an issue with any of these cases. The folks at WebKit are also currently looking at fixing all three situations. Can I convince you to at least consider doing the same?
Comment 4 Boris Zbarsky [:bz] 2009-12-02 09:12:46 PST
> Glad to hear that <img src=""> has been fixed

More precisely, it was purposefully broken.

> I guess my Firefox is behind a bit.

Your Firefox is based on Gecko 1.9.0.x, which means it's code from early summer 2008, plus security fixes.

> resolving "" to the containing page is correct per relative URL resolution 
> rules

Right.

> why anyone could consider <link href=""> or <script src=""> as valid and
> intentional markup

The browser shouldn't be in the business of second-guessing the author, fundamentally.

> There is no way that either of these cases can possibly work

Sure there is.  HTTP content negotiation.  People do use it.
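
As a rough illustration of that point, here is a minimal sketch of a server-side helper that picks a representation for a single URL from the request's Accept header. The function name `pick_representation`, the list of media types, and the naive parsing (q-values are ignored) are all assumptions made for this example, not a description of any particular server:

```python
def pick_representation(accept_header):
    """Choose a representation for one URL based on the Accept header.

    Hypothetical helper: the supported media types and the simplistic
    parsing (q-values are ignored) are assumptions for illustration.
    """
    offered = [part.split(";")[0].strip() for part in accept_header.split(",")]
    for media_type in offered:
        if media_type in ("text/css", "application/javascript", "text/html"):
            return media_type
    return "text/html"  # default representation of the page


# A page request and a stylesheet request for the *same* URL can receive
# different bodies if the server negotiates on Accept.
print(pick_representation("text/html,application/xhtml+xml;q=0.9,*/*;q=0.8"))
print(pick_representation("text/css,*/*;q=0.1"))
```

With a setup like this, a stylesheet request for the page's own URL could legitimately return CSS rather than HTML.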

> they can always do <script src="?">.

This isn't actually equivalent (e.g. treated differently by proxies and caches).

> the cost of fixing these remaining holes is small

It's easy to change the behavior.  The question is whether there is enough reason to purposely violate the URI specs here.  For <img> there seemed to be.  For these other cases, there doesn't seem to be.

Let me turn this around.  You're asking us to on purpose not follow a very clear spec and to second-guess authors.  Why do you think that's a good idea in this case and not in all cases?  What's special about this case?  Is there a significant pattern of misuse that would indicate that such second-guessing and spec violation is warranted?  This really isn't a decision to make lightly...
Comment 5 Christopher Blizzard (:blizzard) 2009-12-02 16:40:20 PST
Pretty sure there's more info on this in bug #444931.  I think that's where we changed it.  (Included in Firefox 3.5, fwiw.)
Comment 6 Jonas Sicking (:sicking) No longer reading bugmail consistently 2009-12-02 17:25:57 PST
Adjusting summary since we have in fact not loaded <img src=""> for quite a while now. (How have you not updated by now, tssk tssk ;) )

Regarding <script src="">, I don't understand why you would ever put that in your markup. Inevitably that will result in a failed script load, and then the script element will have its "already started" flag set (see the HTML5 spec), so it can never be used for anything useful.

Both <img> and <link> I can better understand since
a) It's more likely to be dynamically generated
b) You can dynamically change the url later in order to display an image or change styling.
Comment 7 Nicholas C. Zakas 2009-12-05 11:34:10 PST
I'm not sure that you can say <img src=""> and <link href=""> are more likely to be dynamically generated than <script src="">. We actually use the exact same system for CSS dependency management as we do for JavaScript dependency management, so the likelihood of this occurrence would be equivalent in our system. 

The larger issue, to me, is whether <link href=""> and <script src=""> are valid use cases. The point that someone *might* do this doesn't logically mean that people *do* do this. If we can agree that an occurrence of <link href=""> or <script src=""> is most likely an error, then it seems that closing these holes would be a good thing. Based on other browsers' behaviors (read: IE and Opera), there seems to be at least moderate agreement that this is the case.

The potential downside to changing this behavior is that if people are actually relying on this functionality, their page would break. But then again, their pages would already be broken in IE and Opera, which I'm guessing would dissuade usage in general. That aside, supposing that someone is using it for a Firefox-only application, the fix is to include the URL rather than leaving it blank.

On the other hand, leaving this hole opened means that if someone does accidentally include one of these patterns on a page (and again, if we can agree that this will be the most likely reason for it), then page views on that page double. That seems like a high price to pay for a coding mistake.
Comment 8 Boris Zbarsky [:bz] 2009-12-05 13:41:48 PST
> I'm not sure that you can say <img src=""> and <link href=""> are more likely

Sure I can.  I have empirical evidence that <img src=""> happens a lot more than the other two.  We've had a number of bug reports about <img> over the years, and none whatsoever about the others until this one, to my knowledge.

> If we can agree that an occurrence of <link href=""> or <script src=""> is most
> likely an error

That's a really slippery slope.  We don't want to go there unless it's a very common error that's adversely affecting our users.

> The potential downside to changing this behavior

Is that then we're explicitly violating the URI spec for no particularly good reason.  This means people have less trust in us following specs in general, and adds to the general mental overhead to programming for the web.

> That seems like a high price to pay for a coding mistake.

There are plenty of higher prices for simpler coding mistakes in HTML/CSS/JS.  The extra GET is really not a particularly serious problem.  More importantly, the only bad effect is for the person who's made the mistake, whereas "correcting" it would have potential bad effects for people who are not in fact making any mistakes.  If there were significant numbers of the people making the mistake and few of the latter, it might be worth it (as it was for <img>).  But as it is, both numbers are pretty close to 0 as far as I can see.
Comment 9 Nicholas C. Zakas 2009-12-05 14:57:23 PST
I'm quite surprised to see you say that an extra GET request isn't a problem. The <img src=""> issue was fixed, which seems to be an admission that an extra GET request caused by this pattern is a problem. I'd also like to give you some easy numbers to show the impact. www.yahoo.com has 300 million page views per day, which clearly requires a lot of machines. If that number were to double, is there any doubt that would lead to capacity issues? That's exactly what happens with all of these patterns, so to me and anyone running high-volume sites, they are all serious issues. Even in well-run organizations, if an ad is served onto the page, there are little to no controls over that content.

In regards to <script>, I was responding to you saying that you could understand how the issue could occur for <img> and <link>, but not <script>. My point here is that <link> and <script> are often set up in the same way on server-side systems, so a problem for one is often a problem for the other. I know you've received reports of <img src=""> being a problem before, and I'm the first to bring up the problem for these other tags, but there was a first person to submit a bug about <img src=""> at some point in time. Was his concern any less valid?

I also understand that per the URL spec, relative resolution for "" is the behavior you are following. However, I don't understand why there is a distinction between <img src="">/<iframe src=""> (neither of which now cause an extra request) vs. <link href="">/<script src="">. You have said that not following the URL spec would cause confusion, but it seems to me that 1) you're already not following it in all cases and 2) if confusion is your main concern, then consistency would be favored over exceptions. 

I've also thought, perhaps incorrectly, that the definitive source of truth for how HTML should work is the HTML spec itself, which only references the URL spec. Since the HTML5 spec now contains language saying that "" is invalid as a src for <img>, doesn't that mean that HTML5 is in violation of the URL spec? 

I honestly cannot imagine why anyone would intentionally include <link href=""> and <script src=""> in their code any more than I could imagine them intentionally including <img src="">. However, I am going to start asking around because if I'm wrong, I want to know.

I don't want to debate this forever, I just want to make sure you folks understand that the destructive potential of leaving this behavior as-is is profound. If what I've said thus far hasn't persuaded you, then so be it.
Comment 10 Jonas Sicking (:sicking) No longer reading bugmail consistently 2009-12-05 17:21:59 PST
I think the use cases for <script src=""> are very weak; that is not my main concern. My main concerns are adding additional code, as well as not following specs.

Given that adding special handling for <script src=""> takes very little code I'd be fine from that perspective. Even if it doesn't help *that* many sites.

The second point is a bigger issue though. What Firefox currently does *is* what specs say to do. I'd say the first action should be to raise this with the WHATWG or HTMLWG to get the HTML5 spec to spec the behavior you want. That way we can see if everyone agrees it's a good idea, as well as nail down the specifics of how it should behave.
Comment 11 Boris Zbarsky [:bz] 2009-12-06 04:44:37 PST
> seems to be an admission that an extra GET request caused by this pattern is a
> problem

It _can_ be a problem for users if it's common enough.  If it's very rare, it's not a problem for users.

> is there any doubt that would lead to capacity issues

That seems like a problem for _yahoo_, caused by their own mistake, but not for our users nor us, no?  I really do understand yahoo's position here, but I don't think that a single site running into this problem is strong enough to add more URI-handling exceptions.

> Even in well-run organizations, if an ad is served onto the page, there are
> little to no controls over that content.

If you're allowing ad content as part of your page itself, not an iframe from the ad origin server, and have no control over the ad content, then you have much bigger issues (security ones) than this, no?

> However, I don't understand why there is a distinction between
> <img src="">/<iframe src=""> (neither of which now cause an
> extra request)

For <iframe> this was hit by a _lot_ of sites and the failure mode is an infinite recursion that eats all the user's memory.  <iframe> doesn't allow such recursion even if src is nonempty, as long as it resolves to anything that's the URI of an ancestor frame.
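
The ancestor check described here can be pictured with a minimal sketch. This is not Gecko's code: `would_recurse` is a hypothetical helper, and comparing URLs as plain strings with the fragment dropped is a simplifying assumption (a real browser compares parsed URIs):

```python
def would_recurse(frame_url, ancestor_urls):
    """Return True if loading frame_url would nest a document inside itself.

    Simplified for illustration: URLs are compared as exact strings after
    dropping any fragment.
    """
    target = frame_url.split("#", 1)[0]
    return any(target == url.split("#", 1)[0] for url in ancestor_urls)


# The frame's resolved URL matches the containing page, so it is refused.
ancestors = ["http://www.example.com/dir/mypage.htm"]
print(would_recurse("http://www.example.com/dir/mypage.htm", ancestors))
print(would_recurse("http://www.example.com/dir/other.htm", ancestors))
```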

For <img> you should just read the relevant bug; there's no point repeating the arguments here.

The point is these are _exceptions_ where we purposefully violate a spec for very specific reasons.  These reasons don't seem to be applicable in the <link> and <script> cases.

> doesn't that mean that HTML5 is in violation of the URL spec? 

Technically, yes.  There are a few other places where the URL spec is so out of touch with reality that it can't be implemented correctly in web browsers...  Again, a goal is to minimize the number of such exceptions.
Comment 12 Nicholas C. Zakas 2009-12-06 10:06:27 PST
@Jonas - if you believe that's the way to go, I'll be happy to start up that conversation.

@Boris - You're preaching to the choir on ads. It's a horrible system, but it's the way that ads work on the Internet. Everyone uses systems like this, not just Yahoo. This is still the most common way for bad code to make it onto an otherwise well-coded site. What's worse, it's incredibly difficult to detect what's happening. 

A correction: the extra GET request is never a problem for users, it's a problem for the servers. Users don't notice anything if an extra GET request is made.

My point, once again, is that the potential problems for <link href=""> and <script src=""> are the same as for <img src=""> so to me, if you fix one, the others should be fixed too for consistency. That's what I'll be suggesting to the WHATWG.
Comment 13 Nicholas C. Zakas 2010-03-09 17:22:54 PST
Changes were just made to HTML5 explicitly disallowing automatic requests for <link> and <script> when an empty-string URL is used. Can this be fixed in Firefox? :)

http://html5.org/tools/web-apps-tracker?from=4833&to=4834
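
To make the requested behavior concrete, here is a minimal sketch of the check as I understand it. It models only the empty-string case described in this comment, not the full spec algorithm, and `should_fetch` and `URL_ATTRS` are hypothetical names used for illustration:

```python
# URL-carrying attribute for the two elements covered by the change.
URL_ATTRS = {"script": "src", "link": "href"}


def should_fetch(tag, attributes):
    """Return True only if the element has a non-empty URL attribute.

    Illustrative assumption: a missing attribute and an empty-string
    attribute both mean that no request is made.
    """
    attr = URL_ATTRS.get(tag)
    if attr is None:
        return False
    value = attributes.get(attr)
    return value is not None and value != ""


print(should_fetch("script", {"src": ""}))         # empty URL: no request
print(should_fetch("link", {"href": "site.css"}))  # normal case: request
```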
Comment 14 Boris Zbarsky [:bz] 2010-03-09 20:27:35 PST
Putting it on the radar.
Comment 15 Jonas Sicking (:sicking) No longer reading bugmail consistently 2010-03-10 09:09:18 PST
Taking this for now.
Comment 16 Christian :Biesinger (don't email me, ping me on IRC) 2010-04-08 14:10:13 PDT
-> me
Comment 17 WADA 2010-04-09 07:57:23 PDT
Phenomena I could observe with <script src=""> were:
(A) bug 387078 comment #14.
> (1) Initial load : One HTTP GET only, and 200
> (2) Reload       : Two HTTP GET with If-Modified-Since, and two 304 Not-modified
> (3) Shift+Reload : Three HTTP GET, and three 200
(B) Bug 506286 : Script error because HTML/XML source is interpreted as Script.

Would it be better to close the above two bugs as duplicates of this bug?
Were such problems already resolved by HTML5? If I enable HTML5, will I no longer see the above problems?
Comment 18 Jonas Sicking (:sicking) No longer reading bugmail consistently 2010-04-28 14:28:39 PDT
Created attachment 442191 [details] [diff] [review]
What I had sitting in my tree

This is a *very* early start of a patch I wrote a while ago. Probably not worth using as a start, but I didn't want to just throw it away in case there was any goodness in there. Attaching here so I can nuke it from my tree.
Comment 19 Henri Sivonen (:hsivonen) 2010-05-06 05:28:31 PDT
*** Bug 506286 has been marked as a duplicate of this bug. ***
Comment 20 Jonas Sicking (:sicking) No longer reading bugmail consistently 2010-12-02 15:11:18 PST
*** Bug 489115 has been marked as a duplicate of this bug. ***
Comment 21 Christian :Biesinger (don't email me, ping me on IRC) 2012-07-11 15:53:23 PDT
I'm clearly not getting around to doing this :(
Comment 22 sheri 2012-08-28 09:46:03 PDT
A reminder about a consequence of an incomplete img tag (observed in Firefox 3.6.8 and 7.0.1, SeaMonkey 2.7, and Pale Moon 7.0.1): when forum software such as SMF turns an empty BBCode image tag, [img][/img], into img src="http://", the page cannot be saved to disk with images. An empty _files folder is created with no files, and the main HTML file is not saved. The save status tooltip indicates "File Not Found." Adblock sees the image on the page as http:///

I thought the broken image ref save bug was fixed. 
Does anyone have the bug ID # for the Firefox broken image ref save bug or whatever it was called? Thanks! I will do some searching.
