User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.0.15) Gecko/2009101601 Firefox/3.0.15 (.NET CLR 3.5.30729)
Build Identifier: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.0.15) Gecko/2009101601 Firefox/3.0.15 (.NET CLR 3.5.30729)

Firefox makes another request to the page when encountering:

<img src="">
<script src="">
<link href="">

For example, if the page is hosted at http://www.example.com/dir/mypage.htm, each of these causes another request to be made to http://www.example.com/dir/mypage.htm. This seems like a serious bug that has the potential to do a lot of damage to servers.

Example page:

<html>
<head>
<title>Bad Image</title>
</head>
<body>
<img src="">
</body>
</html>

Reproducible: Always

Steps to Reproduce:
1. Create a page with one of the offending patterns.
2. Load the page while watching HTTP traffic.

Actual Results:
Observe that there are multiple requests to the same page.

Expected Results:
There should be only one request to the specified page.

Opera doesn't fire a request in any of these cases, and Internet Explorer only fires one for <img src="">, not for the others. Chrome and Safari behave the same as Firefox.
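The extra request follows from relative reference resolution: an empty reference resolves to the base URL, which here is the page itself. A minimal sketch using Python's urllib.parse.urljoin (RFC 3986-style resolution, not the WHATWG URL parser browsers actually use; the page URL is the example from this report):

```python
from urllib.parse import urljoin

page_url = "http://www.example.com/dir/mypage.htm"

# An empty reference (src="" / href="") resolves to the page's own URL...
print(urljoin(page_url, ""))           # http://www.example.com/dir/mypage.htm
# ...while an ordinary relative reference resolves next to the page.
print(urljoin(page_url, "image.png"))  # http://www.example.com/dir/image.png
```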
Generally speaking, this is the right behavior. We've special-cased <img src=""> to not do this because it's sadly somewhat common on the web, but we've had no reports about this behavior with the other elements over the years, so we would prefer to actually follow the standards on those.
Glad to hear that <img src=""> has been fixed; I guess my Firefox is behind a bit. I'd like to better understand the reasoning behind leaving the others unfixed. If you now have <img src=""> and <iframe src=""> behaving the same way (not sending a request), why not add <link href=""> and <script src=""> to that list and avoid any future problems? I understand that resolving "" to the containing page is correct per relative URL resolution rules, but I can't understand why anyone would consider <link href=""> or <script src=""> valid and intentional markup. There is no way that either of these cases can possibly work, and if people really want to do this (I can't imagine why), they can always do <script src="?">. My point is, the cost of fixing these remaining holes is small compared to the cost of doubling the page views of a particular site by leaving them unfixed. Right now, IE only has an issue with <img src="">, while Opera doesn't have an issue with any of these cases. The folks at WebKit are also currently looking at fixing all three situations. Can I convince you to at least consider doing the same?
> Glad to hear that <img src=""> has been fixed

More precisely, it was purposefully broken.

> I guess my Firefox is behind a bit.

Your Firefox is based on Gecko 1.9.0.x, which means it's code from early summer 2008, plus security fixes.

> resolving "" to the containing page is correct per relative URL resolution rules

Right.

> why anyone would consider <link href=""> or <script src=""> valid and intentional markup

The browser shouldn't be in the business of second-guessing the author, fundamentally.

> There is no way that either of these cases can possibly work

Sure there is. HTTP content negotiation. People do use it.

> they can always do <script src="?">.

This isn't actually equivalent (e.g. it's treated differently by proxies and caches).

> the cost of fixing these remaining holes is small

It's easy to change the behavior. The question is whether there is enough reason to purposely violate the URI specs here. For <img> there seemed to be. For these other cases, there doesn't seem to be.

Let me turn this around. You're asking us to purposely not follow a very clear spec and to second-guess authors. Why do you think that's a good idea in this case and not in all cases? What's special about this case? Is there a significant pattern of misuse that would indicate that such second-guessing and spec violation is warranted? This really isn't a decision to make lightly...
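To illustrate why src="?" is not a drop-in replacement: under RFC 3986 section 5.2.2, "" resolves to the base URL unchanged (minus any fragment), while "?" resolves to the base path with an explicitly empty query, which is a different URL string that a cache or proxy may key separately. resolve_same_document_ref is a hypothetical helper that only handles references with an empty path; the reference is split by hand so that "no query" stays distinct from "empty query".

```python
from typing import Optional
from urllib.parse import urlsplit


def resolve_same_document_ref(base: str, ref: str) -> str:
    """Hypothetical helper: resolve a reference whose path is empty ("", "?...",
    "#...") against an absolute base URL, following RFC 3986 section 5.2.2."""
    # None means "no '?' / '#' present"; "" means present but empty.
    fragment: Optional[str] = None
    if "#" in ref:
        ref, fragment = ref.split("#", 1)
    query: Optional[str] = None
    if "?" in ref:
        ref, query = ref.split("?", 1)
    if ref:
        raise ValueError("sketch only handles references with an empty path")

    b = urlsplit(base)
    # Empty reference path: scheme, authority and path all come from the base.
    target = f"{b.scheme}://{b.netloc}{b.path}"
    # T.query = R.query if defined, otherwise Base.query
    # (a base ending in a bare "?" is simplified away here).
    if query is not None:
        target += "?" + query
    elif b.query:
        target += "?" + b.query
    # T.fragment = R.fragment: the base's fragment is never inherited.
    if fragment is not None:
        target += "#" + fragment
    return target


page = "http://www.example.com/dir/mypage.htm"
print(resolve_same_document_ref(page, ""))   # http://www.example.com/dir/mypage.htm
print(resolve_same_document_ref(page, "?"))  # http://www.example.com/dir/mypage.htm?
```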
Pretty sure there's more info on this in bug #444931. I think that's where we changed it. (Included in Firefox 3.5, fwiw.)
Adjusting summary since we have in fact not loaded <img src=""> for quite a while now. (How have you not updated by now, tssk tssk ;) ) Regarding <script src="">, I don't understand why you would ever put that in your markup. It will inevitably result in a failed script load, after which the script element's "already started" flag is set (see the HTML5 spec), so the element can never be used for anything useful. Both <img> and <link> I can better understand, since a) they're more likely to be dynamically generated, and b) you can dynamically change the URL later in order to display an image or change styling.
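The "already started" point can be illustrated with a toy model. This is a simplified, hypothetical sketch of the flag's effect only, not Gecko's code or the full HTML "prepare the script element" algorithm: the first preparation sets the flag and attempts a load, later preparations are no-ops, so giving the element a real URL afterwards never loads anything. ToyScriptElement and "app.js" are made up for illustration.

```python
class ToyScriptElement:
    """Hypothetical model of a <script> element's "already started" flag."""

    def __init__(self, src: str) -> None:
        self.src = src
        self.already_started = False
        self.fetched: list[str] = []  # attribute values this element tried to load

    def prepare(self) -> None:
        # Once the flag is set, preparing the script again does nothing.
        if self.already_started:
            return
        self.already_started = True
        # Record the load attempt; with src="" Firefox resolved this to the page itself.
        self.fetched.append(self.src)


script = ToyScriptElement(src="")
script.prepare()          # first (and only) load attempt
script.src = "app.js"     # setting a real URL later...
script.prepare()          # ...is ignored because the element already started
print(script.fetched)     # ['']
```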
> I'm not sure that you can say <img src=""> and <link href=""> are more likely

Sure I can. I have empirical evidence that <img src=""> happens a lot more than the other two. We've had a number of bug reports about <img> over the years, and none whatsoever about the others until this one, to my knowledge.

> If we can agree that an occurrence of <link href=""> or <script src=""> is most likely an error

That's a really slippery slope. We don't want to go there unless it's a very common error that's adversely affecting our users.

> The potential downside to changing this behavior

Is that we'd then be explicitly violating the URI spec for no particularly good reason. This means people have less trust in us following specs in general, and it adds to the general mental overhead of programming for the web.

> That seems like a high price to pay for a coding mistake.

There are plenty of higher prices for simpler coding mistakes in HTML/CSS/JS. The extra GET is really not a particularly serious problem. More importantly, the only bad effect is for the person who's made the mistake, whereas "correcting" it would have potential bad effects for people who are not in fact making any mistakes. If there were significant numbers of people making the mistake and few of the latter, it might be worth it (as it was for <img>). But as it is, both numbers are pretty close to 0 as far as I can see.
I'm quite surprised to see you say that an extra GET request isn't a problem. The <img src=""> issue was fixed, which seems to be an admission that an extra GET request caused by this pattern is a problem.

I'd also like to give you some easy numbers to show the impact. www.yahoo.com has 300 million page views per day, which clearly requires a lot of machines. If that number were to double, is there any doubt that would lead to capacity issues? That's exactly what happens with all of these patterns, so to me and anyone running high-volume sites, they are all serious issues. Even in well-run organizations, if an ad is served onto the page, there are little to no controls over that content.

Regarding <script>, I was responding to you saying that you could understand how the issue could occur for <img> and <link>, but not <script>. My point here is that <link> and <script> are often set up in the same way on server-side systems, so a problem for one is often a problem for the other.

I know you've received reports of <img src=""> being a problem before, and I'm the first to bring up the problem for these other tags, but there was a first person to submit a bug about <img src=""> at some point in time. Was his concern any less valid?

I also understand that per the URL spec, relative resolution for "" is the behavior you are following. However, I don't understand why there is a distinction between <img src="">/<iframe src=""> (neither of which now cause an extra request) vs. <link href="">/<script src="">. You have said that not following the URL spec would cause confusion, but it seems to me that 1) you're already not following it in all cases and 2) if confusion is your main concern, then consistency would be favored over exceptions. I've also thought, perhaps incorrectly, that the definitive source of truth for how HTML should work is the HTML spec itself, which only references the URL spec.
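For scale, a back-of-envelope calculation, assuming the 300 million daily page views were spread evenly over 24 hours (real traffic peaks well above the average) and that every page contained one of these patterns:

```latex
\frac{300 \times 10^{6}\ \text{page views}}{86\,400\ \text{s}} \approx 3\,472\ \text{requests/s}
\qquad\Longrightarrow\qquad
2 \times 3\,472 = 6\,944\ \text{requests/s for the page HTML}
```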
Since the HTML5 spec now contains language saying that "" is invalid as a src for <img>, doesn't that mean that HTML5 is in violation of the URL spec? I honestly cannot imagine why anyone would intentionally include <link href=""> and <script src=""> in their code any more than I could imagine them intentionally including <img src="">. However, I am going to start asking around, because if I'm wrong, I want to know. I don't want to debate this forever; I just want to make sure you folks understand that the destructive potential of leaving this behavior as-is is profound. If what I've said thus far hasn't persuaded you, then so be it.
I think the use cases for <script src=""> are very weak; that is not my main concern. My main concerns are adding additional code, as well as not following specs. Given that adding special handling for <script src=""> takes very little code, I'd be fine from that perspective, even if it doesn't help *that* many sites. The second point is a bigger issue though. What Firefox currently does *is* what the specs say to do. I'd say the first action should be to raise this with the WHATWG or HTMLWG to get the HTML5 spec to specify the behavior you want. That way we can see if everyone agrees it's a good idea, as well as nail down the specifics of how it should behave.
> seems to be an admission that an extra GET request caused by this pattern is a problem

It _can_ be a problem for users if it's common enough. If it's very rare, it's not a problem for users.

> is there any doubt that would lead to capacity issues

That seems like a problem for _Yahoo_, caused by their own mistake, but not for our users nor us, no? I really do understand Yahoo's position here, but I don't think that a single site running into this problem is a strong enough reason to add more URI-handling exceptions.

> Even in well-run organizations, if an ad is served onto the page, there are little to no controls over that content.

If you're allowing ad content as part of your page itself, not an iframe from the ad origin server, and have no control over the ad content, then you have much bigger issues (security ones) than this, no?

> However, I don't understand why there is a distinction between <img src="">/<iframe src=""> (neither of which now cause an extra request)

For <iframe> this was hit by a _lot_ of sites, and the failure mode is an infinite recursion that eats all the user's memory. <iframe> doesn't allow such recursion even if src is nonempty, as long as it resolves to the URI of an ancestor frame.

For <img> you should just read the relevant bug; there's no point repeating the arguments here. The point is that these are _exceptions_ where we purposefully violate a spec for very specific reasons. These reasons don't seem to be applicable in the <link> and <script> cases.

> doesn't that mean that HTML5 is in violation of the URL spec?

Technically, yes. There are a few other places where the URL spec is so out of touch with reality that it can't be implemented correctly in web browsers... Again, a goal is to minimize the number of such exceptions.
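A rough sketch of the kind of guard described for <iframe>: refuse a frame load whose resolved URL is already loaded by the embedding document or one of its ancestors. may_load_frame is a hypothetical helper, and comparing URLs with the fragment stripped is an assumption of this sketch, not a description of Gecko's actual check.

```python
from urllib.parse import urldefrag, urljoin


def may_load_frame(src: str, document_url: str, ancestor_urls: list[str]) -> bool:
    """Hypothetical recursion guard for a frame's src attribute."""
    # Resolve src against the embedding document, ignoring any #fragment.
    target = urldefrag(urljoin(document_url, src))[0]
    # The embedding document and everything above it form the ancestor chain.
    chain = {urldefrag(url)[0] for url in [document_url, *ancestor_urls]}
    return target not in chain


page = "http://www.example.com/dir/mypage.htm"
print(may_load_frame("", page, []))           # False: resolves to the page itself
print(may_load_frame("frame.htm", page, []))  # True
```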
@Jonas: if you believe that's the way to go, I'll be happy to start up that conversation.

@Boris: You're preaching to the choir on ads. It's a horrible system, but it's the way that ads work on the Internet. Everyone uses systems like this, not just Yahoo. This is still the most common way for bad code to make it onto an otherwise well-coded site. What's worse, it's incredibly difficult to detect what's happening. A correction: the extra GET request is never a problem for users; it's a problem for the servers. Users don't notice anything if an extra GET request is made. My point, once again, is that the potential problems for <link href=""> and <script src=""> are the same as for <img src="">, so to me, if you fix one, the others should be fixed too for consistency. That's what I'll be suggesting to the WHATWG.
Changes were just made to HTML5 explicitly disallowing automatic requests for <link> and <script> when an empty-string URL is used. Can this be fixed in Firefox? :) http://html5.org/tools/web-apps-tracker?from=4833&to=4834
Putting it on the radar.
Taking this for now.
Phenomena I could observe with <script src=""> were:

(A) bug 387078 comment #14.
> (1) Initial load : One HTTP GET only, and 200
> (2) Reload : Two HTTP GET with If-Modified-Since, and two 304 Not-modified
> (3) Shift+Reload : Three HTTP GET, and three 200

(B) Bug 506286: Script error because HTML/XML source is interpreted as script.

Would it be better to close the above two bugs as duplicates of this bug? Were such problems already resolved by HTML5? If I enable HTML5, will I no longer see the above problems?
Created attachment 442191
What I had sitting in my tree

This is a *very* early start of a patch I wrote a while ago. Probably not worth using as a start, but I didn't want to just throw it away in case there was any goodness in there. Attaching here so I can nuke it from my tree.
I'm clearly not getting around to doing this :(
A reminder that an incomplete img tag has another consequence (observed in Firefox 3.6.8 and 7.0.1, SeaMonkey 2.7, and Pale Moon 7.0.1): when a forum such as SMF produces img src="http://" from an empty BBCode image tag ([img][/img]), the page cannot be saved to disk with images. An empty _files folder is created with no files in it, and the main HTML file is not saved. The Save tooltip status indicates "File Not Found." Adblock sees the image on the page as http:///

I thought the broken image ref save bug was fixed. Does anyone have the bug ID # for the Firefox broken image ref save bug, or whatever it was called? Thanks! I will do some searching.
When this is fixed, please don't forget to also address empty url() or url(''), as defined by CSS Values and Units Module Level 3, section 3.4.2: https://drafts.csswg.org/css-values-3/#url-empty

Also src="#", which was mentioned here for WebKit: https://bugs.webkit.org/show_bug.cgi?id=30303#c30
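The CSS rule referenced above differs from plain relative resolution: an empty url() is not resolved against the stylesheet URL but must resolve to an invalid resource, similar to about:invalid. A minimal sketch with a hypothetical helper that takes the already-unquoted url() value (real CSS parsing and the rest of the fetch are out of scope):

```python
from urllib.parse import urljoin


def resolve_css_url(value: str, stylesheet_url: str) -> str:
    """Hypothetical helper: resolve the contents of a CSS url() token."""
    # url() and url('') must not resolve to the stylesheet itself;
    # the spec describes the result as an invalid resource (like about:invalid).
    if value == "":
        return "about:invalid"
    return urljoin(stylesheet_url, value)


sheet = "http://www.example.com/css/site.css"
print(resolve_css_url("", sheet))        # about:invalid
print(resolve_css_url("bg.png", sheet))  # http://www.example.com/css/bg.png
```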
<script> was handled in 1299373. Resummarizing this to focus on <link>. Happy to mentor someone who wants to do <link>. The relevant spec bit here is https://html.spec.whatwg.org/multipage/semantics.html#concept-link-obtain step 1. CSS url() should have a separate bug: it's a different part of the codebase and needs a quite different approach. Per spec, "#" is just a non-empty string and has no magic: it should load.
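Summarizing the behavior discussed in this thread (no request for an empty <img src>, <script src>, or <link href>; "#" treated as an ordinary non-empty value), the decision can be sketched as below. url_to_fetch is a hypothetical helper that only decides whether and what to request; it ignores srcset, CORS, and everything that happens after this point.

```python
from typing import Optional
from urllib.parse import urldefrag, urljoin

# Which attribute carries the URL for each element discussed in this bug.
URL_ATTRIBUTE = {"img": "src", "script": "src", "link": "href"}


def url_to_fetch(tag: str, value: str, document_url: str) -> Optional[str]:
    """Hypothetical helper: the URL to request for the element's URL attribute,
    or None when no request should be made."""
    if tag not in URL_ATTRIBUTE:
        raise ValueError(f"unsupported element: {tag}")
    if value == "":
        # Exactly empty: bail out before fetching. What happens instead differs
        # per element (e.g. <script> fires an error event, <link> simply returns).
        return None
    # Any other value, "#" included, is resolved normally. The fragment is
    # never sent in the HTTP request, so "#" fetches the page URL itself.
    return urldefrag(urljoin(document_url, value))[0]


page = "http://www.example.com/dir/mypage.htm"
print(url_to_fetch("link", "", page))  # None
print(url_to_fetch("img", "#", page))  # http://www.example.com/dir/mypage.htm
```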
(In reply to Boris Zbarsky [:bz] (busy, pun intended) from comment #24)
> CSS url() should have a separate bug: it's a different part of the codebase
> and needs a quite different approach.

Created bug 1302991 for it.

Sebastian