Closed Bug 858538 Opened 11 years ago Closed 11 years ago

download-attribute kills WebSocket-connections

Categories

(Core :: DOM: Navigation, defect)

20 Branch
x86_64
Linux
defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla23

People

(Reporter: info, Assigned: evilpie)

References

Details

Attachments

(1 file)

(1) Add a link with a download-attribute
(2) Open a Websocket connection
(3) Click on the link with the download-attribute
(4) The WebSocket connection closes at once (well to see in Firebug or in the web console)

Here the minimal working example:

http://paste.dennis-boldt.de/2013/mozilla/download-attribute-kills-websocket-connection.html

Here a screenshot:

http://paste.dennis-boldt.de/2013/mozilla/download-attribute-kills-websocket-connection.png
I can reproduce this without a download attribute too.

It seems that clicking any link that results in a download will close an active WebSocket connection on a page. Saving a link with "Save link as.." does work though, and the connection stays open.

The console shows "The connection to ws://localhost/ was interrupted while the page was loading" right before "onclose" gets triggered.

Also to note, Chrome doesn't show this behavior and the socket stays open during downloads.
After some investigation, it seems WebSocket::Cancel gets wrongly called when a download is started. "Save link as.." seems to skip this call completely. It doesn't look like the cancel call is tied to the download popup as it still cancels for filetypes that are automatically saved.

Still investigating where it's getting called...
I have nothing to add but encouragement at this point. Leave a note if your investigations get stuck and I'll try to help; thanks for taking the time to track it down.
I think i've found the root cause:

Execution of a link click starts at nsDocShell::OnLinkClickSync and end up in nsDocLoader::Stop through a series of calls, which then calls WebSocket::Cancel(). While this is happening, there's an asynchronous call to necko which tries to fetch the resource of the link. Problem is, the WebSocket is already closed long before it is decided if the request is a download or an actual new document (In which case, the socket should be closed).

At this point, i think it's best to wait for the request to finish, and then decide if the WebSocket shall be closed. I'll try to figure something out.
While trying a few things in my app I think I may have found the cause for the ghost WebSockets as described in Bug 765738. The underlying cause might be the same as this bug.

Here's what I see on the client, when I create a WS after the DOM has loaded (using jQuery's $(document).ready):

[02:53:14.182] "onopen"
[02:53:14.303] "onmessage"

Now, if I don't wait for the document to fully load, I see this:

[02:40:46.982] The connection to ws://localhost/ was interrupted while the page was loading.
[02:40:46.995] "onclose"
[02:40:47.282] "onopen"
[02:40:47.497] "onmessage"

Mind, that there's only one single call to WebSocket() in JavaScript, but it seems to mysteriously create another connection here (with no onopen called). Only one of these connection actually reach the server. I think what happens is, that the initial WS gets interrupted by a resource request from the same document which is loading (as this resource request might end up in a code path leading to a WebSocket::Cancel()).

I haven't looked at the underlying Firefox code yet, but there must be something in place that explains this.
silverwind--thanks for looking into this.  Really great work!

> Execution of a link click starts at nsDocShell::OnLinkClickSync and end up in 
> nsDocLoader::Stop through a series of calls, which then calls WebSocket::Cancel() 
> ...Problem is, the WebSocket is already closed long before it is decided 
> if the request is a download or an actual new document 

Yes, this sounds very plausible as the cause for this.  bz, do you have an idea of how we can fix docshell to not cancel websockets until we know if a link click might be a download and not a page navigation?  (IIUC we really shouldn't cancel /anything/ in the doc's loadGroup until we know this, right?  Sounds like we'd currently cancel still-loading images, etc when a user clicks a download link).

> I may have found the cause for the ghost WebSockets as described in Bug 765738... 
> if I don't wait for the document to fully load, I see... another connection...

Hmm, so is there a chance you already had a websocket open in the previous document, and we're seeing the close of that?   I know that we're reusing the same docshell when we navigate, and websockets is getting bitten by there not being a clear delineator between when the old->new page transition is complete.  Perhaps bz has some insight here...
Flags: needinfo?(bzbarsky)
Component: Networking: WebSockets → Document Navigation
> do you have an idea of how we can fix docshell to not cancel websockets until we know if
> a link click might be a download and not a page navigation? 

For @download we may be able to.  For cases when the server sends content-disposition:attachment, we can't (and doing it would violate the spec and web compat anyway; websites depend on the page being stopped when we send the request to the server).

The right solution for that would be determining whether we're doing a download due to @download earlier in the loading process and then skipping the Stop() call in that case.

> there not being a clear delineator between when the old->new page transition is complete

Well, the docshell and outer window sure know when it's "complete" in the sense of the new document and inner window becoming current.  If they need to tell websockets about that, we should just make them do it.
Flags: needinfo?(bzbarsky)
> > I may have found the cause for the ghost WebSockets as described in Bug 765738... 
> > if I don't wait for the document to fully load, I see... another connection...
> 
> Hmm, so is there a chance you already had a websocket open in the previous
> document, and we're seeing the close of that?   I know that we're reusing
> the same docshell when we navigate, and websockets is getting bitten by
> there not being a clear delineator between when the old->new page transition
> is complete.  Perhaps bz has some insight here...

I made sure to first navigate to a regular, non-socket page, before loading. What really puzzles me here is, that "onclose" fires before "onopen", which implies that that there are either two connection attempts, or a a strange race condition. I'll see if I can provide you with a test case in a new bug.

BTW, I think I'll leave the fixing of this bug up to you guys. Diving into this codebase for the first time was quite overwhelming, and it looks like the change that's required here could possibly break a lot of things if done wrong.
> I made sure to first navigate to a regular, non-socket page, before loading.
> What really puzzles me here is, that "onclose" fires before "onopen", which
> implies that that there are either two connection attempts, or a a strange
> race condition. I'll see if I can provide you with a test case in a new bug.

Looks like I can't reproduce this any more now, so disregard anything said about that specific case. What I've been seeing was probably the interrupt error when leaving the page.
> The right solution... would be determining whether we're doing a download due to
>  @download earlier in the loading process and then skipping the Stop() call in 
> that case.

OK, so this is the actionable item.  But I don't know who can take this--I don't know docshell at all.  I might be able to take it with some guidance.
I can certainly offer guidance.  ;)

Tom, do you want to take a look at this?
Status: UNCONFIRMED → NEW
Ever confirmed: true
Flags: needinfo?(evilpies)
I will certainly look at, but I don't really have much of a clue about most of the networking stack to be honest.
Flags: needinfo?(evilpies)
Tom: this bug shouldn't touch the network stack code at all.  We just need to not issue Cancel() on the docshell's nsILoadGroup if a clicked link has a download attribute.
More precisely, we need to not make the Stop() calls in nsDocShell::InternalLoad in that case.
Assignee: nobody → evilpies
I think we probably just need to duplicate the bIsJavaScript check in InternalLoad.
Attached patch v1Splinter Review
This fixes the problem. Verified with bz's test page: http://web.mit.edu/bzbarsky/www/test2.html.
Attachment #743850 - Flags: review?(bzbarsky)
Note that that test page will likely go away.  It looks like this:

<a download="something" href="test.txt">
  Click me to download; that should not preclude the iframe below from loading after 10
  seconds
</a>
<hr>
<iframe src="http://software.hixie.ch/utilities/cgi/test-tools/delayed-file?pause=10&mime=text%2Fplain&text=abcde"></iframe>

with the <a> pointing to a file on the same server.
Comment on attachment 743850 [details] [diff] [review]
v1

r=me
Attachment #743850 - Flags: review?(bzbarsky) → review+
https://hg.mozilla.org/mozilla-central/rev/1ba1027a4d54
Status: NEW → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla23
While I can confirm that <a download> doesn't close the websocket anymore using the tinderbox build, a normal <a> still does. I'm afraid there might have been a misunderstanding of the core issue here:

A normal <a> click shouldn't close a websocket connection before it is known if the click results in a new document being loaded. Right now, WebSocket::Cancel() gets called before Necko has decided if the request is a download or a new page. See Comment #4 for my interpretation.

Do you need a new bug or can this one be reopenend?
That's a separate issue, actually; it's not specific to <a> at all, and is actually extremely hard to fix: it requires making websocket connections behave differently from every single other network connection in the browser...  Please do file it.  The basic difference is that per spec, unlike instances of fetch which are killed while aborting a document websockets are supposed to only go away during the unloading document cleanup steps.  This also applies to EventSource, actually.
Thanks. I wanted to provide some specific test cases for things that shouldn't interrupt a websocket, but never got around to it.
sorry to cross post, but the issue should not be closed as resolved, this is still happening; as per https://bugzilla.mozilla.org/show_bug.cgi?id=896666 :

I've set up some code to reproduce the issue : https://github.com/anthonydahanne/firefox-ws-and-download
And, more importantly, a live demo of the issue : http://peaceful-sierra-45424.herokuapp.com/
Anthony, this issue was specifically about <a> with a "download" attribute specified.  Your demo doesn't use a "download" attribute, so isn't covered by this issue.

As you noted, bug 896666 is the issue that tracks cases in which we don't know we're doing a download until we're far enough into the navigation algorithm that we've aborted the document, and the fact that per spec aborting the document shouldn't abort websocket connections.
ah right, that makes sense Boris !
I'll follow progress on bug 896666 then ! thank you for the clarification !
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: