The default bug view has changed. See this FAQ.

download-attribute kills WebSocket-connections

RESOLVED FIXED in mozilla23

Status

()

Core
Document Navigation
RESOLVED FIXED
4 years ago
4 years ago

People

(Reporter: Dennis Boldt, Assigned: evilpie)

Tracking

20 Branch
mozilla23
x86_64
Linux
Points:
---

Firefox Tracking Flags

(Not tracked)

Details

Attachments

(1 attachment)

(Reporter)

Description

4 years ago
(1) Add a link with a download-attribute
(2) Open a Websocket connection
(3) Click on the link with the download-attribute
(4) The WebSocket connection closes at once (well to see in Firebug or in the web console)

Here the minimal working example:

http://paste.dennis-boldt.de/2013/mozilla/download-attribute-kills-websocket-connection.html

Here a screenshot:

http://paste.dennis-boldt.de/2013/mozilla/download-attribute-kills-websocket-connection.png

Comment 1

4 years ago
I can reproduce this without a download attribute too.

It seems that clicking any link that results in a download will close an active WebSocket connection on a page. Saving a link with "Save link as.." does work though, and the connection stays open.

The console shows "The connection to ws://localhost/ was interrupted while the page was loading" right before "onclose" gets triggered.

Also to note, Chrome doesn't show this behavior and the socket stays open during downloads.

Comment 2

4 years ago
After some investigation, it seems WebSocket::Cancel gets wrongly called when a download is started. "Save link as.." seems to skip this call completely. It doesn't look like the cancel call is tied to the download popup as it still cancels for filetypes that are automatically saved.

Still investigating where it's getting called...

Comment 3

4 years ago
I have nothing to add but encouragement at this point. Leave a note if your investigations get stuck and I'll try to help; thanks for taking the time to track it down.

Comment 4

4 years ago
I think i've found the root cause:

Execution of a link click starts at nsDocShell::OnLinkClickSync and end up in nsDocLoader::Stop through a series of calls, which then calls WebSocket::Cancel(). While this is happening, there's an asynchronous call to necko which tries to fetch the resource of the link. Problem is, the WebSocket is already closed long before it is decided if the request is a download or an actual new document (In which case, the socket should be closed).

At this point, i think it's best to wait for the request to finish, and then decide if the WebSocket shall be closed. I'll try to figure something out.

Comment 5

4 years ago
While trying a few things in my app I think I may have found the cause for the ghost WebSockets as described in Bug 765738. The underlying cause might be the same as this bug.

Here's what I see on the client, when I create a WS after the DOM has loaded (using jQuery's $(document).ready):

[02:53:14.182] "onopen"
[02:53:14.303] "onmessage"

Now, if I don't wait for the document to fully load, I see this:

[02:40:46.982] The connection to ws://localhost/ was interrupted while the page was loading.
[02:40:46.995] "onclose"
[02:40:47.282] "onopen"
[02:40:47.497] "onmessage"

Mind, that there's only one single call to WebSocket() in JavaScript, but it seems to mysteriously create another connection here (with no onopen called). Only one of these connection actually reach the server. I think what happens is, that the initial WS gets interrupted by a resource request from the same document which is loading (as this resource request might end up in a code path leading to a WebSocket::Cancel()).

I haven't looked at the underlying Firefox code yet, but there must be something in place that explains this.
silverwind--thanks for looking into this.  Really great work!

> Execution of a link click starts at nsDocShell::OnLinkClickSync and end up in 
> nsDocLoader::Stop through a series of calls, which then calls WebSocket::Cancel() 
> ...Problem is, the WebSocket is already closed long before it is decided 
> if the request is a download or an actual new document 

Yes, this sounds very plausible as the cause for this.  bz, do you have an idea of how we can fix docshell to not cancel websockets until we know if a link click might be a download and not a page navigation?  (IIUC we really shouldn't cancel /anything/ in the doc's loadGroup until we know this, right?  Sounds like we'd currently cancel still-loading images, etc when a user clicks a download link).

> I may have found the cause for the ghost WebSockets as described in Bug 765738... 
> if I don't wait for the document to fully load, I see... another connection...

Hmm, so is there a chance you already had a websocket open in the previous document, and we're seeing the close of that?   I know that we're reusing the same docshell when we navigate, and websockets is getting bitten by there not being a clear delineator between when the old->new page transition is complete.  Perhaps bz has some insight here...
Flags: needinfo?(bzbarsky)
Component: Networking: WebSockets → Document Navigation
> do you have an idea of how we can fix docshell to not cancel websockets until we know if
> a link click might be a download and not a page navigation? 

For @download we may be able to.  For cases when the server sends content-disposition:attachment, we can't (and doing it would violate the spec and web compat anyway; websites depend on the page being stopped when we send the request to the server).

The right solution for that would be determining whether we're doing a download due to @download earlier in the loading process and then skipping the Stop() call in that case.

> there not being a clear delineator between when the old->new page transition is complete

Well, the docshell and outer window sure know when it's "complete" in the sense of the new document and inner window becoming current.  If they need to tell websockets about that, we should just make them do it.
Flags: needinfo?(bzbarsky)

Comment 8

4 years ago
> > I may have found the cause for the ghost WebSockets as described in Bug 765738... 
> > if I don't wait for the document to fully load, I see... another connection...
> 
> Hmm, so is there a chance you already had a websocket open in the previous
> document, and we're seeing the close of that?   I know that we're reusing
> the same docshell when we navigate, and websockets is getting bitten by
> there not being a clear delineator between when the old->new page transition
> is complete.  Perhaps bz has some insight here...

I made sure to first navigate to a regular, non-socket page, before loading. What really puzzles me here is, that "onclose" fires before "onopen", which implies that that there are either two connection attempts, or a a strange race condition. I'll see if I can provide you with a test case in a new bug.

BTW, I think I'll leave the fixing of this bug up to you guys. Diving into this codebase for the first time was quite overwhelming, and it looks like the change that's required here could possibly break a lot of things if done wrong.

Comment 9

4 years ago
> I made sure to first navigate to a regular, non-socket page, before loading.
> What really puzzles me here is, that "onclose" fires before "onopen", which
> implies that that there are either two connection attempts, or a a strange
> race condition. I'll see if I can provide you with a test case in a new bug.

Looks like I can't reproduce this any more now, so disregard anything said about that specific case. What I've been seeing was probably the interrupt error when leaving the page.
> The right solution... would be determining whether we're doing a download due to
>  @download earlier in the loading process and then skipping the Stop() call in 
> that case.

OK, so this is the actionable item.  But I don't know who can take this--I don't know docshell at all.  I might be able to take it with some guidance.
I can certainly offer guidance.  ;)

Tom, do you want to take a look at this?
Status: UNCONFIRMED → NEW
Ever confirmed: true
Flags: needinfo?(evilpies)
(Assignee)

Comment 12

4 years ago
I will certainly look at, but I don't really have much of a clue about most of the networking stack to be honest.
Flags: needinfo?(evilpies)
Tom: this bug shouldn't touch the network stack code at all.  We just need to not issue Cancel() on the docshell's nsILoadGroup if a clicked link has a download attribute.
More precisely, we need to not make the Stop() calls in nsDocShell::InternalLoad in that case.
Assignee: nobody → evilpies
(Assignee)

Comment 15

4 years ago
I think we probably just need to duplicate the bIsJavaScript check in InternalLoad.
(Assignee)

Comment 16

4 years ago
Created attachment 743850 [details] [diff] [review]
v1

This fixes the problem. Verified with bz's test page: http://web.mit.edu/bzbarsky/www/test2.html.
Attachment #743850 - Flags: review?(bzbarsky)
Note that that test page will likely go away.  It looks like this:

<a download="something" href="test.txt">
  Click me to download; that should not preclude the iframe below from loading after 10
  seconds
</a>
<hr>
<iframe src="http://software.hixie.ch/utilities/cgi/test-tools/delayed-file?pause=10&mime=text%2Fplain&text=abcde"></iframe>

with the <a> pointing to a file on the same server.
Comment on attachment 743850 [details] [diff] [review]
v1

r=me
Attachment #743850 - Flags: review?(bzbarsky) → review+
https://hg.mozilla.org/mozilla-central/rev/1ba1027a4d54
Status: NEW → RESOLVED
Last Resolved: 4 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla23

Comment 20

4 years ago
While I can confirm that <a download> doesn't close the websocket anymore using the tinderbox build, a normal <a> still does. I'm afraid there might have been a misunderstanding of the core issue here:

A normal <a> click shouldn't close a websocket connection before it is known if the click results in a new document being loaded. Right now, WebSocket::Cancel() gets called before Necko has decided if the request is a download or a new page. See Comment #4 for my interpretation.

Do you need a new bug or can this one be reopenend?
That's a separate issue, actually; it's not specific to <a> at all, and is actually extremely hard to fix: it requires making websocket connections behave differently from every single other network connection in the browser...  Please do file it.  The basic difference is that per spec, unlike instances of fetch which are killed while aborting a document websockets are supposed to only go away during the unloading document cleanup steps.  This also applies to EventSource, actually.
Duplicate of this bug: 880200
I filed bug 896666 on comment 20 and comment 21.

Comment 24

4 years ago
Thanks. I wanted to provide some specific test cases for things that shouldn't interrupt a websocket, but never got around to it.
You need to log in before you can comment on or make changes to this bug.