Last Comment Bug 675784 - WebSockets - problem cancelling pre-bootstrapped connection
: WebSockets - problem cancelling pre-bootstrapped connection
Status: RESOLVED FIXED
:
Product: Core
Classification: Components
Component: Networking: WebSockets (show other bugs)
: Trunk
: x86_64 Linux
: -- normal (vote)
: mozilla8
Assigned To: Patrick McManus [:mcmanus] PTO until Sep 6
:
Mentors:
Depends on:
Blocks:
  Show dependency treegraph
 
Reported: 2011-08-01 15:37 PDT by Patrick McManus [:mcmanus] PTO until Sep 6
Modified: 2011-08-08 23:34 PDT (History)
4 users (show)
See Also:
Crash Signature:
(edit)
QA Whiteboard:
Iteration: ---
Points: ---
Has Regression Range: ---
Has STR: ---


Attachments
patch 1 (11.47 KB, patch)
2011-08-02 11:49 PDT, Patrick McManus [:mcmanus] PTO until Sep 6
jduell.mcbugs: review+
Details | Diff | Splinter Review
Some fixups to original patch (as additional patch, not replacement). (8.32 KB, patch)
2011-08-05 20:00 PDT, Jason Duell [:jduell] (needinfo me)
mcmanus: review+
Details | Diff | Splinter Review

Description Patrick McManus [:mcmanus] PTO until Sep 6 2011-08-01 15:37:09 PDT
A canceled request can hang if it is canceled prior to the websockets bootstrap handshake being issued.
Comment 1 Patrick McManus [:mcmanus] PTO until Sep 6 2011-08-02 11:49:30 PDT
Created attachment 550143 [details] [diff] [review]
patch 1

if you isolate test 2 of test_websocket.html and click reload a couple times while the test is outstanding (the test involves a multiple second connection timeout - reload during that) you can see the issue.
Comment 2 Jason Duell [:jduell] (needinfo me) 2011-08-05 19:59:39 PDT
Comment on attachment 550143 [details] [diff] [review]
patch 1

Review of attachment 550143 [details] [diff] [review]:
-----------------------------------------------------------------

Took me a while to grok it all, but I think it's good.

::: netwerk/protocol/websocket/WebSocketChannel.cpp
@@ +379,5 @@
> +                          "transaction already running");
> +
> +        (mData[index])->mChannel->mOpenBlocked = 0;
> +        (mData[index])->mChannel->mOpenRunning = 1;
> +        (mData[index])->mChannel->BeginOpen();

So if the page is in the middle of being Canceled by the LoadGroup, and the LG cancels this cxn but there's another WS in same page that's mOpenBlocked, it's going to get found by this logic and have BeginOpen->AsyncOpen called, just to then get Cancelled by the LG.  Right?  I guess that's fine--a little messy, but correct.

There's a big block of shared logic here with Complete--move to a new function called "ConnectNext" or whatever (and please declare a channel * variable so you don't need to repeat mData[index]>mChannel 5 times).

::: netwerk/protocol/websocket/WebSocketChannel.h
@@ +240,5 @@
>    PRUint32                        mAutoFollowRedirects       : 1;
>    PRUint32                        mReleaseOnTransmit         : 1;
>    PRUint32                        mTCPClosed                 : 1;
> +  PRUint32                        mOpenBlocked               : 1;
> +  PRUint32                        mOpenStarted               : 1;

Unlike mOpenBlocked and mOpenRunning, this one isn't temporary--it stays set once the http chan's asyncOpen has been called.  For parity with nsHttpChannel, let's call it mChannelWasOpened, unless you object.
Comment 3 Jason Duell [:jduell] (needinfo me) 2011-08-05 20:00:53 PDT
Created attachment 551227 [details] [diff] [review]
Some fixups to original patch (as additional patch, not replacement).

Patrick, let's include these as well unless you object.
Comment 4 Patrick McManus [:mcmanus] PTO until Sep 6 2011-08-07 06:22:20 PDT
> So if the page is in the middle of being Canceled by the LoadGroup, and the
> LG cancels this cxn but there's another WS in same page that's mOpenBlocked,
> it's going to get found by this logic and have BeginOpen->AsyncOpen called,
> just to then get Cancelled by the LG.  Right?  I guess that's fine--a little
> messy, but correct.
> 

right.. the basic issue is that the websocket protocol defines a period of time (a potentially long async period of time) between calling websocketchannel->asyncopen() and when the http connection it uses is opened. The spec requires a dns lookup and potentially a queued wait before the http stuff can begin. If a cancel comes in during that time, we don't see it.

I went with this approach. The other approach is to implement nsirequest just to get the cancel, but that's a whole slew of code that is needless/dup'd-with-http to solve a very small problem even if it is a little cleaner in terms of the model. (i.e. it fits the model well, but it is annoying in practice).
Comment 5 Kyle Huey [:khuey] (Exited; not receiving bugmail, email if necessary) 2011-08-08 05:46:20 PDT
http://hg.mozilla.org/mozilla-central/rev/1586f2d51fac
http://hg.mozilla.org/mozilla-central/rev/d28a4c15096c
Comment 6 Jason Duell [:jduell] (needinfo me) 2011-08-08 23:34:14 PDT
> The other approach is to implement nsirequest just to get the cancel, 
> but that's a whole slew of code that is needless/dup'd-with-http

Right.  But with the cleanup to eliminate WebsocketEstablishedConnection, it probably will make sense to more the nsIRequest code it implements into WebsocketChannel rather than nsWebSocket.  (I haven't looked at Wellington's patch to see if he's doing that already or not).

Note You need to log in before you can comment on or make changes to this bug.