Last Comment Bug 667853 - Missing close() event if websocket::init fails?
: Missing close() event if websocket::init fails?
Product: Core
Classification: Components
Component: Networking: WebSockets (show other bugs)
: unspecified
: All All
: -- normal (vote)
: mozilla8
Assigned To: Jason Duell [:jduell] (needinfo me)
: Patrick McManus [:mcmanus]
Depends on: 537787
  Show dependency treegraph
Reported: 2011-06-28 02:13 PDT by Jason Duell [:jduell] (needinfo me)
Modified: 2012-05-30 07:19 PDT (History)
6 users (show)
See Also:
Crash Signature:
QA Whiteboard:
Iteration: ---
Points: ---
Has Regression Range: ---
Has STR: ---

patch v1 (4.35 KB, patch)
2011-07-08 07:53 PDT, Patrick McManus [:mcmanus]
jduell.mcbugs: review-
Details | Diff | Splinter Review
patch v2 (12.12 KB, patch)
2011-07-29 14:52 PDT, Jason Duell [:jduell] (needinfo me)
mcmanus: review+
josh: feedback+
Details | Diff | Splinter Review

Description Jason Duell [:jduell] (needinfo me) 2011-06-28 02:13:59 PDT
This is based on just a code read-through--I might be wrong.

nsWebSocket::Init calls EstablishConnection(), but ignores the return value with a comment about WS constructor only throwing for bad URI/port.

But AFAICT if there's any other error, we need to fire a close() event asynchronously, and by ignoring EstablishConnection, we don't, for a very large set of errors (anything leading up to and including the WS channel's AsyncOpen, including errors from SetLoadGroup, hitting max # of WS connections, etc--it's a long list).

Fix is simple in theory?  If EstablishConnection fails, post an async event that calls OnStop.   But I'm not sure if the existing OnStop logic would be happy being called with a state of never having had an open cxn, or if any of the logic in StopSession is also needed.
Comment 1 Jason Duell [:jduell] (needinfo me) 2011-06-30 00:54:56 PDT
I've verified that this is indeed a bug.  I just went in and changed nsWebSocketHandler::AsyncOpen to immediately return NS_ERROR_UNEXPECTED.  With that, content/base/test/test_websocket_hello.html hangs (never gets a close).
Comment 2 Jason Duell [:jduell] (needinfo me) 2011-07-01 21:39:50 PDT
The e10s version of the code also has this bug, and needs to be fixed in WebSocketChannelChild::AsyncOpenFailed().

Note that as near as I can tell, we might also achieve this by scrapping the
entire AsyncOpenFailed message, and just having the parent's CancelEarly() call SendClose.  I haven't looked carefully enough to know if that would cover everything that needs to happen.

We should rename CancelEarly, BTW, as we did in HTTP--it has nothing to do with cancellation (this was more confusing with HTTP, which has a Cancel operation, but still).  FailedAsyncOpen is what we went with there.
Comment 3 Jason Duell [:jduell] (needinfo me) 2011-07-01 21:42:52 PDT
Sorry, to be more precise:  the EstablishConnection fix will be in nsWebsocket, so will be in both parent/child. But under e10s we still get the same symptoms if the parent's AsyncOpen fails for any reason, because we're not firing off a close() event in AsyncOpenFailed.

So, 2 bugs with same symptom.  might as well fix both here.
Comment 4 Patrick McManus [:mcmanus] 2011-07-08 07:53:12 PDT
Created attachment 544816 [details] [diff] [review]
patch v1
Comment 5 Jason Duell [:jduell] (needinfo me) 2011-07-29 04:39:06 PDT
Comment on attachment 544816 [details] [diff] [review]
patch v1

Review of attachment 544816 [details] [diff] [review]:

::: content/base/src/nsWebSocket.cpp
@@ +774,1 @@

Seems logical to call FailConnection() instead of close() (which calls FailConnection if CONNECTING, which should always be true here, right?  But I see Close also does a kungFuDeathGrip, claiming that FailConnection can release nsWebSocket (but I don't see how, and we call it in other places w/o the deathgrip).  Confusing.  I guess it can stay this way for now.  Grumble...

::: netwerk/protocol/websocket/WebSocketChannel.cpp
@@ +1830,5 @@
> +  // use AbortSession() as a replacement for the normal synchronous
> +  // error handling of AsyncOpen()
> +  mListener = aListener;
> +  mContext = aContext;
> +

Alas, even this isn't early enough if you call mChannel->Close before calling AsyncOpen.

@@ +1988,5 @@
> +    LOG(("WebSocketChannel::Close() without transport - aborting."));
> +    AbortSession(NS_ERROR_NOT_CONNECTED);
> +  }
> +

I'm going to leave this in my version of the patch--doesn't seem to do any harm?  I'm not clear on whether it's reachable.  Let me know if it should come out.

::: netwerk/protocol/websocket/WebSocketChannelParent.cpp
@@ +83,5 @@
>        do_CreateInstance(";1?name=ws", &rv);
>    }
>    if (NS_FAILED(rv))
>      return CancelEarly();

recvCancelEarly in its current incarnation doesn't wind up sending a close event to the websocket, so problem still exists.

@@ +99,1 @@

These mChannel->Close calls (before AsyncOpen) don't work either.  mChannel->mListener hasn't been set yet, so StopSession() doesn't dispatch OnStop() to the parent (and thus the child doesn't get it either).
Comment 6 Jason Duell [:jduell] (needinfo me) 2011-07-29 04:53:03 PDT
Comment on attachment 544816 [details] [diff] [review]
patch v1

Review of attachment 544816 [details] [diff] [review]:

I'm working on a patch that will fix this all correctly I think, so hold off on making any changes, Patrick.

::: content/base/src/nsWebSocket.cpp
@@ +774,1 @@

Note that the Close call actually does work in this case (parent process websockets): onclose gets sent.  It's the e10s parts of the patch that don't.  I tested using test_websockets_basic.html, sticking in bogus unconditional errors returns.
Comment 7 Jason Duell [:jduell] (needinfo me) 2011-07-29 14:52:30 PDT
Created attachment 549481 [details] [diff] [review]
patch v2

Asking Patrick and jdm to both look over this: jdm just for the e10s bits, which Patrick is less familiar with IIRC.
Comment 8 Wellington Fernando de Macedo 2011-07-31 10:24:30 PDT
> right?  But I see Close also does a kungFuDeathGrip, claiming that 
> FailConnection can release nsWebSocket (but I don't see how, and we call it in 
> other places w/o the deathgrip).  Confusing.  I guess it can stay this way for 
> now.  Grumble...
Until the v.76 there were locks because the code had more than one thread running. So maybe it needs to be checked. I will fix the bug 664894, and I think that my work that will address that thing too.
Comment 9 Patrick McManus [:mcmanus] 2011-08-02 11:55:33 PDT
Comment on attachment 549481 [details] [diff] [review]
patch v2

check it in!
Comment 10 Josh Matthews [:jdm] (on vacation until Dec 5) 2011-08-02 13:27:22 PDT
Comment on attachment 549481 [details] [diff] [review]
patch v2

After some pondering, I approve.
Comment 11 Jason Duell [:jduell] (needinfo me) 2011-08-02 17:05:21 PDT
Comment 12 Marco Bonardo [::mak] 2011-08-03 02:23:09 PDT

Note You need to log in before you can comment on or make changes to this bug.