Last Comment Bug 695636 - Update close steps to adhere to WS spec.
: Update close steps to adhere to WS spec.
Status: RESOLVED FIXED
:
Product: Core
Classification: Components
Component: Networking: WebSockets (show other bugs)
: unspecified
: x86_64 Linux
: -- normal (vote)
: mozilla16
Assigned To: Jason Duell [:jduell] (needinfo me)
:
Mentors:
Depends on: 664894
Blocks:
  Show dependency treegraph
 
Reported: 2011-10-19 02:28 PDT by Jason Duell [:jduell] (needinfo me)
Modified: 2012-06-11 12:34 PDT (History)
9 users (show)
See Also:
Crash Signature:
(edit)
QA Whiteboard:
Iteration: ---
Points: ---
Has Regression Range: ---
Has STR: ---


Attachments
hack to check that onerror is called correctly in test-3 (958 bytes, patch)
2012-05-22 22:33 PDT, Jason Duell [:jduell] (needinfo me)
bugs: feedback-
Details | Diff | Splinter Review
Test to check onerror/onclose aren't called during close() (3.09 KB, patch)
2012-05-22 22:35 PDT, Jason Duell [:jduell] (needinfo me)
bugs: review+
Details | Diff | Splinter Review
Remove FailConnectionQuietly (3.69 KB, patch)
2012-05-22 22:42 PDT, Jason Duell [:jduell] (needinfo me)
bugs: review+
Details | Diff | Splinter Review
Make sure onerror called in an event that's async to any JS calls (17.76 KB, patch)
2012-05-23 13:23 PDT, Jason Duell [:jduell] (needinfo me)
bugs: review-
Details | Diff | Splinter Review
cleanup patch #1: remove mOnCloseCompleted (3.30 KB, patch)
2012-05-23 13:23 PDT, Jason Duell [:jduell] (needinfo me)
bugs: review+
Details | Diff | Splinter Review
cleanup patch #2: prefer CLOSED to mDisconnect (5.86 KB, patch)
2012-05-23 13:27 PDT, Jason Duell [:jduell] (needinfo me)
bugs: review-
Details | Diff | Splinter Review
Fix case where OnStart called but WS already closed (1.39 KB, patch)
2012-06-02 23:34 PDT, Jason Duell [:jduell] (needinfo me)
bugs: review+
Details | Diff | Splinter Review
Async onerror patch, interdiff v1 -> v2 (7.70 KB, patch)
2012-06-02 23:42 PDT, Jason Duell [:jduell] (needinfo me)
no flags Details | Diff | Splinter Review
Async onerror patch, v2 (18.78 KB, patch)
2012-06-02 23:45 PDT, Jason Duell [:jduell] (needinfo me)
bugs: review+
Details | Diff | Splinter Review
cleanup patch #2: prefer CLOSED to mDisconnect, v2 (7.95 KB, patch)
2012-06-03 00:31 PDT, Jason Duell [:jduell] (needinfo me)
no flags Details | Diff | Splinter Review
cleanup patch #2: prefer CLOSED to mDisconnect, v3 (8.07 KB, patch)
2012-06-03 00:51 PDT, Jason Duell [:jduell] (needinfo me)
bugs: review+
Details | Diff | Splinter Review
single, rolled-up patch that landed on m-c (29.29 KB, patch)
2012-06-05 10:18 PDT, Jason Duell [:jduell] (needinfo me)
jduell.mcbugs: review+
akeybl: approval‑mozilla‑aurora-
Details | Diff | Splinter Review

Description Jason Duell [:jduell] (needinfo me) 2011-10-19 02:28:58 PDT
We've got a number of little ways that our close logic does not follow the latest spec and/or needs cleanup.

In the case where JS calls close() before the WS is connected, we need to set state to CLOSING (now: set to CLOSED), and queue function that 
    - sets state to CLOSED
    - calls onerror if needed
    - calls onclose

- OnStop from the network needs to do similar logic, but we need to sure make it doesn't do it if the case above has already queued the onclose logic (i.e.  don't call onclose or onerror twice).  The code in bug 664894 corrected places where we needed to call onerror, but as a result we can now call onerror twice if JS calls close() before the connection is established AND the network winds up failing the connection independently.

- SetReadyState generally is setting state to CLOSED while we're still in the JS call context instead of setting to CLOSING and doing CLOSED in queued event.

- Garbage collection.  Currently we call Disconnect in CloseConnection right after calling SetReadyState(CLOSED), which will both null out mMustCheckKeepAlive and call Release() before we've actually dispatched the onclose event.  I suspect we want to delay that until the dispatch is complete.

- We have lots of places with check with if (mDisconnected) Disconnect, which doesn't make sense.  Looks like an artifact exposed by removing nsWSEstablishedCxn.

I'm going to do an overall audit of our close code to fix this and anything else that comes up.  But I'm planning to do so after I land an initial implementation of the binary APIs so my patch queue stops rotting.

We need to fix the web-facing parts of this bug before unprefixing.
Comment 1 Jason Duell [:jduell] (needinfo me) 2011-12-17 19:02:03 PST
I don't think we need this to unprefix, though it'd be nice to have.
Comment 2 Jason Duell [:jduell] (needinfo me) 2012-05-22 22:33:34 PDT
Created attachment 626328 [details] [diff] [review]
hack to check that onerror is called correctly in test-3

This is a hack (which is probably too easter-egg-y to check in) that fails the websocket early (in EstablishConnection) only for the magic URI in test_websocket.html, test-3.

It demonstrates that the current code winds up not calling onerror (it tried to, but it calls it within Init(), before JS has actually had a chance to set the onerror handler, which is lower down in the JS script).

Works fine with my later patches applied.

Not requesting review because I suspect we don't allow this sort of gross magic URI stuff in m-c.  Smaug--let me know if you actually think it's OK to land this sort of thing.
Comment 3 Jason Duell [:jduell] (needinfo me) 2012-05-22 22:35:23 PDT
Created attachment 626329 [details] [diff] [review]
Test to check onerror/onclose aren't called during close()

Tests that we dispatch onerror/onclose within a separate event.  Fails now, works with my patches applied.  Also checks that readyState == CLOSING (not CLOSED) immediately after calling close().
Comment 4 Jason Duell [:jduell] (needinfo me) 2012-05-22 22:42:45 PDT
Created attachment 626331 [details] [diff] [review]
Remove FailConnectionQuietly

Remove FailConnectionQuietly, which is badly named (it doesn't actually "fail the connection" in the spec sense: onerror isn't called), and is just a one-liner anyway.
Comment 5 Olli Pettay [:smaug] (vacation Aug 25-28) 2012-05-23 02:11:04 PDT
Comment on attachment 626328 [details] [diff] [review]
hack to check that onerror is called correctly in test-3

Er, no thank you :)
Comment 6 Olli Pettay [:smaug] (vacation Aug 25-28) 2012-05-23 02:12:22 PDT
(In reply to Jason Duell (:jduell) from comment #2)
> It demonstrates that the current code winds up not calling onerror (it tried
> to, but it calls it within Init(), before JS has actually had a chance to
> set the onerror handler, which is lower down in the JS script).
> 
> Works fine with my later patches applied.
So, what work and what is wrong with error handling?
Comment 7 Olli Pettay [:smaug] (vacation Aug 25-28) 2012-05-23 02:15:47 PDT
Comment on attachment 626329 [details] [diff] [review]
Test to check onerror/onclose aren't called during close()

So, does this depend on the hack? I assume this does, so clearing review.
Comment 8 Olli Pettay [:smaug] (vacation Aug 25-28) 2012-05-23 02:18:18 PDT
> - Garbage collection.  Currently we call Disconnect in CloseConnection right
> after calling SetReadyState(CLOSED), which will both null out
> mMustCheckKeepAlive and call Release() before we've actually dispatched the
> onclose event.  I suspect we want to delay that until the dispatch is
> complete.
Why?
There is nsRefPtr<nsWebSocket> kungfuDeathGrip = this; which keeps the object alive.
Comment 9 Jason Duell [:jduell] (needinfo me) 2012-05-23 13:11:06 PDT
Comment on attachment 626329 [details] [diff] [review]
Test to check onerror/onclose aren't called during close()

No, this mochitest change doesn't depend on the gross hack.
Comment 10 Jason Duell [:jduell] (needinfo me) 2012-05-23 13:14:47 PDT
> There is nsRefPtr<nsWebSocket> kungfuDeathGrip = this; which keeps the object alive.

kungfu goes out of scope before the onclose error is dispatched. That said, I haven't seen an actual problem with the logic, even when I applied my patch that has the necko channel release the nsWebSocket after calling OnStop (right now it keeps a reference to nsWebSocket until it's destroyed, which is no sooner than nsWebSocket releases it in Disconnect.

Anyway, my next patch changes things enough that you just look at that.
Comment 11 Jason Duell [:jduell] (needinfo me) 2012-05-23 13:23:01 PDT
Created attachment 626577 [details] [diff] [review]
Make sure onerror called in an event that's async to any JS calls

The meat of the bug fix.  

Makes onerror (and Disconnect) happen, like onclose, in a separate event if needed (it's not needed if trigger by OnStop, which is already async vis a vis JD calls).

Gets rid of most kungFuDeathGrips because we no longer call Disconnect until the event fires.
Comment 12 Jason Duell [:jduell] (needinfo me) 2012-05-23 13:23:58 PDT
Created attachment 626579 [details] [diff] [review]
cleanup patch #1: remove mOnCloseCompleted

mOnCloseCompleted is superfluous.  The only time we set it is after we've just set mReadyState to CLOSED (in CallOnClose2), so it's equivalent to mReadyState==CLOSED.
Comment 13 Jason Duell [:jduell] (needinfo me) 2012-05-23 13:27:08 PDT
Created attachment 626580 [details] [diff] [review]
cleanup patch #2: prefer CLOSED to mDisconnect

Prefer checking mReadyState == CLOSED to checking mDisconnected: I think it makes the code a lot more intuitive.

They're both set in same event (that fires onclose/onerror), so equivalent in all the cases here.

I'm not sure why we need to guard GetInterface?
Comment 14 Olli Pettay [:smaug] (vacation Aug 25-28) 2012-05-28 08:13:33 PDT
Comment on attachment 626577 [details] [diff] [review]
Make sure onerror called in an event that's async to any JS calls

What CallOnClose and CallOnClose2 are badly named.
onclose is just a event handler attribute in the object.
The event name is 'close'.

I don't understand why nsWebSocket::CallOnClose
dispatches the event occasionally sync, occasionally async.
aNeedsNewEvent as a parameter name doesn't sounds like something which controls
synchronousness.
Comment 15 Olli Pettay [:smaug] (vacation Aug 25-28) 2012-05-28 08:21:30 PDT
Comment on attachment 626580 [details] [diff] [review]
cleanup patch #2: prefer CLOSED to mDisconnect


> NS_IMETHODIMP
> nsWebSocket::OnStart(nsISupports *aContext)
> {
>   NS_ABORT_IF_FALSE(NS_IsMainThread(), "Not running on main thread");
>   NS_ABORT_IF_FALSE(mReadyState == nsIWebSocket::CONNECTING,
>                     "nsWebSocket::OnStart: readyState != CONNECTING!");
> 
>-  if (mDisconnected)
>-    return NS_OK;
>-
So, is this not possible if one first creates a connection and then
immediately calls close?


> NS_IMETHODIMP
> nsWebSocket::Cancel(nsresult aStatus)
> {
>   NS_ABORT_IF_FALSE(NS_IsMainThread(), "Not running on main thread");
> 
>-  if (mDisconnected)
>-    return NS_OK;
>+  ConsoleError();
> 
>-  ConsoleError();
>   return CloseConnection(nsIWebSocketChannel::CLOSE_GOING_AWAY);
Are you sure about this change? Is it possible that we end up giving console error even in cases it isn't that useful?
Comment 16 Olli Pettay [:smaug] (vacation Aug 25-28) 2012-05-28 08:50:43 PDT
Comment on attachment 626580 [details] [diff] [review]
cleanup patch #2: prefer CLOSED to mDisconnect

Marking this r- until the questions are answered.
Please re-ask r? if you think my concerns are invalid.
Comment 17 Jason Duell [:jduell] (needinfo me) 2012-06-02 23:34:58 PDT
Created attachment 629561 [details] [diff] [review]
Fix case where OnStart called but WS already closed

>> nsWebSocket::OnStart(nsISupports *aContext)
>> {
>>   NS_ABORT_IF_FALSE(mReadyState == nsIWebSocket::CONNECTING,
>>                     "nsWebSocket::OnStart: readyState != CONNECTING!");
>>-
>So, is this not possible if one first creates a connection and then
>immediately calls close?

Thanks for noticing this--right our current code would abort in this case (which is possible if the socket thread has already scheduled the OnStart event before we close the websocket).   So this is a bug not just in my patch, but the pre-existing code.
Comment 18 Jason Duell [:jduell] (needinfo me) 2012-06-02 23:42:18 PDT
Created attachment 629562 [details] [diff] [review]
Async onerror patch, interdiff v1 -> v2

Ollie: this is an interdiff with just the changes, which are mostly the renaming we discussed on IRC.  I also changed the code to make sure that later CloseConnection calls don't result in additional console errors (or affect closeWasClean).
Comment 19 Jason Duell [:jduell] (needinfo me) 2012-06-02 23:45:01 PDT
Created attachment 629563 [details] [diff] [review]
Async onerror patch, v2
Comment 20 Jason Duell [:jduell] (needinfo me) 2012-06-03 00:31:27 PDT
Created attachment 629566 [details] [diff] [review]
cleanup patch #2: prefer CLOSED to mDisconnect, v2

As request, no longer calls console error in cancel if already closing.

Also eased off on NS_ABORT_IF_FALSE in favor of assertions/warnings.  None of this stuff is worth killing the browser in production IMO.
Comment 21 Jason Duell [:jduell] (needinfo me) 2012-06-03 00:36:04 PDT
Ollie: FYI: sorry, I should have numbered these patches "part1, part2" etc.
This is their order, in case you want to apply them and/or review in order: 

- Test to check onerror/onclose aren't called during close() 
- Remove FailConnectionQuietly
- Async onerror patch, v2
- Fix case where OnStart called but WS already closed 
- cleanup patch #1: remove mOnCloseCompleted
- cleanup patch #2: prefer CLOSED to mDisconnect, v2
Comment 22 Jason Duell [:jduell] (needinfo me) 2012-06-03 00:51:25 PDT
Created attachment 629567 [details] [diff] [review]
cleanup patch #2: prefer CLOSED to mDisconnect, v3

Got assertion wrong: we can in fact be OPEN when we receive OnStop (if we encounter a protocol error from server).
Comment 23 Josh Matthews [:jdm] 2012-06-03 02:24:55 PDT
(In reply to Jason Duell (:jduell) from comment #20)
> Created attachment 629566 [details] [diff] [review]
> cleanup patch #2: prefer CLOSED to mDisconnect, v2
> 
> As request, no longer calls console error in cancel if already closing.
> 
> Also eased off on NS_ABORT_IF_FALSE in favor of assertions/warnings.  None
> of this stuff is worth killing the browser in production IMO.

NS_ABORT_IF_FALSE only aborts debug builds, for the record.
Comment 24 Jason Duell [:jduell] (needinfo me) 2012-06-04 16:40:10 PDT
Rolled up patches into one:

https://hg.mozilla.org/integration/mozilla-inbound/rev/bd94bf0f9632
Comment 25 Geoff Lankow (:darktrojan) 2012-06-05 06:19:39 PDT
https://hg.mozilla.org/mozilla-central/rev/bd94bf0f9632
Comment 26 Jason Duell [:jduell] (needinfo me) 2012-06-05 10:18:20 PDT
Created attachment 630214 [details] [diff] [review]
single, rolled-up patch that landed on m-c

If possible I'd like to land this on aurora after it bakes on m-c for a little while.  The bug vector here is not large (fixes case where we can call into JS's 'onclose()' function while the JS code has not yet returned from calling 'close()': also fixes case where we should report state==CLOSING but instead say CLOSED).  But there was quite a bit of (good!) refactoring involved.  I suspect that the code post-patch is tighter overall (so less latent bugs), and it would make maintenance easier if the old code lingers for only 6 weeks instead of 12.   If I've missed the train, and that's that, I totally understand...

[Approval Request Comment]
Testing completed (on m-c, etc.): mochitest.
Risk to taking this patch (and alternatives if risky): We have pretty good test coverage for this code, so fairly low risk. 
String or UUID changes made by this patch: none
Comment 27 Alex Keybl [:akeybl] 2012-06-11 12:34:37 PDT
Comment on attachment 630214 [details] [diff] [review]
single, rolled-up patch that landed on m-c

[Triage Comment]
Apologies, this is more code change than we're typically comfortable with taking while on Aurora. Let's let this ride the trains.

Note You need to log in before you can comment on or make changes to this bug.