Closed Bug 695636 Opened 13 years ago Closed 12 years ago

Update close steps to adhere to WS spec.

Categories

(Core :: Networking: WebSockets, defect)

x86_64
Linux
defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla16

People

(Reporter: jduell.mcbugs, Assigned: jduell.mcbugs)

References

Details

Attachments

(9 files, 3 obsolete files)

958 bytes, patch
smaug
: feedback-
Details | Diff | Splinter Review
3.09 KB, patch
smaug
: review+
Details | Diff | Splinter Review
3.69 KB, patch
smaug
: review+
Details | Diff | Splinter Review
3.30 KB, patch
smaug
: review+
Details | Diff | Splinter Review
1.39 KB, patch
smaug
: review+
Details | Diff | Splinter Review
7.70 KB, patch
Details | Diff | Splinter Review
18.78 KB, patch
smaug
: review+
Details | Diff | Splinter Review
8.07 KB, patch
smaug
: review+
Details | Diff | Splinter Review
29.29 KB, patch
jduell.mcbugs
: review+
Details | Diff | Splinter Review
We've got a number of little ways that our close logic does not follow the latest spec and/or needs cleanup.

In the case where JS calls close() before the WS is connected, we need to set state to CLOSING (now: set to CLOSED), and queue function that 
    - sets state to CLOSED
    - calls onerror if needed
    - calls onclose

- OnStop from the network needs to do similar logic, but we need to sure make it doesn't do it if the case above has already queued the onclose logic (i.e.  don't call onclose or onerror twice).  The code in bug 664894 corrected places where we needed to call onerror, but as a result we can now call onerror twice if JS calls close() before the connection is established AND the network winds up failing the connection independently.

- SetReadyState generally is setting state to CLOSED while we're still in the JS call context instead of setting to CLOSING and doing CLOSED in queued event.

- Garbage collection.  Currently we call Disconnect in CloseConnection right after calling SetReadyState(CLOSED), which will both null out mMustCheckKeepAlive and call Release() before we've actually dispatched the onclose event.  I suspect we want to delay that until the dispatch is complete.

- We have lots of places with check with if (mDisconnected) Disconnect, which doesn't make sense.  Looks like an artifact exposed by removing nsWSEstablishedCxn.

I'm going to do an overall audit of our close code to fix this and anything else that comes up.  But I'm planning to do so after I land an initial implementation of the binary APIs so my patch queue stops rotting.

We need to fix the web-facing parts of this bug before unprefixing.
I don't think we need this to unprefix, though it'd be nice to have.
No longer blocks: 695635
This is a hack (which is probably too easter-egg-y to check in) that fails the websocket early (in EstablishConnection) only for the magic URI in test_websocket.html, test-3.

It demonstrates that the current code winds up not calling onerror (it tried to, but it calls it within Init(), before JS has actually had a chance to set the onerror handler, which is lower down in the JS script).

Works fine with my later patches applied.

Not requesting review because I suspect we don't allow this sort of gross magic URI stuff in m-c.  Smaug--let me know if you actually think it's OK to land this sort of thing.
Assignee: nobody → jduell.mcbugs
Status: NEW → ASSIGNED
Attachment #626328 - Flags: feedback?(bugs)
Tests that we dispatch onerror/onclose within a separate event.  Fails now, works with my patches applied.  Also checks that readyState == CLOSING (not CLOSED) immediately after calling close().
Attachment #626329 - Flags: review?(bugs)
Remove FailConnectionQuietly, which is badly named (it doesn't actually "fail the connection" in the spec sense: onerror isn't called), and is just a one-liner anyway.
Attachment #626331 - Flags: review?(bugs)
Comment on attachment 626328 [details] [diff] [review]
hack to check that onerror is called correctly in test-3

Er, no thank you :)
Attachment #626328 - Flags: feedback?(bugs) → feedback-
(In reply to Jason Duell (:jduell) from comment #2)
> It demonstrates that the current code winds up not calling onerror (it tried
> to, but it calls it within Init(), before JS has actually had a chance to
> set the onerror handler, which is lower down in the JS script).
> 
> Works fine with my later patches applied.
So, what work and what is wrong with error handling?
Comment on attachment 626329 [details] [diff] [review]
Test to check onerror/onclose aren't called during close()

So, does this depend on the hack? I assume this does, so clearing review.
Attachment #626329 - Flags: review?(bugs)
Attachment #626331 - Flags: review?(bugs) → review+
> - Garbage collection.  Currently we call Disconnect in CloseConnection right
> after calling SetReadyState(CLOSED), which will both null out
> mMustCheckKeepAlive and call Release() before we've actually dispatched the
> onclose event.  I suspect we want to delay that until the dispatch is
> complete.
Why?
There is nsRefPtr<nsWebSocket> kungfuDeathGrip = this; which keeps the object alive.
Comment on attachment 626329 [details] [diff] [review]
Test to check onerror/onclose aren't called during close()

No, this mochitest change doesn't depend on the gross hack.
Attachment #626329 - Flags: review?(bugs)
> There is nsRefPtr<nsWebSocket> kungfuDeathGrip = this; which keeps the object alive.

kungfu goes out of scope before the onclose error is dispatched. That said, I haven't seen an actual problem with the logic, even when I applied my patch that has the necko channel release the nsWebSocket after calling OnStop (right now it keeps a reference to nsWebSocket until it's destroyed, which is no sooner than nsWebSocket releases it in Disconnect.

Anyway, my next patch changes things enough that you just look at that.
The meat of the bug fix.  

Makes onerror (and Disconnect) happen, like onclose, in a separate event if needed (it's not needed if trigger by OnStop, which is already async vis a vis JD calls).

Gets rid of most kungFuDeathGrips because we no longer call Disconnect until the event fires.
Attachment #626577 - Flags: review?(bugs)
mOnCloseCompleted is superfluous.  The only time we set it is after we've just set mReadyState to CLOSED (in CallOnClose2), so it's equivalent to mReadyState==CLOSED.
Attachment #626579 - Flags: review?(bugs)
Prefer checking mReadyState == CLOSED to checking mDisconnected: I think it makes the code a lot more intuitive.

They're both set in same event (that fires onclose/onerror), so equivalent in all the cases here.

I'm not sure why we need to guard GetInterface?
Attachment #626580 - Flags: review?(bugs)
Comment on attachment 626577 [details] [diff] [review]
Make sure onerror called in an event that's async to any JS calls

What CallOnClose and CallOnClose2 are badly named.
onclose is just a event handler attribute in the object.
The event name is 'close'.

I don't understand why nsWebSocket::CallOnClose
dispatches the event occasionally sync, occasionally async.
aNeedsNewEvent as a parameter name doesn't sounds like something which controls
synchronousness.
Attachment #626577 - Flags: review?(bugs) → review-
Attachment #626579 - Flags: review?(bugs) → review+
Attachment #626580 - Attachment is patch: true
Comment on attachment 626580 [details] [diff] [review]
cleanup patch #2: prefer CLOSED to mDisconnect


> NS_IMETHODIMP
> nsWebSocket::OnStart(nsISupports *aContext)
> {
>   NS_ABORT_IF_FALSE(NS_IsMainThread(), "Not running on main thread");
>   NS_ABORT_IF_FALSE(mReadyState == nsIWebSocket::CONNECTING,
>                     "nsWebSocket::OnStart: readyState != CONNECTING!");
> 
>-  if (mDisconnected)
>-    return NS_OK;
>-
So, is this not possible if one first creates a connection and then
immediately calls close?


> NS_IMETHODIMP
> nsWebSocket::Cancel(nsresult aStatus)
> {
>   NS_ABORT_IF_FALSE(NS_IsMainThread(), "Not running on main thread");
> 
>-  if (mDisconnected)
>-    return NS_OK;
>+  ConsoleError();
> 
>-  ConsoleError();
>   return CloseConnection(nsIWebSocketChannel::CLOSE_GOING_AWAY);
Are you sure about this change? Is it possible that we end up giving console error even in cases it isn't that useful?
Attachment #626329 - Flags: review?(bugs) → review+
Comment on attachment 626580 [details] [diff] [review]
cleanup patch #2: prefer CLOSED to mDisconnect

Marking this r- until the questions are answered.
Please re-ask r? if you think my concerns are invalid.
Attachment #626580 - Flags: review?(bugs) → review-
>> nsWebSocket::OnStart(nsISupports *aContext)
>> {
>>   NS_ABORT_IF_FALSE(mReadyState == nsIWebSocket::CONNECTING,
>>                     "nsWebSocket::OnStart: readyState != CONNECTING!");
>>-
>So, is this not possible if one first creates a connection and then
>immediately calls close?

Thanks for noticing this--right our current code would abort in this case (which is possible if the socket thread has already scheduled the OnStart event before we close the websocket).   So this is a bug not just in my patch, but the pre-existing code.
Attachment #629561 - Flags: review?(bugs)
Ollie: this is an interdiff with just the changes, which are mostly the renaming we discussed on IRC.  I also changed the code to make sure that later CloseConnection calls don't result in additional console errors (or affect closeWasClean).
Attachment #626577 - Attachment is obsolete: true
Attachment #629563 - Flags: review?(bugs)
As request, no longer calls console error in cancel if already closing.

Also eased off on NS_ABORT_IF_FALSE in favor of assertions/warnings.  None of this stuff is worth killing the browser in production IMO.
Attachment #626580 - Attachment is obsolete: true
Attachment #629566 - Flags: review?(bugs)
Ollie: FYI: sorry, I should have numbered these patches "part1, part2" etc.
This is their order, in case you want to apply them and/or review in order: 

- Test to check onerror/onclose aren't called during close() 
- Remove FailConnectionQuietly
- Async onerror patch, v2
- Fix case where OnStart called but WS already closed 
- cleanup patch #1: remove mOnCloseCompleted
- cleanup patch #2: prefer CLOSED to mDisconnect, v2
Got assertion wrong: we can in fact be OPEN when we receive OnStop (if we encounter a protocol error from server).
Attachment #629566 - Attachment is obsolete: true
Attachment #629566 - Flags: review?(bugs)
Attachment #629567 - Flags: review?(bugs)
(In reply to Jason Duell (:jduell) from comment #20)
> Created attachment 629566 [details] [diff] [review]
> cleanup patch #2: prefer CLOSED to mDisconnect, v2
> 
> As request, no longer calls console error in cancel if already closing.
> 
> Also eased off on NS_ABORT_IF_FALSE in favor of assertions/warnings.  None
> of this stuff is worth killing the browser in production IMO.

NS_ABORT_IF_FALSE only aborts debug builds, for the record.
Attachment #629561 - Flags: review?(bugs) → review+
Attachment #629563 - Flags: review?(bugs) → review+
Attachment #629567 - Attachment is patch: true
Attachment #629567 - Flags: review?(bugs) → review+
Rolled up patches into one:

https://hg.mozilla.org/integration/mozilla-inbound/rev/bd94bf0f9632
Target Milestone: --- → mozilla16
https://hg.mozilla.org/mozilla-central/rev/bd94bf0f9632
Status: ASSIGNED → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
If possible I'd like to land this on aurora after it bakes on m-c for a little while.  The bug vector here is not large (fixes case where we can call into JS's 'onclose()' function while the JS code has not yet returned from calling 'close()': also fixes case where we should report state==CLOSING but instead say CLOSED).  But there was quite a bit of (good!) refactoring involved.  I suspect that the code post-patch is tighter overall (so less latent bugs), and it would make maintenance easier if the old code lingers for only 6 weeks instead of 12.   If I've missed the train, and that's that, I totally understand...

[Approval Request Comment]
Testing completed (on m-c, etc.): mochitest.
Risk to taking this patch (and alternatives if risky): We have pretty good test coverage for this code, so fairly low risk. 
String or UUID changes made by this patch: none
Attachment #630214 - Flags: review+
Attachment #630214 - Flags: approval-mozilla-aurora?
Comment on attachment 630214 [details] [diff] [review]
single, rolled-up patch that landed on m-c

[Triage Comment]
Apologies, this is more code change than we're typically comfortable with taking while on Aurora. Let's let this ride the trains.
Attachment #630214 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora-
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: