Closed Bug 572975 Opened 14 years ago Closed 14 years ago

Investigate how WebSocket should keep itself alive when JS doesn't keep 'strong' references to it

Categories

(Core :: DOM: Core & HTML, defect)

x86
All
defect
Not set
normal

Tracking

()

RESOLVED FIXED

People

(Reporter: smaug, Assigned: wfernandom2004)

References

Details

Attachments

(3 files, 9 obsolete files)

So nsWebSocketEstablishedConnection could addref mOwner when the connection
opens and release it when connection closes or disconnect is called.
And per the spec, the connection needs to say alive only when there are
message listeners.
But does the spec say that the object must be released if it has some event listeners, but not message listeners? IMHO it is a strange behaviour. Perhaps it should keep itself alive whenever it has some eventlistener set.
I don't think it is a strange behavior. If you're not listening any messages
from the websocket, and don't have any references to it so that you can't 
send messages either, why should the websocket object stay alive?
Well, for onclose and onerror I can't see, actually, a good reason to keep alive. But for the onopen there are. For instance you can have an onopen handler that sends something when the connection is established. I know it is a lot the same that xhr would do, but it is a possibility. Also, inside the onopen someone could set onmessage listeners, etc.
Indeed, open event is more like message, not like close or error.
Attached patch path v1 (obsolete) — Splinter Review
Olli, please see if this patch fix this issue.
Assignee: nobody → wfernandom2004
Status: NEW → ASSIGNED
Attachment #452419 - Flags: review?(Olli.Pettay)
Comment on attachment 452419 [details] [diff] [review]
path v1

I didn't test the patch.
Some comments anyway.


>diff --git a/content/base/src/nsWebSocket.cpp b/content/base/src/nsWebSocket.cpp
>--- a/content/base/src/nsWebSocket.cpp
>+++ b/content/base/src/nsWebSocket.cpp
>@@ -1944,16 +1944,21 @@ nsWebSocketEstablishedConnection::PrintE
> IMPL_RUNNABLE_ON_MAIN_THREAD_METHOD_BEGIN(Close)
> {
>   nsresult rv;
> 
>   if (mOwner->mReadyState == nsIWebSocket::CONNECTING) {
>     // we must not convey any failure information to scripts, so we just
>     // disconnect and maintain the owner WebSocket object in the CONNECTING
>     // state.
>+    if (mOwner->mHasStrongEventListeners) {
>+      mOwner->mCheckThereAreStrongEventListeners = PR_FALSE;
>+      mOwner->mHasStrongEventListeners = PR_FALSE;
>+      static_cast<nsPIDOMEventTarget*>(mOwner)->Release();
>+    }

Perhaps you could add a helper method for this code, since you have the same
code several times.
And are there cases when Disconnect() should do the same too?
Basically, what happens when someone navigates away from the page
which has open websocket with message listeners. Does the websocket
get released?
I wonder if the code should be moved to ::Disconnect()

>@@ -2978,17 +2995,23 @@ nsWebSocket::CreateAndDispatchSimpleEven
>   // it doesn't bubble, and it isn't cancelable
>   rv = event->InitEvent(aName, PR_FALSE, PR_FALSE);
>   NS_ENSURE_SUCCESS(rv, rv);
> 
>   nsCOMPtr<nsIPrivateDOMEvent> privateEvent = do_QueryInterface(event);
>   rv = privateEvent->SetTrusted(PR_TRUE);
>   NS_ENSURE_SUCCESS(rv, rv);
> 
>-  return DispatchDOMEvent(nsnull, event, nsnull, nsnull);
>+  rv = DispatchDOMEvent(nsnull, event, nsnull, nsnull);
>+
>+  if (aName.LowerCaseEqualsLiteral("open")) {
>+    UpdateMustKeepAlive();
>+  }

Could you explain this? And why LowerCaseEquals, why not just 
aName.EqualsLiteral("open");


>+nsresult
>+nsWebSocket::AddEventListenerByIID(nsIDOMEventListener *aListener,
>+                                   const nsIID& aIID)
>+{
>+  nsresult rv = nsDOMEventTargetHelper::AddEventListenerByIID(aListener,
>+                                                              aIID);
>+  if (NS_SUCCEEDED(rv)) {
>+    UpdateMustKeepAlive();
>+  }
>+  return rv;
>+}
>+
>+nsresult
>+nsWebSocket::RemoveEventListenerByIID(nsIDOMEventListener *aListener,
>+                                      const nsIID& aIID)
>+{
>+  nsresult rv = nsDOMEventTargetHelper::RemoveEventListenerByIID(aListener,
>+                                                                 aIID);
>+  if (NS_SUCCEEDED(rv)) {
>+    UpdateMustKeepAlive();
>+  }
>+  return rv;
>+}
Since JS can't call this, I think it is ok to not add UpdateMustKeepAlive to
these.
> Could you explain this?
The idea is to handle these cases:
  1. the connection is being established -> keep alive only if there are onopen or onmessage event listeners;
  2. the connection is open -> keep alive only if there are onmessage event listeners;
  3. the connection is going to close -> keep alive only if:
     3.1 there are onmessage event listeners (because the server can have not sent messages until sending the ack close frame); or
     3.2 the onclose event is going to be dispatched (i.e. if the connection wasn't aborted after/during the handshake). This situation only is important if there were onopen/onmessage event listeners. If there weren't such events it means that CC hasn't been called until this stage. But this last situation (no onopen/onmessage event listener and no CC) isn't really important to handle.
  4. the connection is closed -> don't need to keep alive;

In order to implement the cases 2 and 4 above, UpdateMustKeepAlive() is called after the onopen and onclose events are dispatched. The case 3 is handled by nsWebSocketEstablishedConnection::Close() and nsWebSocketEstablishedConnection::ForceClose().

> Basically, what happens when someone navigates away from the page
> which has open websocket with message listeners. Does the websocket
> get released?
Yes, it does. When someone navigates away from the page nsWebSocketEstablishedConnection::Cancel() is called. It will call nsWebSocketEstablishedConnection::Close(), that will release the nsWebSocket object if needed.

> And are there cases when Disconnect() should do the same too?
> I wonder if the code should be moved to ::Disconnect()
The code can be moved to Disconnect, but there should be a parameter allowing or not this code to be executed. It is because it should be executed only when the onclose event isn't going to be dispatched.

> And why LowerCaseEquals, why not just aName.EqualsLiteral("open");
Oh, thanks.
Attached patch patch v2 (obsolete) — Splinter Review
Attachment #452419 - Attachment is obsolete: true
Attachment #452507 - Flags: review?(Olli.Pettay)
Attachment #452419 - Flags: review?(Olli.Pettay)
Depends on: 573227
Attached patch patch v2 (obsolete) — Splinter Review
Olli, please apply this patch after the patch in bug 573227, because there are conflicts between them.
Attachment #452507 - Attachment is obsolete: true
Attachment #452509 - Flags: review?(Olli.Pettay)
Attachment #452507 - Flags: review?(Olli.Pettay)
I'm about to push both patches to tryserver in a few minutes.
The patch in this bug gives me test errors
failed | [SimpleTest/SimpleTest.js, window.onerror] An error occurred - uncaught exception: [Exception... "Security error" code: "1000" nsresult: "0x805303e8 (NS_ERROR_DOM_SECURITY_ERR)" location: "http://mochi.test:8888/tests/content/base/test/test_websocket.html Line: 121"] at :0
failed | [SimpleTest/SimpleTest.js, window.onerror] An error occurred - uncaught exception: [Exception... "Security error" code: "1000" nsresult: "0x805303e8 (NS_ERROR_DOM_SECURITY_ERR)" location: "http://mochi.test:8888/tests/content/base/test/test_websocket.html Line: 121"] at :0
Ah, ok, that should be easy to fix.
netscape.security.PrivilegeManager.enablePrivilege("UniversalXPConnect");
to the anonymous function.
So how does the patch work when state is not CONNECTING and there are
message listeners and something calls .close().
What releases nsWebSocket object?


(btw, I pushed also the patch in this bug to tryserver. I just added
the missing netscape.security.PrivilegeManager.enablePrivilege("UniversalXPConnect");
)
> So how does the patch work when state is not CONNECTING and there are
> message listeners and something calls .close().
> What releases nsWebSocket object?
In this case, following the spec, the close event (with wasClean==false) is dispatched, and then UpdateMustKeepAlive() will called.

BTW, if the functions that create and send the events fail (for instance CheckInnerWindowCorrectness() fails) the object won't be released. I'm going to fix it.
Attached patch v3 (obsolete) — Splinter Review
Attachment #452509 - Attachment is obsolete: true
Attachment #452582 - Flags: review?(Olli.Pettay)
Attachment #452509 - Flags: review?(Olli.Pettay)
s: try-w32-slave28
TEST-UNEXPECTED-FAIL | /tests/content/base/test/test_websocket.html | Exited
with code -1073741819 during test run
Bug 573298 - Intermittent failure in test_websocket.html | shouldn't connect
yet in test 10!
PROCESS-CRASH | /tests/content/base/test/test_websocket.html | application
crashed (minidump found)
Olli, these two erros I think they have been already fixed in bug 573227, and in the last patch I've uploaded:
  * The first error was because that test presumes that the WebSocket serialization list to connecting to the proxy must be empty. So I've reordered from test 10 to test 6.
  * The second error happened because some objects that nsEstablishedConnection used from nsWebSocket were being released before the nsEstablishedConnection::Disconnect() call (like the nsEstablishedConnection::mURI object).

I'm going to sleep right now. Please don't put this last patch in the TryServer yet because I want to make the "keep-alive" code more clear.
Ok, thanks for the explanation.
Comment on attachment 452582 [details] [diff] [review]
v3

I assume there is a new patch coming.
Attachment #452582 - Flags: review?(Olli.Pettay)
Attached patch v4 (obsolete) — Splinter Review
I've tested this patch many times without failures in my pc (on linux). I hope it is good :). Some reordering was necessary, again, because of the new two tests:
21 -> 17
17 -> 18
18 -> 19
19 -> 21
Attachment #452582 - Attachment is obsolete: true
Attachment #452786 - Flags: review?(Olli.Pettay)
So I pushed both patches to tryserver yesterday and there were no failures.
I'll review the patches today, and push to m-c if the tree is open
and green enough.
ok, thanks.
Comment on attachment 452786 [details] [diff] [review]
v4

>diff --git a/content/base/src/nsWebSocket.cpp b/content/base/src/nsWebSocket.cpp
>--- a/content/base/src/nsWebSocket.cpp
>+++ b/content/base/src/nsWebSocket.cpp
>@@ -235,17 +235,20 @@ public:
>   NS_IMETHOD SetProxyCredentials(const nsACString & aCredentials);
>   NS_IMETHOD SetWWWCredentials(const nsACString & aCredentials);
>   NS_IMETHOD OnAuthAvailable();
>   NS_IMETHOD OnAuthCancelled(PRBool userCancel);
> 
>   nsresult Init(nsWebSocket *aOwner);
>   nsresult Disconnect();
> 
>-  // these are called always on the main thread (they dispatch themselves)
>+  // These are called always on the main thread (they dispatch themselves).
>+  // ATTENTION, this method when called can release both the WebSocket object
>+  // (i.e. mOwner) and its connection (i.e. *this*) if there is no strong
>+  // js references.
"js references" sounds like a bit wrong. "if there are no strong event listeners"
might be better.

> IMPL_RUNNABLE_ON_MAIN_THREAD_METHOD_BEGIN(Close)
> {
>   nsresult rv;
> 
>+  // mOwner->DontKeepAliveAnyMore() can release this object, so we keep a
>+  // reference until the end of the method
>+  nsRefPtr<nsWebSocketEstablishedConnection> connTmp = this;
s/connTmp/kungfuDeathGrip/
kungfuDeathGrip is used elsewhere as the variable name for something
which just keeps and object alive.
Could you call DontKeepAliveAnyMore() in Disconnect()?

>@@ -2833,20 +2854,16 @@ NS_IMPL_CYCLE_COLLECTION_TRAVERSE_END
> NS_IMPL_CYCLE_COLLECTION_UNLINK_BEGIN_INHERITED(nsWebSocket,
>                                                 nsDOMEventTargetWrapperCache)
>   NS_IMPL_CYCLE_COLLECTION_UNLINK_NSCOMPTR(mOnOpenListener)
>   NS_IMPL_CYCLE_COLLECTION_UNLINK_NSCOMPTR(mOnMessageListener)
>   NS_IMPL_CYCLE_COLLECTION_UNLINK_NSCOMPTR(mOnCloseListener)
>   NS_IMPL_CYCLE_COLLECTION_UNLINK_NSCOMPTR(mOnErrorListener)
>   NS_IMPL_CYCLE_COLLECTION_UNLINK_NSCOMPTR(mPrincipal)
>   NS_IMPL_CYCLE_COLLECTION_UNLINK_NSCOMPTR(mURI)
>-  if (tmp->mConnection) {
>-    tmp->mConnection->Disconnect();
>-    tmp->mConnection = nsnull;
>-  }
Could you explain this change.
We release all the event listeners, but don't disconnect?

>@@ -3091,19 +3111,26 @@ nsWebSocket::SetReadyState(PRUint16 aNew
> 
>     // The close event must be dispatched asynchronously.
>     nsCOMPtr<nsIRunnable> event =
>       new nsWSCloseEvent(this, mConnection->ClosedCleanly());
> 
>     mOutgoingBufferedAmount += mConnection->GetOutgoingBufferedAmount();
>     mConnection = nsnull; // this is no longer necessary
> 
>+    if (!event) {
>+      NS_WARNING("Failed to create the close event");
>+      UpdateMustKeepAlive();
>+      return;
>+    }
I think you don't need this, because NS_DispatchToMainThread fails
if event is null and then you call UpdateMustKeepAlive() anyway.
>+
>     rv = NS_DispatchToMainThread(event, NS_DISPATCH_NORMAL);
>     if (NS_FAILED(rv)) {
>       NS_WARNING("Failed to dispatch the close event");
>+      UpdateMustKeepAlive();
>     }
>   }
> }




>+void
>+nsWebSocket::UpdateMustKeepAlive()
>+{
>+  if (!mCheckThereAreStrongEventListeners) {
>+    return;
>+  }
>+
>+  if (mHasStrongEventListeners) {
>+    if ((mReadyState != nsIWebSocket::CONNECTING ||
>+         (mListenerManager &&
>+          !mListenerManager->HasListenersFor(NS_LITERAL_STRING("open")))) &&
Perhaps 
if ((mReadyState != nsIWebSocket::CONNECTING ||
 (!mListenerManager ||
  !mListenerManager->HasListenersFor(NS_LITERAL_STRING("open")))) &&


>+        (mReadyState == nsIWebSocket::CLOSED ||
>+         (mListenerManager &&
>+          !mListenerManager->HasListenersFor(NS_LITERAL_STRING("message"))))) {
Perhaps 
       (mReadyState == nsIWebSocket::CLOSED ||
        (!mListenerManager ||
         !mListenerManager->HasListenersFor(NS_LITERAL_STRING("message"))))) {


>   if (mReadyState == nsIWebSocket::CONNECTING) {
>+    // FailConnection() can release the object if there is no strong js
>+    // references, so we keep a reference before calling it
>+    nsRefPtr<nsWebSocket> tmp = this;
kungfuDeathGrip

>+function forcegc()
>+{
>+  netscape.security.PrivilegeManager.enablePrivilege("UniversalXPConnect");
>+  Components.utils.forceGC();
>+  var wu =  window.QueryInterface(Components.interfaces.nsIInterfaceRequestor)
>+                  .getInterface(Components.interfaces.nsIDOMWindowUtils);
>+  wu.garbageCollect();
>+  setTimeout(function()
>+  {
>+    wu.garbageCollect();
>+  }, 1);
The function for the timeout needs
netscape.security.PrivilegeManager.enablePrivilege("UniversalXPConnect");
I don't understand how the test passed on your machine without that.

> function testWebSocket ()
> {
>+  netscape.security.PrivilegeManager.enablePrivilege("UniversalXPConnect");
>   doTest(first_test);
> }
Why you need this change?
> Could you explain this change.
> We release all the event listeners, but don't disconnect?
The Disconnect() has been moved to the beginning of the nsWebSocket's destructor, in order to disconnect the connection before releasing the nsWebSocket's objects (like mURI). But I think I can move to the beginning of the CC unlink method, if you want.

> I don't understand how the test passed on your machine without that.
Neither do I. I thought calling it in the first function of the test (testWebSocket) would be enough.

> The function for the timeout needs
> netscape.security.PrivilegeManager.enablePrivilege("UniversalXPConnect");
> I don't understand how the test passed on your machine without that.
But, isn't it also necessary before the timeout function, when it calls wu.garbageCollect() too?
I will change here, test and upload again the patch with the changes.
> But, isn't it also necessary before the timeout function, when it calls
> wu.garbageCollect() too?
Oh, I've just seen you have added that yet there.
Yeah, I added that there before pushing to tryserver, because without it
I got test failures.

And AFAIK, netscape.security.PrivilegeManager.enablePrivilege is function level thing. It doesn't apply to functions called under such function.
> The Disconnect() has been moved to the beginning of the nsWebSocket's
> destructor, in order to disconnect the connection before releasing the
> nsWebSocket's objects (like mURI). But I think I can move to the beginning of
> the CC unlink method, if you want.
I've found out that there are cases where the nsWebSocket object is destroyed before the CC's unlink method be called. For instance that happens in the test 20. So we need to disconnect the connection there too, or else we will have a nsWebSocketEstablishedConnection referencing a mOwner already gone.
Yeah, unlink doesn't happen always.
If the reference count drops to zero, object is deleted anyway.

So yes, please put Disconnect to both places.
Attached patch v5 (obsolete) — Splinter Review
Also, I've added more one test.
Attachment #452786 - Attachment is obsolete: true
Attachment #453108 - Flags: review?(Olli.Pettay)
Attachment #452786 - Flags: review?(Olli.Pettay)
Tryserver gave 
30382 ERROR TEST-UNEXPECTED-FAIL | /tests/content/base/test/test_ws_basic_tests.html | Test timed out.
And locally I get randomly some errors, like
failed | the ws connection in test 13 should be closed cleanly
failed | the ws connection in test 14 should be closed cleanly
In fact, if I get  errors locally on my Linux system, 
it seems to be always tests 13 and 14.
Strange, these errors don't happen in my machine.

> /tests/content/base/test/test_ws_basic_tests.html | Test timed out.
I took a look at this, it seems that your pass after receiving the huge message, and then the test times out, but I'm not sure? Actually, I don't know if it is stopping while receiving that message (because it is big and then the tryserver breaks the test), or it stops while waiting for the close event in order to make your test 5. 

> failures on tests 13 and 14.
Well, these are the only ones that call request.connection.write() directly in the test handler, because pywebsocket doesn't support what these tests are intended to to. Really, I need to think about what is going on in pywebsocket...
I'll upload a patch with some logging. Can you run it locally in your machine and send me the results, please?
Sure.
Attached patch logging patch (obsolete) — Splinter Review
Attached patch logging patchSplinter Review
Attachment #453425 - Attachment is obsolete: true
ERROR TEST-UNEXPECTED-FAIL | /tests/content/base/test/test_websocket.html | the ws connection in test 13 should be closed cleanly

Full log 
http://tinderbox.mozilla.org/showlog.cgi?log=MozillaTry/1277341329.1277343019.8799.gz&fulltext=1
Attached file failure test 13
The related part of the log is that in this attachment.
SSLTUNNEL: 0x8de8ff8
nsWebSocket: 0xb8f76a8
nsWebSocketEstablishedConnection: 0xd6937f8

The failure occurred because:
  -> the client received the close frame;
  -> the client sent the ack close frame before setting mPostedCloseFrame to TRUE (this is wrong)
Attached patch fix to test 13 (obsolete) — Splinter Review
Attached patch fix bufferedAmount (obsolete) — Splinter Review
The last patch introduced an error when calculating the bufferedAmount propoerty. Please apply this patch after the last one.
Attachment #453108 - Attachment is obsolete: true
Attachment #453750 - Attachment is obsolete: true
Attachment #453786 - Attachment is obsolete: true
Attachment #453846 - Flags: review?(Olli.Pettay)
Attachment #453108 - Flags: review?(Olli.Pettay)
Attachment #453846 - Flags: review?(Olli.Pettay) → review+
I pushed this to m-c
http://hg.mozilla.org/mozilla-central/rev/29e8ddc0229f

Marking fixed after the tests have run.
So far looking good.
Marking this fixed.

Maybe bug 573227 could be marked fixed once there are some more test runs.
Need to also go through all the other websocket related test failure bugs.
Status: ASSIGNED → RESOLVED
Closed: 14 years ago
Resolution: --- → FIXED
I was thinking... Maybe there are more two cases that WebSocket objects should be kept alive, mainly the first one:
  * When there are not sent outgoing messages;
  * When at least one open or message events has been received, and there are close events listeners (the close event could be flagged as 'strong' in this case);
I'm not 100% sure those are really important cases.

File a followup bug? And could you write an email to webapps wg mailing list about the garbage collection issues?
Ok. I will send an email the the list, and depending on the answers I file a followup bug.
Depends on: 621347
Flags: in-testsuite+
Version: unspecified → Trunk
Component: DOM → DOM: Core & HTML
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: