Investigate how WebSocket should keep itself alive when JS doesn't keep 'strong' references to it

RESOLVED FIXED

Status

()

Core
DOM
RESOLVED FIXED
8 years ago
6 years ago

People

(Reporter: smaug, Assigned: Wellington Fernando de Macedo)

Tracking

Trunk
x86
All
Points:
---
Dependency tree / graph
Bug Flags:
in-testsuite +

Firefox Tracking Flags

(Not tracked)

Details

Attachments

(3 attachments, 9 obsolete attachments)

(Reporter)

Comment 1

8 years ago
So nsWebSocketEstablishedConnection could addref mOwner when the connection
opens and release it when connection closes or disconnect is called.
(Reporter)

Comment 2

8 years ago
And per the spec, the connection needs to say alive only when there are
message listeners.
(Assignee)

Comment 3

8 years ago
But does the spec say that the object must be released if it has some event listeners, but not message listeners? IMHO it is a strange behaviour. Perhaps it should keep itself alive whenever it has some eventlistener set.
(Reporter)

Comment 4

8 years ago
I don't think it is a strange behavior. If you're not listening any messages
from the websocket, and don't have any references to it so that you can't 
send messages either, why should the websocket object stay alive?
(Assignee)

Comment 5

8 years ago
Well, for onclose and onerror I can't see, actually, a good reason to keep alive. But for the onopen there are. For instance you can have an onopen handler that sends something when the connection is established. I know it is a lot the same that xhr would do, but it is a possibility. Also, inside the onopen someone could set onmessage listeners, etc.
(Reporter)

Comment 6

8 years ago
Indeed, open event is more like message, not like close or error.
(Assignee)

Comment 7

8 years ago
Created attachment 452419 [details] [diff] [review]
path v1

Olli, please see if this patch fix this issue.
Assignee: nobody → wfernandom2004
Status: NEW → ASSIGNED
Attachment #452419 - Flags: review?(Olli.Pettay)
(Reporter)

Comment 8

8 years ago
Comment on attachment 452419 [details] [diff] [review]
path v1

I didn't test the patch.
Some comments anyway.


>diff --git a/content/base/src/nsWebSocket.cpp b/content/base/src/nsWebSocket.cpp
>--- a/content/base/src/nsWebSocket.cpp
>+++ b/content/base/src/nsWebSocket.cpp
>@@ -1944,16 +1944,21 @@ nsWebSocketEstablishedConnection::PrintE
> IMPL_RUNNABLE_ON_MAIN_THREAD_METHOD_BEGIN(Close)
> {
>   nsresult rv;
> 
>   if (mOwner->mReadyState == nsIWebSocket::CONNECTING) {
>     // we must not convey any failure information to scripts, so we just
>     // disconnect and maintain the owner WebSocket object in the CONNECTING
>     // state.
>+    if (mOwner->mHasStrongEventListeners) {
>+      mOwner->mCheckThereAreStrongEventListeners = PR_FALSE;
>+      mOwner->mHasStrongEventListeners = PR_FALSE;
>+      static_cast<nsPIDOMEventTarget*>(mOwner)->Release();
>+    }

Perhaps you could add a helper method for this code, since you have the same
code several times.
And are there cases when Disconnect() should do the same too?
Basically, what happens when someone navigates away from the page
which has open websocket with message listeners. Does the websocket
get released?
I wonder if the code should be moved to ::Disconnect()

>@@ -2978,17 +2995,23 @@ nsWebSocket::CreateAndDispatchSimpleEven
>   // it doesn't bubble, and it isn't cancelable
>   rv = event->InitEvent(aName, PR_FALSE, PR_FALSE);
>   NS_ENSURE_SUCCESS(rv, rv);
> 
>   nsCOMPtr<nsIPrivateDOMEvent> privateEvent = do_QueryInterface(event);
>   rv = privateEvent->SetTrusted(PR_TRUE);
>   NS_ENSURE_SUCCESS(rv, rv);
> 
>-  return DispatchDOMEvent(nsnull, event, nsnull, nsnull);
>+  rv = DispatchDOMEvent(nsnull, event, nsnull, nsnull);
>+
>+  if (aName.LowerCaseEqualsLiteral("open")) {
>+    UpdateMustKeepAlive();
>+  }

Could you explain this? And why LowerCaseEquals, why not just 
aName.EqualsLiteral("open");


>+nsresult
>+nsWebSocket::AddEventListenerByIID(nsIDOMEventListener *aListener,
>+                                   const nsIID& aIID)
>+{
>+  nsresult rv = nsDOMEventTargetHelper::AddEventListenerByIID(aListener,
>+                                                              aIID);
>+  if (NS_SUCCEEDED(rv)) {
>+    UpdateMustKeepAlive();
>+  }
>+  return rv;
>+}
>+
>+nsresult
>+nsWebSocket::RemoveEventListenerByIID(nsIDOMEventListener *aListener,
>+                                      const nsIID& aIID)
>+{
>+  nsresult rv = nsDOMEventTargetHelper::RemoveEventListenerByIID(aListener,
>+                                                                 aIID);
>+  if (NS_SUCCEEDED(rv)) {
>+    UpdateMustKeepAlive();
>+  }
>+  return rv;
>+}
Since JS can't call this, I think it is ok to not add UpdateMustKeepAlive to
these.
(Assignee)

Comment 9

8 years ago
> Could you explain this?
The idea is to handle these cases:
  1. the connection is being established -> keep alive only if there are onopen or onmessage event listeners;
  2. the connection is open -> keep alive only if there are onmessage event listeners;
  3. the connection is going to close -> keep alive only if:
     3.1 there are onmessage event listeners (because the server can have not sent messages until sending the ack close frame); or
     3.2 the onclose event is going to be dispatched (i.e. if the connection wasn't aborted after/during the handshake). This situation only is important if there were onopen/onmessage event listeners. If there weren't such events it means that CC hasn't been called until this stage. But this last situation (no onopen/onmessage event listener and no CC) isn't really important to handle.
  4. the connection is closed -> don't need to keep alive;

In order to implement the cases 2 and 4 above, UpdateMustKeepAlive() is called after the onopen and onclose events are dispatched. The case 3 is handled by nsWebSocketEstablishedConnection::Close() and nsWebSocketEstablishedConnection::ForceClose().

> Basically, what happens when someone navigates away from the page
> which has open websocket with message listeners. Does the websocket
> get released?
Yes, it does. When someone navigates away from the page nsWebSocketEstablishedConnection::Cancel() is called. It will call nsWebSocketEstablishedConnection::Close(), that will release the nsWebSocket object if needed.

> And are there cases when Disconnect() should do the same too?
> I wonder if the code should be moved to ::Disconnect()
The code can be moved to Disconnect, but there should be a parameter allowing or not this code to be executed. It is because it should be executed only when the onclose event isn't going to be dispatched.

> And why LowerCaseEquals, why not just aName.EqualsLiteral("open");
Oh, thanks.
(Assignee)

Comment 10

8 years ago
Created attachment 452507 [details] [diff] [review]
patch v2
Attachment #452419 - Attachment is obsolete: true
Attachment #452507 - Flags: review?(Olli.Pettay)
Attachment #452419 - Flags: review?(Olli.Pettay)
(Assignee)

Updated

8 years ago
Depends on: 573227
(Assignee)

Comment 11

8 years ago
Created attachment 452509 [details] [diff] [review]
patch v2

Olli, please apply this patch after the patch in bug 573227, because there are conflicts between them.
Attachment #452507 - Attachment is obsolete: true
Attachment #452509 - Flags: review?(Olli.Pettay)
Attachment #452507 - Flags: review?(Olli.Pettay)
(Reporter)

Comment 12

8 years ago
I'm about to push both patches to tryserver in a few minutes.
(Reporter)

Comment 13

8 years ago
The patch in this bug gives me test errors
failed | [SimpleTest/SimpleTest.js, window.onerror] An error occurred - uncaught exception: [Exception... "Security error" code: "1000" nsresult: "0x805303e8 (NS_ERROR_DOM_SECURITY_ERR)" location: "http://mochi.test:8888/tests/content/base/test/test_websocket.html Line: 121"] at :0
failed | [SimpleTest/SimpleTest.js, window.onerror] An error occurred - uncaught exception: [Exception... "Security error" code: "1000" nsresult: "0x805303e8 (NS_ERROR_DOM_SECURITY_ERR)" location: "http://mochi.test:8888/tests/content/base/test/test_websocket.html Line: 121"] at :0
(Reporter)

Comment 14

8 years ago
Ah, ok, that should be easy to fix.
netscape.security.PrivilegeManager.enablePrivilege("UniversalXPConnect");
to the anonymous function.
(Reporter)

Comment 15

8 years ago
So how does the patch work when state is not CONNECTING and there are
message listeners and something calls .close().
What releases nsWebSocket object?


(btw, I pushed also the patch in this bug to tryserver. I just added
the missing netscape.security.PrivilegeManager.enablePrivilege("UniversalXPConnect");
)
(Assignee)

Comment 16

8 years ago
> So how does the patch work when state is not CONNECTING and there are
> message listeners and something calls .close().
> What releases nsWebSocket object?
In this case, following the spec, the close event (with wasClean==false) is dispatched, and then UpdateMustKeepAlive() will called.

BTW, if the functions that create and send the events fail (for instance CheckInnerWindowCorrectness() fails) the object won't be released. I'm going to fix it.
(Assignee)

Comment 17

8 years ago
Created attachment 452582 [details] [diff] [review]
v3
Attachment #452509 - Attachment is obsolete: true
Attachment #452582 - Flags: review?(Olli.Pettay)
Attachment #452509 - Flags: review?(Olli.Pettay)
(Reporter)

Comment 18

8 years ago
s: try-w32-slave28
TEST-UNEXPECTED-FAIL | /tests/content/base/test/test_websocket.html | Exited
with code -1073741819 during test run
Bug 573298 - Intermittent failure in test_websocket.html | shouldn't connect
yet in test 10!
PROCESS-CRASH | /tests/content/base/test/test_websocket.html | application
crashed (minidump found)
(Assignee)

Comment 19

8 years ago
Olli, these two erros I think they have been already fixed in bug 573227, and in the last patch I've uploaded:
  * The first error was because that test presumes that the WebSocket serialization list to connecting to the proxy must be empty. So I've reordered from test 10 to test 6.
  * The second error happened because some objects that nsEstablishedConnection used from nsWebSocket were being released before the nsEstablishedConnection::Disconnect() call (like the nsEstablishedConnection::mURI object).

I'm going to sleep right now. Please don't put this last patch in the TryServer yet because I want to make the "keep-alive" code more clear.
(Reporter)

Comment 20

8 years ago
Ok, thanks for the explanation.
(Reporter)

Comment 21

8 years ago
Comment on attachment 452582 [details] [diff] [review]
v3

I assume there is a new patch coming.
Attachment #452582 - Flags: review?(Olli.Pettay)
(Assignee)

Comment 22

8 years ago
Created attachment 452786 [details] [diff] [review]
v4

I've tested this patch many times without failures in my pc (on linux). I hope it is good :). Some reordering was necessary, again, because of the new two tests:
21 -> 17
17 -> 18
18 -> 19
19 -> 21
Attachment #452582 - Attachment is obsolete: true
Attachment #452786 - Flags: review?(Olli.Pettay)
(Reporter)

Comment 23

8 years ago
So I pushed both patches to tryserver yesterday and there were no failures.
I'll review the patches today, and push to m-c if the tree is open
and green enough.
(Assignee)

Comment 24

8 years ago
ok, thanks.
(Reporter)

Comment 25

8 years ago
Comment on attachment 452786 [details] [diff] [review]
v4

>diff --git a/content/base/src/nsWebSocket.cpp b/content/base/src/nsWebSocket.cpp
>--- a/content/base/src/nsWebSocket.cpp
>+++ b/content/base/src/nsWebSocket.cpp
>@@ -235,17 +235,20 @@ public:
>   NS_IMETHOD SetProxyCredentials(const nsACString & aCredentials);
>   NS_IMETHOD SetWWWCredentials(const nsACString & aCredentials);
>   NS_IMETHOD OnAuthAvailable();
>   NS_IMETHOD OnAuthCancelled(PRBool userCancel);
> 
>   nsresult Init(nsWebSocket *aOwner);
>   nsresult Disconnect();
> 
>-  // these are called always on the main thread (they dispatch themselves)
>+  // These are called always on the main thread (they dispatch themselves).
>+  // ATTENTION, this method when called can release both the WebSocket object
>+  // (i.e. mOwner) and its connection (i.e. *this*) if there is no strong
>+  // js references.
"js references" sounds like a bit wrong. "if there are no strong event listeners"
might be better.

> IMPL_RUNNABLE_ON_MAIN_THREAD_METHOD_BEGIN(Close)
> {
>   nsresult rv;
> 
>+  // mOwner->DontKeepAliveAnyMore() can release this object, so we keep a
>+  // reference until the end of the method
>+  nsRefPtr<nsWebSocketEstablishedConnection> connTmp = this;
s/connTmp/kungfuDeathGrip/
kungfuDeathGrip is used elsewhere as the variable name for something
which just keeps and object alive.
Could you call DontKeepAliveAnyMore() in Disconnect()?

>@@ -2833,20 +2854,16 @@ NS_IMPL_CYCLE_COLLECTION_TRAVERSE_END
> NS_IMPL_CYCLE_COLLECTION_UNLINK_BEGIN_INHERITED(nsWebSocket,
>                                                 nsDOMEventTargetWrapperCache)
>   NS_IMPL_CYCLE_COLLECTION_UNLINK_NSCOMPTR(mOnOpenListener)
>   NS_IMPL_CYCLE_COLLECTION_UNLINK_NSCOMPTR(mOnMessageListener)
>   NS_IMPL_CYCLE_COLLECTION_UNLINK_NSCOMPTR(mOnCloseListener)
>   NS_IMPL_CYCLE_COLLECTION_UNLINK_NSCOMPTR(mOnErrorListener)
>   NS_IMPL_CYCLE_COLLECTION_UNLINK_NSCOMPTR(mPrincipal)
>   NS_IMPL_CYCLE_COLLECTION_UNLINK_NSCOMPTR(mURI)
>-  if (tmp->mConnection) {
>-    tmp->mConnection->Disconnect();
>-    tmp->mConnection = nsnull;
>-  }
Could you explain this change.
We release all the event listeners, but don't disconnect?

>@@ -3091,19 +3111,26 @@ nsWebSocket::SetReadyState(PRUint16 aNew
> 
>     // The close event must be dispatched asynchronously.
>     nsCOMPtr<nsIRunnable> event =
>       new nsWSCloseEvent(this, mConnection->ClosedCleanly());
> 
>     mOutgoingBufferedAmount += mConnection->GetOutgoingBufferedAmount();
>     mConnection = nsnull; // this is no longer necessary
> 
>+    if (!event) {
>+      NS_WARNING("Failed to create the close event");
>+      UpdateMustKeepAlive();
>+      return;
>+    }
I think you don't need this, because NS_DispatchToMainThread fails
if event is null and then you call UpdateMustKeepAlive() anyway.
>+
>     rv = NS_DispatchToMainThread(event, NS_DISPATCH_NORMAL);
>     if (NS_FAILED(rv)) {
>       NS_WARNING("Failed to dispatch the close event");
>+      UpdateMustKeepAlive();
>     }
>   }
> }




>+void
>+nsWebSocket::UpdateMustKeepAlive()
>+{
>+  if (!mCheckThereAreStrongEventListeners) {
>+    return;
>+  }
>+
>+  if (mHasStrongEventListeners) {
>+    if ((mReadyState != nsIWebSocket::CONNECTING ||
>+         (mListenerManager &&
>+          !mListenerManager->HasListenersFor(NS_LITERAL_STRING("open")))) &&
Perhaps 
if ((mReadyState != nsIWebSocket::CONNECTING ||
 (!mListenerManager ||
  !mListenerManager->HasListenersFor(NS_LITERAL_STRING("open")))) &&


>+        (mReadyState == nsIWebSocket::CLOSED ||
>+         (mListenerManager &&
>+          !mListenerManager->HasListenersFor(NS_LITERAL_STRING("message"))))) {
Perhaps 
       (mReadyState == nsIWebSocket::CLOSED ||
        (!mListenerManager ||
         !mListenerManager->HasListenersFor(NS_LITERAL_STRING("message"))))) {


>   if (mReadyState == nsIWebSocket::CONNECTING) {
>+    // FailConnection() can release the object if there is no strong js
>+    // references, so we keep a reference before calling it
>+    nsRefPtr<nsWebSocket> tmp = this;
kungfuDeathGrip

>+function forcegc()
>+{
>+  netscape.security.PrivilegeManager.enablePrivilege("UniversalXPConnect");
>+  Components.utils.forceGC();
>+  var wu =  window.QueryInterface(Components.interfaces.nsIInterfaceRequestor)
>+                  .getInterface(Components.interfaces.nsIDOMWindowUtils);
>+  wu.garbageCollect();
>+  setTimeout(function()
>+  {
>+    wu.garbageCollect();
>+  }, 1);
The function for the timeout needs
netscape.security.PrivilegeManager.enablePrivilege("UniversalXPConnect");
I don't understand how the test passed on your machine without that.

> function testWebSocket ()
> {
>+  netscape.security.PrivilegeManager.enablePrivilege("UniversalXPConnect");
>   doTest(first_test);
> }
Why you need this change?
(Assignee)

Comment 26

8 years ago
> Could you explain this change.
> We release all the event listeners, but don't disconnect?
The Disconnect() has been moved to the beginning of the nsWebSocket's destructor, in order to disconnect the connection before releasing the nsWebSocket's objects (like mURI). But I think I can move to the beginning of the CC unlink method, if you want.

> I don't understand how the test passed on your machine without that.
Neither do I. I thought calling it in the first function of the test (testWebSocket) would be enough.

> The function for the timeout needs
> netscape.security.PrivilegeManager.enablePrivilege("UniversalXPConnect");
> I don't understand how the test passed on your machine without that.
But, isn't it also necessary before the timeout function, when it calls wu.garbageCollect() too?
(Assignee)

Comment 27

8 years ago
I will change here, test and upload again the patch with the changes.
(Assignee)

Comment 28

8 years ago
> But, isn't it also necessary before the timeout function, when it calls
> wu.garbageCollect() too?
Oh, I've just seen you have added that yet there.
(Reporter)

Comment 29

8 years ago
Yeah, I added that there before pushing to tryserver, because without it
I got test failures.

And AFAIK, netscape.security.PrivilegeManager.enablePrivilege is function level thing. It doesn't apply to functions called under such function.
(Assignee)

Comment 30

8 years ago
> The Disconnect() has been moved to the beginning of the nsWebSocket's
> destructor, in order to disconnect the connection before releasing the
> nsWebSocket's objects (like mURI). But I think I can move to the beginning of
> the CC unlink method, if you want.
I've found out that there are cases where the nsWebSocket object is destroyed before the CC's unlink method be called. For instance that happens in the test 20. So we need to disconnect the connection there too, or else we will have a nsWebSocketEstablishedConnection referencing a mOwner already gone.
(Reporter)

Comment 31

8 years ago
Yeah, unlink doesn't happen always.
If the reference count drops to zero, object is deleted anyway.

So yes, please put Disconnect to both places.
(Assignee)

Comment 32

8 years ago
Created attachment 453108 [details] [diff] [review]
v5

Also, I've added more one test.
Attachment #452786 - Attachment is obsolete: true
Attachment #453108 - Flags: review?(Olli.Pettay)
Attachment #452786 - Flags: review?(Olli.Pettay)
(Reporter)

Comment 33

8 years ago
Tryserver gave 
30382 ERROR TEST-UNEXPECTED-FAIL | /tests/content/base/test/test_ws_basic_tests.html | Test timed out.
(Reporter)

Comment 34

8 years ago
And locally I get randomly some errors, like
failed | the ws connection in test 13 should be closed cleanly
failed | the ws connection in test 14 should be closed cleanly
(Reporter)

Comment 35

8 years ago
In fact, if I get  errors locally on my Linux system, 
it seems to be always tests 13 and 14.
(Assignee)

Comment 36

8 years ago
Strange, these errors don't happen in my machine.

> /tests/content/base/test/test_ws_basic_tests.html | Test timed out.
I took a look at this, it seems that your pass after receiving the huge message, and then the test times out, but I'm not sure? Actually, I don't know if it is stopping while receiving that message (because it is big and then the tryserver breaks the test), or it stops while waiting for the close event in order to make your test 5. 

> failures on tests 13 and 14.
Well, these are the only ones that call request.connection.write() directly in the test handler, because pywebsocket doesn't support what these tests are intended to to. Really, I need to think about what is going on in pywebsocket...
(Assignee)

Comment 37

8 years ago
I'll upload a patch with some logging. Can you run it locally in your machine and send me the results, please?
(Reporter)

Comment 38

8 years ago
Sure.
(Assignee)

Comment 39

8 years ago
Created attachment 453425 [details] [diff] [review]
logging patch
(Assignee)

Comment 40

8 years ago
Created attachment 453427 [details] [diff] [review]
logging patch
Attachment #453425 - Attachment is obsolete: true
(Reporter)

Comment 41

8 years ago
ERROR TEST-UNEXPECTED-FAIL | /tests/content/base/test/test_websocket.html | the ws connection in test 13 should be closed cleanly

Full log 
http://tinderbox.mozilla.org/showlog.cgi?log=MozillaTry/1277341329.1277343019.8799.gz&fulltext=1
(Assignee)

Comment 42

8 years ago
Created attachment 453744 [details]
failure test 13

The related part of the log is that in this attachment.
SSLTUNNEL: 0x8de8ff8
nsWebSocket: 0xb8f76a8
nsWebSocketEstablishedConnection: 0xd6937f8

The failure occurred because:
  -> the client received the close frame;
  -> the client sent the ack close frame before setting mPostedCloseFrame to TRUE (this is wrong)
(Assignee)

Comment 43

8 years ago
Created attachment 453750 [details] [diff] [review]
fix to test 13
(Assignee)

Comment 44

8 years ago
Created attachment 453786 [details] [diff] [review]
fix bufferedAmount

The last patch introduced an error when calculating the bufferedAmount propoerty. Please apply this patch after the last one.
(Assignee)

Comment 45

8 years ago
Created attachment 453846 [details] [diff] [review]
full path fot this bug and bug 573227
Attachment #453108 - Attachment is obsolete: true
Attachment #453750 - Attachment is obsolete: true
Attachment #453786 - Attachment is obsolete: true
Attachment #453846 - Flags: review?(Olli.Pettay)
Attachment #453108 - Flags: review?(Olli.Pettay)
(Reporter)

Updated

8 years ago
Attachment #453846 - Flags: review?(Olli.Pettay) → review+
(Reporter)

Comment 46

8 years ago
I pushed this to m-c
http://hg.mozilla.org/mozilla-central/rev/29e8ddc0229f

Marking fixed after the tests have run.
(Reporter)

Comment 47

8 years ago
So far looking good.
Marking this fixed.

Maybe bug 573227 could be marked fixed once there are some more test runs.
Need to also go through all the other websocket related test failure bugs.
Status: ASSIGNED → RESOLVED
Last Resolved: 8 years ago
Resolution: --- → FIXED
(Assignee)

Comment 48

8 years ago
I was thinking... Maybe there are more two cases that WebSocket objects should be kept alive, mainly the first one:
  * When there are not sent outgoing messages;
  * When at least one open or message events has been received, and there are close events listeners (the close event could be flagged as 'strong' in this case);
(Reporter)

Comment 49

8 years ago
I'm not 100% sure those are really important cases.

File a followup bug? And could you write an email to webapps wg mailing list about the garbage collection issues?
(Assignee)

Comment 50

8 years ago
Ok. I will send an email the the list, and depending on the answers I file a followup bug.
Depends on: 621347
Flags: in-testsuite+
Version: unspecified → Trunk
You need to log in before you can comment on or make changes to this bug.