Came in on a TM -> JM merge, needs to go out before we can merge TM -> m-c. The assertion stacks in the Windows logs are clearly crap; if the Mac ones are, it's not so obvious. http://tinderbox.mozilla.org/showlog.cgi?log=TraceMonkey/1310004908.1310008532.27157.gz 39252 INFO TEST-PASS | /tests/content/base/test/test_websocket_basic.html | Wrong message - "test message100" should equal "test message100" 39253 ERROR TEST-UNEXPECTED-FAIL | /tests/content/base/test/test_websocket_basic.html | Test timed out. 39254 INFO TEST-PASS | /tests/content/base/test/test_websocket_basic.html | Didn't receive all the messages! - 100 should equal 100 39255 INFO TEST-PASS | /tests/content/base/test/test_websocket_basic.html | Connection closed cleanly 39256 INFO TEST-END | /tests/content/base/test/test_websocket_basic.html | finished in 300987ms 39257 INFO TEST-START | /tests/content/base/test/test_websocket_hello.html 39258 INFO TEST-PASS | /tests/content/base/test/test_websocket_hello.html | 'this' should point to the WebSocket. (2) - [object WebSocket @ 0x27acca60 (native @ 0x29f67110)] should equal [object WebSocket @ 0x27acca60 (native @ 0x29f67110)] 39259 ERROR TEST-UNEXPECTED-FAIL | /tests/content/base/test/test_websocket_hello.html | Connection closed cleanly WARNING: NS_ENSURE_TRUE(mState == STATE_TRANSFERRING) failed: file /builds/slave/tm-osx-dbg/build/netwerk/base/src/nsSocketTransport2.cpp, line 1889 before 991232, after 987136, break 07b34000 ++DOMWINDOW == 297 (0x257040c8) [serial = 827] [outer = 0x1f6176b0] 39260 INFO TEST-PASS | /tests/content/base/test/test_websocket_hello.html | Wrong data - "Hello world!" should equal "Hello world!" ###!!! ASSERTION: nsLoadGroup not thread-safe: '_mOwningThread.GetThread() == PR_GetCurrentThread()', file /builds/slave/tm-osx-dbg/build/netwerk/base/src/nsLoadGroup.cpp, line 213 nsLoadGroup::Internal::Release [netwerk/base/src/nsLoadGroup.cpp:213] nsLoadGroup::Release [netwerk/base/src/nsLoadGroup.cpp:213] nsCOMPtr<nsILoadGroup>::~nsCOMPtr [nsCOMPtr.h:533] mozilla::net::BaseWebSocketChannel::~BaseWebSocketChannel [netwerk/protocol/websocket/BaseWebSocketChannel.h:57] mozilla::net::WebSocketChannel::~WebSocketChannel [netwerk/protocol/websocket/WebSocketChannel.cpp:571] mozilla::net::WebSocketChannel::Release [netwerk/protocol/websocket/WebSocketChannel.cpp:81] nsCOMPtr<nsIInputStreamCallback>::assign_assuming_AddRef [nsCOMPtr.h:518] nsCOMPtr<nsIInputStreamCallback>::assign_with_AddRef [nsCOMPtr.h:1204] nsCOMPtr<nsIInputStreamCallback>::operator= [nsCOMPtr.h:664] nsInputStreamReadyEvent::Run [xpcom/io/nsStreamUtils.cpp:117] nsThread::ProcessNextEvent [xpcom/threads/nsThread.cpp:617] NS_ProcessNextEvent_P [obj-firefox/xpcom/build/nsThreadUtils.cpp:245] nsSocketTransportService::Run [netwerk/base/src/nsSocketTransportService2.cpp:639] nsThread::ProcessNextEvent [xpcom/threads/nsThread.cpp:617] NS_ProcessNextEvent_P [obj-firefox/xpcom/build/nsThreadUtils.cpp:245] nsThread::ThreadFunc [xpcom/threads/nsThread.cpp:271] _pt_root [nsprpub/pr/src/pthreads/ptthread.c:190] libSystem.B.dylib + 0x32155 libSystem.B.dylib + 0x32012 39261 INFO TEST-END | /tests/content/base/test/test_websocket_hello.html | finished in 2135ms
it appears WebSocketChannel::mLoadGroup need to be proxy released on the main thread. I'll want to checkout tracemonkey and run the patch through (at least os x) try though.. so it will take some time.
the real cause is a regression from 537787, mLoadGroup was supposed to be moved to a base class but was instead duplicated in both the base and derived classes. It should also be proxy released as I mentioned in comment 6, but that's apparently only relevant on a corner case that isn't being tested. I'll attach a patch in a minute.
Created attachment 544505 [details] [diff] [review] patch v1
Comment on attachment 544505 [details] [diff] [review] patch v1 Review of attachment 544505 [details] [diff] [review]: -----------------------------------------------------------------
Comment on attachment 544505 [details] [diff] [review] patch v1 The e10s patch landed in aurora, and this fixes a relatively important yet simple bug in it.
crud - this patch didn't fix the orange with try. (it did clear up all those assertions, which are legit problems so I'll push the patch.)
http://email@example.com/try-macosx-debug/try_leopard-o-debug_test-mochitests-1-build414.txt.gz JS team - the portion of the test that is failing deals with a very large (225KB) message. Are there any diffs between TM and M-C that might be relevant to that?
Can you attach the message or link to it? TM recently pulled the TypeInference project (bug 608741) from the JM repo, which has a large effect on how JS is processed.
I was thinking it was the large message in test part 4 (with the huge message, which is both sent and recvd through ws) of this test: http://mxr.mozilla.org/mozilla-central/source/content/base/test /test_websocket_basic.html?force=1#22 But it actually looks like part 3 of the same test didn't quite complete.. it moved all 100 of its messages but doesn't move onto the close until a timeout kicks in. Note all those calls to forcegc(). [the test is just to an echo server, so each time a message is sent one is recvd]. hth.
I've moved the ASSERTION(loadgroup not threadsafe) patch to a different bug 669975 as it is just a distraction from the TM issue.
I've tried to repro this on both OSX debug and Win7 debug and they both work fine.
Have to wonder if this test is doing several hundred GCs and just hitting a timeout before it finishes. Will reduce the testCount and tryserver that.
Reduced the testCount to 50 on tryserver and this test was green on OSX, Win7 and WinXP. I don't think it gets stuck, it just takes longer and usually hits the timeout.
Well, if it hits the timeout when testCount isn't modified, there is probably some significant performance regression which should be fixed.
How do we determine that? It seems like this test has been the cause of random timeout orange in the past, and it may have been close to the timeout threshold already. There are GC performance regressions we're working on (bug 669815), but if fixing those does not fix this then I'd say the problem is with the test and/or timeout value.
(In reply to comment #26) > How do we determine that? It seems like this test has been the cause of > random timeout orange in the past, and it may have been close to the timeout > threshold already. I'm not a fan of the tests, which certainly rely too much on timeouts and assumptions about processing speeds. but afaik it has not had an appreciable level of timeout flakiness in the last few months at all. perhaps a measurement of time to completion before and after the merge would be a useful data point.
This hasn't happened on any platform in the JM branch since the patch for bug 670185 went in (which was, in large part, cutting out work done during GC). How should this get marked as resolved?
JM's the only place it was broken, so if it's fixed there it's FIXED.