TEST-START | chrome://mochitests/content/browser/docshell/test/browser/browser_bug435325.js ++DOCSHELL 0x4abbfe0 == 30 [id = 3080] ++DOMWINDOW == 98 (0x47249e8) [serial = 8031] [outer = (nil)] ++DOMWINDOW == 99 (0x80c42c8) [serial = 8032] [outer = 0x4724970] WARNING: NS_ENSURE_TRUE(mMutable) failed: file /builds/slave/m-in-lnx64-dbg/build/netwerk/base/src/nsSimpleURI.cpp, line 293 ###!!! ABORT: wrong thread: 'PR_GetCurrentThread() == gSocketThread', file /builds/slave/m-in-lnx64-dbg/build/netwerk/protocol/http/nsHttpConnectionMgr.cpp, line 1987 std::vector<unsigned short, std::allocator<unsigned short> >::resize(unsigned long, unsigned short) (/home/cltbld/talos-slave/test/build/firefox/libxul.so) XRE_AddStaticComponent (/home/cltbld/talos-slave/test/build/firefox/libxul.so) XRE_AddStaticComponent (/home/cltbld/talos-slave/test/build/firefox/libxul.so) XRE_AddStaticComponent (/home/cltbld/talos-slave/test/build/firefox/libxul.so) std::vector<unsigned int, std::allocator<unsigned int> >::resize(unsigned long, unsigned int) (/home/cltbld/talos-slave/test/build/firefox/libxul.so) XRE_AddStaticComponent (/home/cltbld/talos-slave/test/build/firefox/libxul.so) PR_Select (/home/cltbld/talos-slave/test/build/firefox/libnspr4.so) UNKNOWN (/lib64/libpthread.so.0) clone+0x0000006D (/lib64/libc.so.6) ###!!! ABORT: wrong thread: 'PR_GetCurrentThread() == gSocketThread', file /builds/slave/m-in-lnx64-dbg/build/netwerk/protocol/http/nsHttpConnectionMgr.cpp, line 1987 ++DOMWINDOW == 100 (0x95af7f8) [serial = 8033] [outer = 0x4724970] --DOCSHELL 0x3acc480 == 29 [id = 3057] WARNING: shutting down early because of crash!: file /builds/slave/m-in-lnx64-dbg/build/dom/plugins/ipc/PluginModuleChild.cpp, line 743 WARNING: plugin process _exit()ing: file /builds/slave/m-in-lnx64-dbg/build/dom/plugins/ipc/PluginModuleChild.cpp, line 708 TEST-UNEXPECTED-FAIL | chrome://mochitests/content/browser/docshell/test/browser/browser_bug435325.js | Exited with code 1 during test run INFO | automation.py | Application ran for: 0:18:44.526552 INFO | automation.py | Reading PID log: /tmp/tmp9pz-ORpidlog ==> process 2409 launched child process 2493 INFO | automation.py | Checking for orphan process with PID: 2493 Downloading symbols from: http://ftp.mozilla.org/pub/mozilla.org/firefox/tinderbox-builds/mozilla-inbound-linux64-debug/1332518773/firefox-14.0a1.en-US.linux-x86_64.crashreporter-symbols.zip PROCESS-CRASH | chrome://mochitests/content/browser/docshell/test/browser/browser_bug435325.js | application crashed (minidump found) Crash dump filename: /tmp/tmpmjazKd/minidumps/20c714cb-9e3b-2518-6fcdd29b-0ee15e90.dmp Operating system: Linux 0.0.0 Linux 18.104.22.168-127.fc12.x86_64 #1 SMP Sat Nov 7 21:11:14 EST 2009 x86_64 CPU: amd64 family 6 model 23 stepping 10 2 CPUs Crash reason: SIGSEGV Crash address: 0x0 Thread 3 (crashed) 0 libmozalloc.so!TouchBadMemory [mozalloc_abort.cpp:93fc7bf5e5be : 68 + 0x0] rbx = 0xd3172d78 r12 = 0x02f46790 r13 = 0xd3172d78 r14 = 0x02f46790 r15 = 0x120d18da rip = 0x1470a0d9 rsp = 0x02f46750 rbp = 0x02f46770 Found by: given as instruction pointer in context 1 libmozalloc.so!mozalloc_abort [mozalloc_abort.cpp:93fc7bf5e5be : 89 + 0x4] rbx = 0xd3172d78 r12 = 0x02f46790 r13 = 0xd3172d78 r14 = 0x02f46790 r15 = 0x120d18da rip = 0x1470a115 rsp = 0x02f46760 rbp = 0x02f46770 Found by: call frame info 2 libxul.so!NS_DebugBreak_P [nsDebugImpl.cpp:93fc7bf5e5be : 388 + 0x4] rbx = 0x02f46b80 r12 = 0x02f46790 r13 = 0xd3172d78 r14 = 0x02f46790 r15 = 0x120d18da rip = 0x11a715ab rsp = 0x02f46780 rbp = 0x02f46bc0 Found by: call frame info 3 libxul.so!nsHttpConnectionMgr::ReadTimeoutTick [nsHttpConnectionMgr.cpp:93fc7bf5e5be : 1987 + 0x32] rbx = 0x00ec3fe0 r12 = 0xf4001920 r13 = 0x120f4618 r14 = 0x00000003 r15 = 0x00000000 rip = 0x10cb0dd5 rsp = 0x02f46bd0 rbp = 0x02f46be0 Found by: call frame info 4 libxul.so!nsHttpConnectionMgr::Observe [nsHttpConnectionMgr.cpp:93fc7bf5e5be : 274 + 0x7] rbx = 0x00ec3fe0 r12 = 0xf4001920 r13 = 0x120f4618 r14 = 0x00000003 r15 = 0x00000000 rip = 0x10cb30bb rsp = 0x02f46bf0 rbp = 0x02f46c20 Found by: call frame info 5 libxul.so!nsTimerImpl::Fire [nsTimerImpl.cpp:93fc7bf5e5be : 516 + 0x15] rbx = 0xf4001920 r12 = 0x00ec3fe0 r13 = 0x00000003 r14 = 0x00000003 r15 = 0x00000000 rip = 0x11a6cb60 rsp = 0x02f46c30 rbp = 0x02f46cd0 Found by: call frame info 6 libxul.so!nsTimerEvent::Run [nsTimerImpl.cpp:93fc7bf5e5be : 591 + 0x10] rbx = 0x00a4fe28 r12 = 0x0000cae2 r13 = 0x00000000 r14 = 0xffffffff r15 = 0x02f46dff rip = 0x11a6ccca rsp = 0x02f46ce0 rbp = 0x02f46d20 Found by: call frame info 7 libxul.so!nsThread::ProcessNextEvent [nsThread.cpp:93fc7bf5e5be : 657 + 0xd] rbx = 0x00c3c1e0 r12 = 0x02f46d00 r13 = 0x00000000 r14 = 0xffffffff r15 = 0x02f46dff rip = 0x11a68c8b rsp = 0x02f46d30 rbp = 0x02f46de0 Found by: call frame info 8 libxul.so!NS_ProcessPendingEvents_P [nsThreadUtils.cpp:93fc7bf5e5be : 195 + 0xd] rbx = 0x00c3c1e0 r12 = 0x02f46dff r13 = 0x4091a679 r14 = 0xffffffff r15 = 0x00000003 rip = 0x11a272c2 rsp = 0x02f46df0 rbp = 0x02f46e30 Found by: call frame info 9 libxul.so!nsThread::ThreadFunc [nsThread.cpp:93fc7bf5e5be : 308 + 0xa] rbx = 0x02f46e40 r12 = 0x00c3c1e0 r13 = 0x00c3c200 r14 = 0x02f46e50 r15 = 0x00000003 rip = 0x11a67e9f rsp = 0x02f46e40 rbp = 0x02f46ea0
Forgot to paste the log link: https://tbpl.mozilla.org/php/getParsedLog.php?id=10320105&tree=Mozilla-Inbound Rev3 Fedora 12x64 mozilla-inbound debug test mochitest-other on 2012-03-23 09:47:31 PDT for push 93fc7bf5e5be
In content/events/test/test_bug336682_1.html: https://tbpl.mozilla.org/php/getParsedLog.php?id=10343028&tree=Mozilla-Inbound
honza, at first glance this is really baffling.. I'm wondering if you have any ideas? it is asserting that readtimeouttick is not being called on the socket thread.. but the timer is created on the socket thread and only ever init'd on the socket thread.. (those asserts don't go off, so we can be sure that happened ok). indeed the stack trace is some other unknown thread - not the socket or main thread. the test doesn't look like its trying to shut down or anything like that.
Just to update, as I've had an off-zilla inquiry. This is on my radar for next week. I'll also say when I first saw this signature I expected it to be randomly distributed across the testbase and its pretty interesting (and promising for a fix) that it is normally associated with a couple particular tests.
I thought I had this figured out today, but try disagrees with me :) However that failed run does have lots of debug added to it which I can use to keep working. I'm only making this comment so people know I'm not treating this as ok-perma-orange
through some debugging annotations and many try runs of mochi-other on linux-64 I have learned that at the time of the assert the socket thread is complete but the connection manager has not yet shutdown (and has not yet received the begin-shutdown notification) which leaves the tick is still active
Created attachment 613409 [details] [diff] [review] patch 0 its possible for the socket transport service to shutdown before the connection manager. It actually does this all the time if you go offline. In that case the readtimeouttick continued to try to tick until it determined that there were no active connections, when it disabled itself. but that was too late to do safely if the socket thread was already gone (as the number of active connections is designed to only be accessed on the socket thread). The patch changes things to disable the timer tick any time the active connection count drops to zero instead of cleaning it up out of the tick itself. In the case with a long running active connection when the transport service goes away first (the tick is still removed by the connection manager shutdown routine if it goes away first) the transport actually detaches all active sockets which drives the count to 0 and disables the tick.
Comment on attachment 613409 [details] [diff] [review] patch 0 Review of attachment 613409 [details] [diff] [review]: ----------------------------------------------------------------- r=honzab with a comment. ::: netwerk/protocol/http/nsHttpConnectionMgr.cpp @@ +254,5 @@ > } > } > > +bool > +nsHttpConnectionMgr::ConditionallyStopReadTimeoutTick() This can be just void. ConditionallyStopPruneDeadConnectionsTimer is the same. It's confusing when a method does the thing but also returns a flag whether conditions to do the thing has been met (like the caller should do thing it self). You are not using the result anywhere.
https://hg.mozilla.org/integration/mozilla-inbound/rev/69e021051aa5 > > +bool > > +nsHttpConnectionMgr::ConditionallyStopReadTimeoutTick() > > This can be just void. I made it void.
Comment on attachment 613409 [details] [diff] [review] patch 0 This is one of a two patch series (bug 744788 and bug 738812) that should be accepted or declined for aurora atomically. We've only seen this orange on FF14, but it can happen on FF13. The reason it is more prominent on 14 is the timer event is much more frequent there due to the pipelining changes introduced on 14, but the same timer is used by spdy on 13. [Approval Request Comment] Regression caused by (bug #): 728113 User impact if declined: potential crash risk during offline or shutdown Testing completed (on m-c, etc.): m-c - orange cleared up. Risk to taking this patch (and alternatives if risky): fairly low. String changes made by this patch: none.
Comment on attachment 613409 [details] [diff] [review] patch 0 [Triage comment] Low risk, clears up what sounds like intermittent orange on 13, please go ahead with landing.