Closed Bug 881413 Opened 7 years ago Closed 6 years ago

I10t test_errorhandler.js,test_TelemetryPing.js,test_update.js,test_healthreporter.js | test failed (with xpcshell return code: -2147483645) | application crashed [@ TimerThread::Run()] (Assertion failure: rc != 0 (destroyed timer off its target thread!))

Categories

(Core :: Networking: HTTP, defect)

x86
Windows 8
defect
Not set

Tracking

()

RESOLVED FIXED
mozilla30
Tracking Status
firefox28 --- wontfix
firefox29 --- fixed
firefox30 --- fixed
firefox-esr24 --- wontfix
b2g-v1.2 --- wontfix
b2g-v1.3 --- fixed
b2g-v1.3T --- fixed
b2g-v1.4 --- fixed

People

(Reporter: RyanVM, Assigned: benjamin)

References

Details

(Keywords: assertion, crash)

Crash Data

Attachments

(3 files)

https://tbpl.mozilla.org/php/getParsedLog.php?id=23958467&tree=Mozilla-Inbound

WINNT 6.2 mozilla-inbound debug test xpcshell on 2013-06-10 04:39:00 PDT for push eaae3921357b
slave: t-w864-ix-015

05:48:44     INFO -  TEST-PASS | (xpcshell/head.js) | 212 (+ 0) check(s) passed
05:48:44     INFO -  TEST-INFO | (xpcshell/head.js) | 0 check(s) todo
05:48:44     INFO -  WARNING: An event was posted to a thread that will never run it (rejected): file e:/builds/moz2_slave/m-in-w32-d-0000000000000000000/build/xpcom/threads/nsThread.cpp, line 367
05:48:44     INFO -  Assertion failure: rc != 0 (destroyed timer off its target thread!), at e:/builds/moz2_slave/m-in-w32-d-0000000000000000000/build/xpcom/threads/TimerThread.cpp:247
05:48:44     INFO -  WARNING: NS_ENSURE_TRUE(mThread != PR_GetCurrentThread()) failed: file e:/builds/moz2_slave/m-in-w32-d-0000000000000000000/build/xpcom/threads/nsThread.cpp, line 445
05:48:44     INFO -  <<<<<<<
05:48:50  WARNING -  PROCESS-CRASH | C:\slave\test\build\tests\xpcshell\tests\services\sync\tests\unit\test_errorhandler.js | application crashed [@ TimerThread::Run()]
05:48:50     INFO -  Crash dump filename: C:\slave\test\build\tests\xpcshell\tests\services\sync\tests\unit\e86e63e2-e82f-43bf-a519-61fe7435413c.dmp
05:48:50     INFO -  Operating system: Windows NT
05:48:50     INFO -                    6.2.9200
05:48:50     INFO -  CPU: x86
05:48:50     INFO -       GenuineIntel family 6 model 30 stepping 5
05:48:50     INFO -       8 CPUs
05:48:50     INFO -  Crash reason:  EXCEPTION_BREAKPOINT
05:48:50     INFO -  Crash address: 0x737c9e31
05:48:50     INFO -  Thread 7 (crashed)
05:48:50     INFO -   0  xul.dll!TimerThread::Run() [TimerThread.cpp:eaae3921357b : 247 + 0x1b]
05:48:50     INFO -      eip = 0x737c9e31   esp = 0x04fff93c   ebp = 0x04fff9ac   ebx = 0x05ca92e0
05:48:50     INFO -      esi = 0x05ca9328   edi = 0x03297608   eax = 0x00000000   ecx = 0x21a37d50
05:48:50     INFO -      edx = 0x71b1e4d8   efl = 0x00000206
05:48:50     INFO -      Found by: given as instruction pointer in context
05:48:50     INFO -   1  xul.dll!nsThread::ProcessNextEvent(bool,bool *) [nsThread.cpp:eaae3921357b : 626 + 0xd]
05:48:50     INFO -      eip = 0x737cccc0   esp = 0x04fff9b4   ebp = 0x04fffa04
05:48:50     INFO -      Found by: call frame info
05:48:50     INFO -   2  xul.dll!NS_ProcessNextEvent(nsIThread *,bool) [nsThreadUtils.cpp:eaae3921357b : 238 + 0xc]
05:48:50     INFO -      eip = 0x7378768f   esp = 0x04fffa0c   ebp = 0x04fffa18
05:48:50     INFO -      Found by: call frame info
05:48:50     INFO -   3  xul.dll!nsThread::ThreadFunc(void *) [nsThread.cpp:eaae3921357b : 264 + 0x7]
05:48:50     INFO -      eip = 0x737cc443   esp = 0x04fffa20   ebp = 0x04fffa40
05:48:50     INFO -      Found by: call frame info
Crash Signature: [@ TimerThread::Run() ]
https://tbpl.mozilla.org/php/getParsedLog.php?id=26486056&tree=Mozilla-Inbound
Summary: Intermittent test_errorhandler.js | test failed (with xpcshell return code: -2147483645) | application crashed [@ TimerThread::Run()] (Assertion failure: rc != 0 (destroyed timer off its target thread!)) → Intermittent test_errorhandler.js,test_TelemetryPing.js | test failed (with xpcshell return code: -2147483645) | application crashed [@ TimerThread::Run()] (Assertion failure: rc != 0 (destroyed timer off its target thread!))
https://tbpl.mozilla.org/php/getParsedLog.php?id=29521615&tree=Fx-Team
Summary: Intermittent test_errorhandler.js,test_TelemetryPing.js | test failed (with xpcshell return code: -2147483645) | application crashed [@ TimerThread::Run()] (Assertion failure: rc != 0 (destroyed timer off its target thread!)) → Intermittent test_errorhandler.js,test_TelemetryPing.js,test_update.js | test failed (with xpcshell return code: -2147483645) | application crashed [@ TimerThread::Run()] (Assertion failure: rc != 0 (destroyed timer off its target thread!))
https://tbpl.mozilla.org/php/getParsedLog.php?id=30535891&tree=B2g-Inbound
Summary: Intermittent test_errorhandler.js,test_TelemetryPing.js,test_update.js | test failed (with xpcshell return code: -2147483645) | application crashed [@ TimerThread::Run()] (Assertion failure: rc != 0 (destroyed timer off its target thread!)) → I10t test_errorhandler.js,test_TelemetryPing.js,test_update.js,test_healthreporter.js | test failed (with xpcshell return code: -2147483645) | application crashed [@ TimerThread::Run()] (Assertion failure: rc != 0 (destroyed timer off its target thread!))
Do you have any ideas on somebody who could look at this?  The assertion sounds pretty sketchy, and it is happening fairly frequently.  Thanks.
Flags: needinfo?(benjamin)
Depends on: 963474
No longer depends on: 963474
FWIW, if push comes to shove, we can probably disable our way to "victory" here by skipping test_update.js on Windows debug. At the rate things are going, that's looking more and more likely :(
This is complicated to debug, of course.

13:06:52     INFO -  [3540] WARNING: cannot post event if not initialized: file c:/builds/moz2_slave/m-cen-w32-d-000000000000000000/build/netwerk/protocol/http/nsHttpConnectionMgr.cpp, line 178
13:06:52     INFO -  TEST-PASS | (xpcshell/head.js) | 230 (+ 0) check(s) passed
13:06:52     INFO -  TEST-INFO | (xpcshell/head.js) | 0 check(s) todo
13:06:52     INFO -  [3540] WARNING: An event was posted to a thread that will never run it (rejected): file c:\\builds\\moz2_slave\\m-cen-w32-d-000000000000000000\\build\\xpcom\\threads\\nsThread.cpp, line 367
13:06:52     INFO -  Assertion failure: rc != 0 (destroyed timer off its target thread!), at c:\\builds\\moz2_slave\\m-cen-w32-d-000000000000000000\\build\\xpcom\\threads\\TimerThread.cpp:259

We pretty clearly are trying to fire an XPCOM timer at a thread which is already shutting down. This causes the timer service to hold the last reference to the timer and eventually assert. The following things initialize timers during this test:

startupcache

mozilla::dom::workers::WorkerPrivate::SetGCTimerMode

nsXMLHttpRequest::StartProgressEventTimer (from main thread)

mozilla::net::nsHttpConnectionMgr::ActivateTimeoutTick (socket transport thread)

mozilla::net::nsHttpConnectionMgr::nsHalfOpenSocket::SetupBackupTimer (socket transport thread)

JS timer from 0 namedTimer(callback = [function], wait = 300000000, thisObj = [object Object], name = "timeoutTimer") ["resource://gre/modules/services-common/utils.js":172]
    this = [object Object]
1 delayTimeout() ["resource://services-common/rest.js":357]
    this = [object Object]
2 dispatch(method = "POST", data = "...", onComplete = undefined, onProgress = undefined) ["resource://services-common/rest.js":347]
    this = [object Object]
3 post(data = "...") ["resource://services-common/rest.js":242]
    this = [object Object]
4 uploadJSON(namespace = "test00", id = "e22a62af-f2bb-4c1d-87f6-471aca8b155a", payload = "{"version":2,"thisPingDate":"2014-02-19","geckoAppInfo":{"_v":1,"vendor":"Mozilla","name":"xpcshell","id":"xpcshell@tests.mozilla.org","version":"1","appBuildID":"20121107","platformVersion":"p-ver","platformBuildID":"20121106","os":"XPCShell","xpcomabi":"noarch-spidermonkey","updateChannel":"default"},"data":{"last":{},"days":{}}}", options = [object Object]) ["resource://services-common/bagheeraclient.js":200]
    this = [object Object]
5 doUpload() ["resource://gre/modules/services/healthreport/healthreporter.jsm":1427]
    this = [object Object]
6 TaskImpl_run(aSendResolved = true, aSendValue = 111) ["resource://gre/modules/Task.jsm":233]
    this = [object Object]
7 anonymous() ["resource://gre/modules/Promise.jsm":774]
    this = [object Object]
8 anonymous() ["resource://gre/modules/Promise.jsm":653]
    this = [object Object]

mozilla::net::nsHttpConnection::StartShortLivedTCPKeepalives (non-main thread but doesn't appear to be the socket transport thread)

nsDiskCacheMap::ResetCacheTimer

nsCacheService::CreateDiskDevice

mozilla::net::nsHttpConnectionMgr::PruneDeadConnectionsAfter

Since the JS timers all fire at the main thread, I don't think it's possible for them to be the problem here. But I don't know which of the remaining timer/threads is the one at fault. I'd love to add more debugging code to the "destroyed timer off its target thread" warning, but even though I have a handle to the nsThreadImpl in question, it doesn't have a name or other identifying data which would make it easy to tell which thread it is. The nsHttpConnectionMgr.cpp warning is interesting and points to this being one of the networking timers...
Flags: needinfo?(benjamin)
Maybe Jason can help answer that.
Flags: needinfo?(jduell.mcbugs)
Flags: needinfo?(jduell.mcbugs)
Benjamin, thanks for the very nice forensic work.

Looking at the hg log for /netwerk from ~ June 10 2013 (when this started), I don't see an obvious culprit.  I suppose it's possible that the patch for bug 845190 (that blocked re-initializing nsStreamTransportService during shutdown--it's marked obsolete but it landed on m-i then) might somehow have exposed some latent bug with timers here.

HTTP-off-main-thread landed around then, but slightly later (Jun 13th). The TCPKeepAlive stuff is very recent, so that's not it.

From the stuff in the list in comment 284, that leaves a bunch of Http cxn manager stuff, and some non-necko culprits.  I'm tossing to Patrick in case he can think of any possible timers that might be to blame here, and cc-ing some other folks who might have touched code that does off-main timer stuff.
Flags: needinfo?(mcmanus)