We're getting a crash after HTTP operations in Sync tests, following on from our fix in Bug 690170. It looks like this (and see blocked bugs): TEST-PASS | (xpcshell/head.js) | 27 (+ 0) check(s) passed TEST-INFO | (xpcshell/head.js) | 0 check(s) todo WARNING: nsExceptionService ignoring thread destruction after shutdown: file /builds/slave/m-in-osx-dbg/build/xpcom/base/nsExceptionService.cpp, line 197 ###!!! ABORT: syntimer not null: '!mSynTimer', file /builds/slave/m-in-osx-dbg/build/netwerk/protocol/http/nsHttpConnectionMgr.cpp, line 1330 catch_exception_raise+0x00118AEC [/Users/cltbld/talos-slave/test/build/NightlyDebug.app/Contents/MacOS/XUL +0x001567DC] catch_exception_raise+0x00118E2B [/Users/cltbld/talos-slave/test/build/NightlyDebug.app/Contents/MacOS/XUL +0x00156B1B] XRE_AddStaticComponent+0x0001341B [/Users/cltbld/talos-slave/test/build/NightlyDebug.app/Contents/MacOS/XUL +0x01D9435B] XRE_AddStaticComponent+0x00010802 [/Users/cltbld/talos-slave/test/build/NightlyDebug.app/Contents/MacOS/XUL +0x01D91742] NS_GetTraceRefcnt_P+0x00001C6E [/Users/cltbld/talos-slave/test/build/NightlyDebug.app/Contents/MacOS/XUL +0x01D0F48E] start+0x00005AF9 [/Users/cltbld/talos-slave/test/build/NightlyDebug.app/Contents/MacOS/xpcshell +0x00006B79] start+0x00000036 [/Users/cltbld/talos-slave/test/build/NightlyDebug.app/Contents/MacOS/xpcshell +0x000010B6] ###!!! ABORT: syntimer not null: '!mSynTimer', file /builds/slave/m-in-osx-dbg/build/netwerk/protocol/http/nsHttpConnectionMgr.cpp, line 1330 <<<<<<< PROCESS-CRASH | /Users/cltbld/talos-slave/test/build/xpcshell/tests/services/sync/tests/unit/test_syncstoragerequest.js | application crashed (minidump found)
With high probability related to bug 684893 that changed the code around the timer and landed just recently.
Theory: this is a race between nsHttpConnectionMgr::Shutdown and nsSocketTransportService::Shutdown, that both get called in that order (connection manager first) while both are called on the main thread, and socket transport state notification on the network thread In detail: - nsHttpConnectionMgr::Shutdown calls Abandone on all half open sockets, what doesn't ensure to drop the last reference on those instance, the transport still keeps ref to the half open socket instance via mCallbacks (we may want to drop it too) - nsSocketTransportService::Shutdown stops the socket polling and disallows any state notifications from sockets - however, in time between those two operation the net socket still runs and may invoke CONNECTING_TO notification from the socket (via callbacks) on an already abandoned half open socket - this will keep another (later the last) ref to the half open socket - Abandone, that cancels and releases the timer will never be called again -> assertion fails when the timer is destroyed (see the call stack) Solutions: - drop all refs to half open socket in Abandon (not really a 100% solution) - do not allow creation of the timer if a half open socket instance that has already been abandoned
(In reply to Honza Bambas (:mayhemer) from comment #2) > - do not allow creation of the timer if a half open socket instance that has > already been abandoned when mEnt == null? makes sense to me.
Created attachment 572638 [details] [diff] [review] v1 [Commit comment 8] As proposed.
There might be more work to do, I just got following crash (w/o the patch applied): xul.dll!nsHttpConnectionMgr::AtActiveConnectionLimit(nsHttpConnectionMgr::nsConnectionEntry * ent=0x00000000, unsigned char caps=')') Line 644 + 0x3 bytes > xul.dll!nsHttpConnectionMgr::nsHalfOpenSocket::Notify(nsITimer * timer=0x0a513d38) Line 1519 + 0x2f bytes xul.dll!nsTimerImpl::Fire() Line 429 xul.dll!nsTimerEvent::Run() Line 523 mEnt is null in nsHalfOpenSocket.
I've seen this somewhat randomly on most Windows 7, a couple of Windows XP and one Linux on the following urls: http://tankionline.com/battle-ru1.html#/battle/000000007fffe07b http://www.freelotto.com/LottoPicker/play.asp?xyz=8605884 http://types.yuzeli.com/survey/gender30/10179009/ http://veiculo.mercadolivre.com.br/MLB-206929353-yamaha-wave-raider-700-98-carr-rodo-2007-docs-ok-_JM http://smotri.com/live/9911223/ http://hn.24h.com.vn/phim/sao-hoa-quy-van-sexy-de-so-c74a413856i62660.html http://webalta.ru/ http://www.wolframalpha.com/input/?i=8069424%26lk=1 http://www.elpais.com/articulo/espana/Alimentar/palomas/calle/1500/euros/multa/elpepunac/20111007elpepinac_19/Tes I never could reproduce locally but if it requires a really fast connection to see the race, then that is perhaps why. I have seen the crash @ nsHttpConnectionMgr::AtActiveConnectionLimit on mostly Windows 7 and some Windows XP with a pseudo stack nsHttpConnectionMgr::AtActiveConnectionLimit(nsHttpConnectionMgr::nsConnectionEntry*, unsigned char) nsHttpConnectionMgr::nsHalfOpenSocket::Notify(nsITimer*) nsTimerImpl::Fire() nsTimerEvent::Run() nsThread::ProcessNextEvent(bool, bool*) at http://types.yuzeli.com/survey/gender30/10179009/ http://www.163.com/ http://jd2008.360buy.com/JdHome/OrderList.aspx http://download.pchome.net/development/c/detail-10255.html http://www.tiantiandy.com/daludianying/20110929/21646.html plus a couple of others. I couldn't reproduce though.
Bob, do you see this only during shutdown or also during normal run? (In reply to Honza Bambas (:mayhemer) from comment #5) > There might be more work to do, I just got following crash (w/o the patch > applied): > mEnt is null in nsHalfOpenSocket. If my theory is correct, then no more changes are needed. We never reach this code while mEnt is null with the r+'ed patch.
(In reply to Honza Bambas (:mayhemer) from comment #7) > Bob, do you see this only during shutdown or also during normal run? shutdown.