Last Comment Bug 700402 - ABORT: syntimer not null: '!mSynTimer', file /builds/slave/m-in-osx-dbg/build/netwerk/protocol/http/nsHttpConnectionMgr.cpp, line 1330
: ABORT: syntimer not null: '!mSynTimer', file /builds/slave/m-in-osx-dbg/build...
Status: RESOLVED FIXED
:
Product: Core
Classification: Components
Component: Networking: HTTP (show other bugs)
: Trunk
: x86_64 Mac OS X
: -- normal (vote)
: mozilla10
Assigned To: Honza Bambas (:mayhemer)
:
Mentors:
Depends on:
Blocks: 532972 684893 700095 700396
  Show dependency treegraph
 
Reported: 2011-11-07 11:57 PST by Richard Newman [:rnewman]
Modified: 2011-11-08 07:06 PST (History)
5 users (show)
See Also:
Crash Signature:
(edit)
QA Whiteboard:
Iteration: ---
Points: ---
Has Regression Range: ---
Has STR: ---


Attachments
v1 [Commit comment 8] (2.10 KB, patch)
2011-11-07 15:12 PST, Honza Bambas (:mayhemer)
mcmanus: review+
honzab.moz: checkin+
Details | Diff | Splinter Review

Description Richard Newman [:rnewman] 2011-11-07 11:57:02 PST
We're getting a crash after HTTP operations in Sync tests, following on from our fix in Bug 690170.

It looks like this (and see blocked bugs):

TEST-PASS | (xpcshell/head.js) | 27 (+ 0) check(s) passed

TEST-INFO | (xpcshell/head.js) | 0 check(s) todo
WARNING: nsExceptionService ignoring thread destruction after shutdown: file /builds/slave/m-in-osx-dbg/build/xpcom/base/nsExceptionService.cpp, line 197
###!!! ABORT: syntimer not null: '!mSynTimer', file /builds/slave/m-in-osx-dbg/build/netwerk/protocol/http/nsHttpConnectionMgr.cpp, line 1330
catch_exception_raise+0x00118AEC [/Users/cltbld/talos-slave/test/build/NightlyDebug.app/Contents/MacOS/XUL +0x001567DC]
catch_exception_raise+0x00118E2B [/Users/cltbld/talos-slave/test/build/NightlyDebug.app/Contents/MacOS/XUL +0x00156B1B]
XRE_AddStaticComponent+0x0001341B [/Users/cltbld/talos-slave/test/build/NightlyDebug.app/Contents/MacOS/XUL +0x01D9435B]
XRE_AddStaticComponent+0x00010802 [/Users/cltbld/talos-slave/test/build/NightlyDebug.app/Contents/MacOS/XUL +0x01D91742]
NS_GetTraceRefcnt_P+0x00001C6E [/Users/cltbld/talos-slave/test/build/NightlyDebug.app/Contents/MacOS/XUL +0x01D0F48E]
start+0x00005AF9 [/Users/cltbld/talos-slave/test/build/NightlyDebug.app/Contents/MacOS/xpcshell +0x00006B79]
start+0x00000036 [/Users/cltbld/talos-slave/test/build/NightlyDebug.app/Contents/MacOS/xpcshell +0x000010B6]
###!!! ABORT: syntimer not null: '!mSynTimer', file /builds/slave/m-in-osx-dbg/build/netwerk/protocol/http/nsHttpConnectionMgr.cpp, line 1330
<<<<<<<
PROCESS-CRASH | /Users/cltbld/talos-slave/test/build/xpcshell/tests/services/sync/tests/unit/test_syncstoragerequest.js | application crashed (minidump found)
Comment 1 Honza Bambas (:mayhemer) 2011-11-07 12:05:56 PST
With high probability related to bug 684893 that changed the code around the timer and landed just recently.
Comment 2 Honza Bambas (:mayhemer) 2011-11-07 13:27:26 PST
Theory: 

this is a race between nsHttpConnectionMgr::Shutdown and nsSocketTransportService::Shutdown, that both get called in that order (connection manager first) while both are called on the main thread, and socket transport state notification on the network thread

In detail:

- nsHttpConnectionMgr::Shutdown calls Abandone on all half open sockets, what doesn't ensure to drop the last reference on those instance, the transport still keeps ref to the half open socket instance via mCallbacks (we may want to drop it too)
- nsSocketTransportService::Shutdown stops the socket polling and disallows any state notifications from sockets
- however, in time between those two operation the net socket still runs and may invoke CONNECTING_TO notification from the socket (via callbacks) on an already abandoned half open socket
- this will keep another (later the last) ref to the half open socket
- Abandone, that cancels and releases the timer will never be called again

-> assertion fails when the timer is destroyed (see the call stack)

Solutions:

- drop all refs to half open socket in Abandon (not really a 100% solution)
- do not allow creation of the timer if a half open socket instance that has already been abandoned
Comment 3 Patrick McManus [:mcmanus] PTO until Sep 6 2011-11-07 13:39:46 PST
(In reply to Honza Bambas (:mayhemer) from comment #2)

> - do not allow creation of the timer if a half open socket instance that has
> already been abandoned

when mEnt == null?

makes sense to me.
Comment 4 Honza Bambas (:mayhemer) 2011-11-07 15:12:48 PST
Created attachment 572638 [details] [diff] [review]
v1 [Commit comment 8]

As proposed.
Comment 5 Honza Bambas (:mayhemer) 2011-11-07 18:04:08 PST
There might be more work to do, I just got following crash (w/o the patch applied):

 	xul.dll!nsHttpConnectionMgr::AtActiveConnectionLimit(nsHttpConnectionMgr::nsConnectionEntry * ent=0x00000000, unsigned char caps=')')  Line 644 + 0x3 bytes	
>	xul.dll!nsHttpConnectionMgr::nsHalfOpenSocket::Notify(nsITimer * timer=0x0a513d38)  Line 1519 + 0x2f bytes	
 	xul.dll!nsTimerImpl::Fire()  Line 429	
 	xul.dll!nsTimerEvent::Run()  Line 523	

mEnt is null in nsHalfOpenSocket.
Comment 6 Bob Clary [:bc:] 2011-11-07 22:29:39 PST
I've seen this somewhat randomly on most Windows 7, a couple of Windows XP and one Linux on the following urls:

http://tankionline.com/battle-ru1.html#/battle/000000007fffe07b
http://www.freelotto.com/LottoPicker/play.asp?xyz=8605884
http://types.yuzeli.com/survey/gender30/10179009/
http://veiculo.mercadolivre.com.br/MLB-206929353-yamaha-wave-raider-700-98-carr-rodo-2007-docs-ok-_JM
http://smotri.com/live/9911223/
http://hn.24h.com.vn/phim/sao-hoa-quy-van-sexy-de-so-c74a413856i62660.html
http://webalta.ru/
http://www.wolframalpha.com/input/?i=8069424%26lk=1
http://www.elpais.com/articulo/espana/Alimentar/palomas/calle/1500/euros/multa/elpepunac/20111007elpepinac_19/Tes

I never could reproduce locally but if it requires a really fast connection to see the race, then that is perhaps why. I have seen the crash @ nsHttpConnectionMgr::AtActiveConnectionLimit on mostly Windows 7 and some Windows XP with a pseudo stack

nsHttpConnectionMgr::AtActiveConnectionLimit(nsHttpConnectionMgr::nsConnectionEntry*, unsigned char) nsHttpConnectionMgr::nsHalfOpenSocket::Notify(nsITimer*) nsTimerImpl::Fire() nsTimerEvent::Run() nsThread::ProcessNextEvent(bool, bool*)

at 

http://types.yuzeli.com/survey/gender30/10179009/
http://www.163.com/
http://jd2008.360buy.com/JdHome/OrderList.aspx
http://download.pchome.net/development/c/detail-10255.html
http://www.tiantiandy.com/daludianying/20110929/21646.html

plus a couple of others. I couldn't reproduce though.
Comment 7 Honza Bambas (:mayhemer) 2011-11-08 02:46:11 PST
Bob, do you see this only during shutdown or also during normal run?

(In reply to Honza Bambas (:mayhemer) from comment #5)
> There might be more work to do, I just got following crash (w/o the patch
> applied):
> mEnt is null in nsHalfOpenSocket.

If my theory is correct, then no more changes are needed.  We never reach this code while mEnt is null with the r+'ed patch.
Comment 9 Bob Clary [:bc:] 2011-11-08 03:49:18 PST
(In reply to Honza Bambas (:mayhemer) from comment #7)
> Bob, do you see this only during shutdown or also during normal run?

shutdown.
Comment 10 Ed Morley [:emorley] 2011-11-08 07:06:36 PST
https://hg.mozilla.org/mozilla-central/rev/59cf0c12f975

Note You need to log in before you can comment on or make changes to this bug.