Closed Bug 700402 Opened 13 years ago Closed 13 years ago

ABORT: syntimer not null: '!mSynTimer', file /builds/slave/m-in-osx-dbg/build/netwerk/protocol/http/nsHttpConnectionMgr.cpp, line 1330

Categories

(Core :: Networking: HTTP, defect)

x86_64
macOS
defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla10

People

(Reporter: rnewman, Assigned: mayhemer)

References

Details

Attachments

(1 file)

We're getting a crash after HTTP operations in Sync tests, following on from our fix in Bug 690170.

It looks like this (and see blocked bugs):

TEST-PASS | (xpcshell/head.js) | 27 (+ 0) check(s) passed

TEST-INFO | (xpcshell/head.js) | 0 check(s) todo
WARNING: nsExceptionService ignoring thread destruction after shutdown: file /builds/slave/m-in-osx-dbg/build/xpcom/base/nsExceptionService.cpp, line 197
###!!! ABORT: syntimer not null: '!mSynTimer', file /builds/slave/m-in-osx-dbg/build/netwerk/protocol/http/nsHttpConnectionMgr.cpp, line 1330
catch_exception_raise+0x00118AEC [/Users/cltbld/talos-slave/test/build/NightlyDebug.app/Contents/MacOS/XUL +0x001567DC]
catch_exception_raise+0x00118E2B [/Users/cltbld/talos-slave/test/build/NightlyDebug.app/Contents/MacOS/XUL +0x00156B1B]
XRE_AddStaticComponent+0x0001341B [/Users/cltbld/talos-slave/test/build/NightlyDebug.app/Contents/MacOS/XUL +0x01D9435B]
XRE_AddStaticComponent+0x00010802 [/Users/cltbld/talos-slave/test/build/NightlyDebug.app/Contents/MacOS/XUL +0x01D91742]
NS_GetTraceRefcnt_P+0x00001C6E [/Users/cltbld/talos-slave/test/build/NightlyDebug.app/Contents/MacOS/XUL +0x01D0F48E]
start+0x00005AF9 [/Users/cltbld/talos-slave/test/build/NightlyDebug.app/Contents/MacOS/xpcshell +0x00006B79]
start+0x00000036 [/Users/cltbld/talos-slave/test/build/NightlyDebug.app/Contents/MacOS/xpcshell +0x000010B6]
###!!! ABORT: syntimer not null: '!mSynTimer', file /builds/slave/m-in-osx-dbg/build/netwerk/protocol/http/nsHttpConnectionMgr.cpp, line 1330
<<<<<<<
PROCESS-CRASH | /Users/cltbld/talos-slave/test/build/xpcshell/tests/services/sync/tests/unit/test_syncstoragerequest.js | application crashed (minidump found)
With high probability related to bug 684893 that changed the code around the timer and landed just recently.
Blocks: 684893
Theory: 

this is a race between nsHttpConnectionMgr::Shutdown and nsSocketTransportService::Shutdown, that both get called in that order (connection manager first) while both are called on the main thread, and socket transport state notification on the network thread

In detail:

- nsHttpConnectionMgr::Shutdown calls Abandone on all half open sockets, what doesn't ensure to drop the last reference on those instance, the transport still keeps ref to the half open socket instance via mCallbacks (we may want to drop it too)
- nsSocketTransportService::Shutdown stops the socket polling and disallows any state notifications from sockets
- however, in time between those two operation the net socket still runs and may invoke CONNECTING_TO notification from the socket (via callbacks) on an already abandoned half open socket
- this will keep another (later the last) ref to the half open socket
- Abandone, that cancels and releases the timer will never be called again

-> assertion fails when the timer is destroyed (see the call stack)

Solutions:

- drop all refs to half open socket in Abandon (not really a 100% solution)
- do not allow creation of the timer if a half open socket instance that has already been abandoned
(In reply to Honza Bambas (:mayhemer) from comment #2)

> - do not allow creation of the timer if a half open socket instance that has
> already been abandoned

when mEnt == null?

makes sense to me.
As proposed.
Assignee: nobody → honzab.moz
Status: NEW → ASSIGNED
Attachment #572638 - Flags: review?(mcmanus)
Attachment #572638 - Flags: review?(mcmanus) → review+
There might be more work to do, I just got following crash (w/o the patch applied):

 	xul.dll!nsHttpConnectionMgr::AtActiveConnectionLimit(nsHttpConnectionMgr::nsConnectionEntry * ent=0x00000000, unsigned char caps=')')  Line 644 + 0x3 bytes	
>	xul.dll!nsHttpConnectionMgr::nsHalfOpenSocket::Notify(nsITimer * timer=0x0a513d38)  Line 1519 + 0x2f bytes	
 	xul.dll!nsTimerImpl::Fire()  Line 429	
 	xul.dll!nsTimerEvent::Run()  Line 523	

mEnt is null in nsHalfOpenSocket.
I've seen this somewhat randomly on most Windows 7, a couple of Windows XP and one Linux on the following urls:

http://tankionline.com/battle-ru1.html#/battle/000000007fffe07b
http://www.freelotto.com/LottoPicker/play.asp?xyz=8605884
http://types.yuzeli.com/survey/gender30/10179009/
http://veiculo.mercadolivre.com.br/MLB-206929353-yamaha-wave-raider-700-98-carr-rodo-2007-docs-ok-_JM
http://smotri.com/live/9911223/
http://hn.24h.com.vn/phim/sao-hoa-quy-van-sexy-de-so-c74a413856i62660.html
http://webalta.ru/
http://www.wolframalpha.com/input/?i=8069424%26lk=1
http://www.elpais.com/articulo/espana/Alimentar/palomas/calle/1500/euros/multa/elpepunac/20111007elpepinac_19/Tes

I never could reproduce locally but if it requires a really fast connection to see the race, then that is perhaps why. I have seen the crash @ nsHttpConnectionMgr::AtActiveConnectionLimit on mostly Windows 7 and some Windows XP with a pseudo stack

nsHttpConnectionMgr::AtActiveConnectionLimit(nsHttpConnectionMgr::nsConnectionEntry*, unsigned char) nsHttpConnectionMgr::nsHalfOpenSocket::Notify(nsITimer*) nsTimerImpl::Fire() nsTimerEvent::Run() nsThread::ProcessNextEvent(bool, bool*)

at 

http://types.yuzeli.com/survey/gender30/10179009/
http://www.163.com/
http://jd2008.360buy.com/JdHome/OrderList.aspx
http://download.pchome.net/development/c/detail-10255.html
http://www.tiantiandy.com/daludianying/20110929/21646.html

plus a couple of others. I couldn't reproduce though.
Bob, do you see this only during shutdown or also during normal run?

(In reply to Honza Bambas (:mayhemer) from comment #5)
> There might be more work to do, I just got following crash (w/o the patch
> applied):
> mEnt is null in nsHalfOpenSocket.

If my theory is correct, then no more changes are needed.  We never reach this code while mEnt is null with the r+'ed patch.
Comment on attachment 572638 [details] [diff] [review]
v1 [Commit comment 8]

https://hg.mozilla.org/integration/mozilla-inbound/rev/59cf0c12f975
Attachment #572638 - Attachment description: v1 → v1 [Commit comment 8]
Attachment #572638 - Flags: checkin+
(In reply to Honza Bambas (:mayhemer) from comment #7)
> Bob, do you see this only during shutdown or also during normal run?

shutdown.
https://hg.mozilla.org/mozilla-central/rev/59cf0c12f975
Status: ASSIGNED → RESOLVED
Closed: 13 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla10
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: