Closed Bug 850252 Opened 7 years ago Closed 7 years ago

Remove Off-Main-Thread XPCWrappedJS refcounting from nsSocketTransport2

Categories

(Core :: Networking, defect)

defect
Not set

Tracking

()

RESOLVED FIXED
mozilla23

People

(Reporter: bholley, Assigned: jdm)

References

Details

Attachments

(1 file, 1 obsolete file)

No description provided.
Depends on: 850253
No longer depends on: 850253
Attachment #723995 - Flags: review?(mcmanus) → review+
Ahah! So this patch appears to be the source of those windows-only crashes I was seeing on try earlier. And there are stacks this time to boot! Life is good.

So, this patch appears to be causing crashes like this one:

https://tbpl.mozilla.org/php/getParsedLog.php?id=20833852&tree=Try&full=1#error1

This is interesting, because the patch does very little. My theory is that this patch causes the destructor of the object in mCallbacks to execute later than it did previously (since we'll do a deferred release), which is causing something to go wrong.

The thing is, this _smells_ like the exact opposite problem. We're spinning the event loop, which means that event-loop based release gets called sooner than callers think it does, and sometimes before the given stack frame returns. But it seems like this patch won't cause us to execute destructors any earlier, only later.

mcmanus, what do you think?
Flags: needinfo?(mcmanus)
I think jdm and honza (or jduell, I forget) were just dealing with the lifetime of that damn opaque callback .. hopefully they'll have some firmer insight because I'm with you - later sounds safer.
Flags: needinfo?(mcmanus)
Flags: needinfo?(jduell.mcbugs)
jduell also suggests asking jdm and Honza.
Flags: needinfo?(josh)
Flags: needinfo?(jduell.mcbugs)
Flags: needinfo?(honzab.moz)
No idea right now from the top of my head.  I may try to reproduce locally.  I'm sad we are that fragile..
Flags: needinfo?(honzab.moz)
Arg, try was reset, so we lost the oranges. Repushing for reference:

https://tbpl.mozilla.org/?tree=Try&rev=9901fdb43173
I _think_ I've managed to reproduce this locally in a debug build on my win7 VM. It looks like it depends on a series of tests being run. The most reliable way I've managed to trigger it is with:

mach mochitest-plain layout/

Can one of you guys please take a look here? This is the last thing that blocks dethreading all of XPConnect, which is something we've been working on for over a year.

If everyone's stumped or busy, the alternative would be to try to hack things so that we only use the nsMainThreadPtrHandle if the object QIs to an XPCWrappedJS. It would be ugly and wouldn't get to the root of the problem, but it might work...
I had assumed that you were going to dig in based on our conversation about that nsHttpConnectionMgr thing that passes itself; this made sense to me since you were reproducing it. If you'd like me to instead, that's fine, I just need to know that.
Flags: needinfo?(josh)
thanks josh!
Flags: needinfo?(josh)
Sweet, I can reproduce this on Linux under gdb by running layout/test/style/test_css_cross_domain.html.
Assignee: bobbyholley+bmo → josh
Flags: needinfo?(josh)
I've sorted out the cause:

#0  NS_ProxyRelease (target=0x7ffff7d72ef0, doomed=0x7fffd5c83850, alwaysProxy=false) at /run/media/jdm/ssd/mozilla-central/obj-x86_64-unknown-linux-gnu/xpcom/build/nsProxyRelease.cpp:33
#1  0x00007ffff274b72c in nsMainThreadPtrHolder<nsIInterfaceRequestor>::~nsMainThreadPtrHolder (this=0x7fffd3f9e9c0, __in_chrg=<optimized out>) at ../../../dist/include/nsProxyRelease.h:130
#2  0x00007ffff274b0d1 in nsMainThreadPtrHolder<nsIInterfaceRequestor>::Release (this=0x7fffd3f9e9c0) at ../../../dist/include/nsProxyRelease.h:167
#3  0x00007ffff274bb89 in nsRefPtr<nsMainThreadPtrHolder<nsIInterfaceRequestor> >::assign_assuming_AddRef (this=0x7fffd6dbae78, newPtr=0x7fffd402c540) at ../../../dist/include/nsAutoPtr.h:868
#4  0x00007ffff274b996 in nsRefPtr<nsMainThreadPtrHolder<nsIInterfaceRequestor> >::assign_with_AddRef (this=0x7fffd6dbae78, rawPtr=0x7fffd402c540) at ../../../dist/include/nsAutoPtr.h:852
#5  0x00007ffff274b4fa in nsRefPtr<nsMainThreadPtrHolder<nsIInterfaceRequestor> >::operator= (this=0x7fffd6dbae78, rhs=...) at ../../../dist/include/nsAutoPtr.h:928
#6  0x00007ffff274ad75 in nsMainThreadPtrHandle<nsIInterfaceRequestor>::operator= (this=0x7fffd6dbae78, aOther=...) at ../../../dist/include/nsProxyRelease.h:179
#7  0x00007ffff27497c3 in nsSocketTransport::SetSecurityCallbacks (this=0x7fffd6dbad40, callbacks=0x7fffd3ea4c08) at /run/media/jdm/ssd/mozilla-central/netwerk/base/src/nsSocketTransport2.cpp:1870
#8  0x00007ffff2811e22 in nsHttpConnection::Init (this=0x7fffd3ea4be0, info=0x7fffd3f69540, maxHangTime=10, transport=0x7fffd6dbad58, instream=0x7fffd6dbae90, outstream=0x7fffd6dbaec0, callbacks=
    0x7fffd53165b0, rtt=0) at /run/media/jdm/ssd/mozilla-central/netwerk/protocol/http/nsHttpConnection.cpp:136
#9  0x00007ffff2821935 in nsHttpConnectionMgr::nsHalfOpenSocket::OnOutputStreamReady (this=0x7fffd5c83840, out=0x7fffd6dbaec0)
    at /run/media/jdm/ssd/mozilla-central/netwerk/protocol/http/nsHttpConnectionMgr.cpp:2820
#10 0x00007ffff27450b0 in nsSocketOutputStream::OnSocketReady (this=0x7fffd6dbaec0, condition=NS_OK) at /run/media/jdm/ssd/mozilla-central/netwerk/base/src/nsSocketTransport2.cpp:488
#11 0x00007ffff2748654 in nsSocketTransport::OnSocketReady (this=0x7fffd6dbad40, fd=0x7ffff7d24ca0, outFlags=2) at /run/media/jdm/ssd/mozilla-central/netwerk/base/src/nsSocketTransport2.cpp:1583
#12 0x00007ffff274e927 in nsSocketTransportService::DoPollIteration (this=0x7ffff7d18c60, wait=true) at /run/media/jdm/ssd/mozilla-central/netwerk/base/src/nsSocketTransportService2.cpp:791
#13 0x00007ffff274dfdf in nsSocketTransportService::Run (this=0x7ffff7d18c60) at /run/media/jdm/ssd/mozilla-central/netwerk/base/src/nsSocketTransportService2.cpp:642
#14 0x00007ffff486bc78 in nsThread::ProcessNextEvent (this=0x7ffff7d73640, mayWait=true, result=0x7fffe9afeddf) at /run/media/jdm/ssd/mozilla-central/xpcom/threads/nsThread.cpp:627
#15 0x00007ffff47f520f in NS_ProcessNextEvent (thread=0x7ffff7d73640, mayWait=true) at /run/media/jdm/ssd/mozilla-central/obj-x86_64-unknown-linux-gnu/xpcom/build/nsThreadUtils.cpp:238
#16 0x00007ffff486aa13 in nsThread::ThreadFunc (arg=0x7ffff7d73640) at /run/media/jdm/ssd/mozilla-central/xpcom/threads/nsThread.cpp:265
#17 0x00007ffff7ebb963 in _pt_root (arg=0x7ffff7d629d0) at /run/media/jdm/ssd/mozilla-central/nsprpub/pr/src/pthreads/ptthread.c:191
#18 0x000000395b007d14 in start_thread () from /lib64/libpthread.so.0
#19 0x000000395a8f168d in clone () from /lib64/libc.so.6

Bobby's patch makes us unconditionally proxy the release to the main thread. However, at least some callbacks are never meant to be released on the main thread, such as the half open sockets, so we end up screwing up our threading assumptions and triggering the crash.
The windows push is looking rad, so I kicked off linux and mac as well: https://tbpl.mozilla.org/?tree=Try&rev=63116ffb29b8
Attachment #723995 - Attachment is obsolete: true
Lies, debug builds are where it's at: https://tbpl.mozilla.org/?tree=Try&rev=431d1fe0c745
Attachment #736634 - Flags: review?(mcmanus) → review+
https://hg.mozilla.org/mozilla-central/rev/0d31afd775a0
Status: NEW → RESOLVED
Closed: 7 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla23
You need to log in before you can comment on or make changes to this bug.