Closed Bug 857152 Opened 8 years ago Closed 8 years ago
Race conditions around Content
Parent::Get All (and similar methods) and sending messages to processes causes crashes in debug builds, perhaps other problems
If you send a message through a ContentParent before it's been fully initialized, the main process will crash in AsyncChannel::AssertWorkerThread if we're in a debug build. Presumably mWorkerLoop is null. ContentParent::GetAll() returns all ContentParent objects in existence, even those which have not been fully initialized. There are also other ways to get a handle on a ContentParent object before it's been fully initialized. Any code which iterates over every ContentParent and sends each of them a message is subject to this race. For example, nsPermissionManager::AddInternal does this. I'm not sure yet if the only effect here is a debug-time assertion, or if there are other issues here. At the very least it's causing pain and possibly orange in the mozbrowser unit tests.
> There are also other ways to get a handle on a ContentParent object before it's been fully > initialized. Upon further investigation, I think what's actually happening here is that we're sending messages to ContentParent's whose channels have already been /closed/. That's a bit easier to deal with: Just drop the errors on the floor.
Here's an example of a null-pointer dereference I got after I worked around the thread assertions. > Program received signal SIGSEGV, Segmentation fault. > 0x00002aaaaee73945 in mozilla::ipc::RPCChannel::EnteredCxxStack (this=0x29af250) > at ../../dist/include/mozilla/ipc/RPCChannel.h:197 > 197 Listener()->OnEnteredCxxStack(); > (gdb) p Listener() > $1 = (mozilla::ipc::RPCChannel::RPCListener *) 0x0 > (gdb) bt > #0 0x00002aaaaee73945 in mozilla::ipc::RPCChannel::EnteredCxxStack (this=0x29af250) > at ../../dist/include/mozilla/ipc/RPCChannel.h:197 > #1 0x00002aaaaee73880 in mozilla::ipc::RPCChannel::CxxStackFrame::CxxStackFrame (this=0x7ffffffdc7f8, that=..., > direction=mozilla::ipc::RPCChannel::OUT_MESSAGE, msg=0x7ffffffdc800) > at ../../dist/include/mozilla/ipc/RPCChannel.h:250 > #2 0x00002aaaaee6ea2b in mozilla::ipc::RPCChannel::CxxStackFrame::CxxStackFrame (this=0x7ffffffdc7f8, that=..., > direction=mozilla::ipc::RPCChannel::OUT_MESSAGE, msg=0x7ffffffdc800) > at ../../dist/include/mozilla/ipc/RPCChannel.h:259 > #3 0x00002aaaaee6e9bc in mozilla::ipc::RPCChannel::Send (this=0x29af250, msg=0x1268e80) > at ../../../src/ipc/glue/RPCChannel.cpp:109 > #4 0x00002aaaaeef4c3c in mozilla::dom::PContentParent::SendAddPermission (this=0x29af240, permission=...) > at PContentParent.cpp:875 > #5 0x00002aaaaeb00bd7 in nsPermissionManager::AddInternal (this=0xde3e40, aPrincipal=0x36eb010, aType=..., > aPermission=0, aID=0, aExpireType=0, aExpireTime=0, aNotifyOperation=nsPermissionManager::eNotify, > aDBOperation=nsPermissionManager::eWriteToDB) at ../../../src/extensions/cookie/nsPermissionManager.cpp:666 > #6 0x00002aaaaeb0435a in nsPermissionManager::RemoveFromPrincipal (this=0xde3e40, aPrincipal=0x36eb010, > aType=0x36ead50 "browser") at ../../../src/extensions/cookie/nsPermissionManager.cpp:860 > #7 0x00002aaaaf96f86a in NS_InvokeByIndex_P (that=0xde3e40, methodIndex=6, paramCount=2, params=0x7ffffffdd2b0) > at ../../../../../../../src/xpcom/reflect/xptcall/src/md/unix/xptcinvoke_x86_64_unix.cpp:162 > #8 0x00002aaaae61b2a2 in Invoke (this=0x7ffffffdd270) > at ../../../../src/js/xpconnect/src/XPCWrappedNative.cpp:3085 > #9 Call (this=0x7ffffffdd270) at ../../../../src/js/xpconnect/src/XPCWrappedNative.cpp:2419 > #10 XPCWrappedNative::CallMethod (ccx=..., mode=XPCWrappedNative::CALL_METHOD) > at ../../../../src/js/xpconnect/src/XPCWrappedNative.cpp:2385 > #11 0x00002aaaae62a4db in XPC_WN_CallMethod (cx=0x1956410, argc=2, vp=0x2aaae00414c8) > at ../../../../src/js/xpconnect/src/XPCWrappedNativeJSOps.cpp:1417 > #12 0x00002aaab0eb1a37 in CallJSNative (cx=0x1956410, > native=0x2aaaae629ca0 <XPC_WN_CallMethod(JSContext*, unsigned int, JS::Value*)>, args=...) > at ../../../src/js/src/jscntxtinlines.h:327 > #13 js::mjit::CallCompiler::generateNativeStub (this=0x7ffffffdeca0) > at ../../../src/js/src/methodjit/MonoIC.cpp:1062 > #14 0x00002aaab0eb0e72 in js::mjit::ic::NativeCall (f=..., ic=0x3579cc8) > at ../../../src/js/src/methodjit/MonoIC.cpp:1373 > #15 0x00002aab3c4637da in ?? () > #16 0x00002aab3c457b68 in ?? () > #17 0x0000000000000001 in ?? () > #18 0x00007ffffffe2a10 in ?? () > #19 0x0000000000000000 in ?? ()
Make IPC channels more robust against use after being closed. This patch makes three changes. The goal of these changes is to prevent IPC channels from crashing if you try to send a message across one after the channel has been closed. 1. Each message_loop object gets a unique, non-repeating ID. AsyncChannel uses this ID, instead of a pointer to the message loop, to assert that it's being used on the correct thread. This allows AsyncChannel to assert that it's being used on the correct thread after it's cleared its pointer to the thread. 2. AsyncChannel::mMonitor isn't cleared in AsyncChannel::Clear. There's no reason to do so; the monitor can get cleared when the channel is destroyed. 3. AsyncChannel's mListener pointer is changed to a mozilla::WeakPtr. This allows us to keep the mListener reference around longer, whereas before we cleared it in AsyncChannel::Clear. Channels try not to crash if you Send() on them after they've been closed, but in practice they'd dereference mListener and mMonitor before checking whether they'd been closed. Note that weak references are not threadsafe, so if mListener->asWeakRef() was called on a different thread than the one that ultimately destroys AsyncChannel, we'd have a problem. But I don't think this happens.
Attachment #733082 - Flags: review?(bent.mozilla)
Assignee: nobody → justin.lebar+bug
Should this be tef+ since it blocks bug 844323?
(In reply to Andrew Overholt [:overholt] from comment #4) > Should this be tef+ since it blocks bug 844323? Yes.
blocking-b2g: --- → tef?
Comment on attachment 733082 [details] [diff] [review] Patch, v1 Review of attachment 733082 [details] [diff] [review]: ----------------------------------------------------------------- r=me with this change. ::: ipc/chromium/src/base/message_loop.cc @@ +86,5 @@ > //DCHECK(loop) << "Ouch, did you forget to initialize me?"; > return lazy_tls_ptr.Pointer()->Get(); > } > > +base::AtomicSequenceNumber message_loop_id_seq(base::LINKER_INITIALIZED); Rather than using this let's try using NSPR. We know that works on all our platforms and I have no idea if there are bugs lurking in the chromium atomics (we haven't updated it in like forever).
Attachment #733082 - Flags: review?(bent.mozilla) → review+
Did you mean to ask for review?
Comment on attachment 737892 [details] [diff] [review] Make IPC channels more robust against use after being closed. Quick hit to confirm that the NSPR change makes sense.
Attachment #737892 - Flags: feedback?(justin.lebar+bug)
Whiteboard: [madrid] → [madrid][status: needs review jlebar]
Comment on attachment 737892 [details] [diff] [review] Make IPC channels more robust against use after being closed. PR_ATOMIC_INCREMENT is probably preferable to PR_AtomicIncrement. lgtm aside from that.
Attachment #737892 - Flags: feedback?(justin.lebar+bug) → feedback+
Whiteboard: [madrid][status: needs review jlebar] → [madrid][fixed-in-birch]
Status: NEW → RESOLVED
Closed: 8 years ago
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.