Race conditions around ContentParent::GetAll (and similar methods) and sending messages to processes causes crashes in debug builds, perhaps other problems

RESOLVED FIXED in Firefox 23

Status

()

RESOLVED FIXED
6 years ago
2 years ago

People

(Reporter: justin.lebar+bug, Assigned: justin.lebar+bug)

Tracking

Trunk
1.0.1 Madrid (19apr)
Points:
---
Dependency tree / graph

Firefox Tracking Flags

(blocking-b2g:tef+, firefox21 wontfix, firefox22 wontfix, firefox23 fixed, b2g18 fixed, b2g18-v1.0.0 wontfix, b2g18-v1.0.1 fixed)

Details

(Whiteboard: [madrid][fixed-in-birch])

Attachments

(1 attachment, 1 obsolete attachment)

If you send a message through a ContentParent before it's been fully initialized, the main process will crash in AsyncChannel::AssertWorkerThread if we're in a debug build.  Presumably mWorkerLoop is null.

ContentParent::GetAll() returns all ContentParent objects in existence, even those which have not been fully initialized.  There are also other ways to get a handle on a ContentParent object before it's been fully initialized.

Any code which iterates over every ContentParent and sends each of them a message is subject to this race.  For example, nsPermissionManager::AddInternal does this.

I'm not sure yet if the only effect here is a debug-time assertion, or if there are other issues here.  At the very least it's causing pain and possibly orange in the mozbrowser unit tests.
(Assignee)

Updated

6 years ago
Blocks: 844323
> There are also other ways to get a handle on a ContentParent object before it's been fully 
> initialized.

Upon further investigation, I think what's actually happening here is that we're sending messages to ContentParent's whose channels have already been /closed/.  That's a bit easier to deal with: Just drop the errors on the floor.
Here's an example of a null-pointer dereference I got after I worked around the thread assertions.

> Program received signal SIGSEGV, Segmentation fault.
> 0x00002aaaaee73945 in mozilla::ipc::RPCChannel::EnteredCxxStack (this=0x29af250)
>     at ../../dist/include/mozilla/ipc/RPCChannel.h:197
> 197	        Listener()->OnEnteredCxxStack();
> (gdb) p Listener()
> $1 = (mozilla::ipc::RPCChannel::RPCListener *) 0x0
> (gdb) bt
> #0  0x00002aaaaee73945 in mozilla::ipc::RPCChannel::EnteredCxxStack (this=0x29af250)
>     at ../../dist/include/mozilla/ipc/RPCChannel.h:197
> #1  0x00002aaaaee73880 in mozilla::ipc::RPCChannel::CxxStackFrame::CxxStackFrame (this=0x7ffffffdc7f8, that=..., 
>     direction=mozilla::ipc::RPCChannel::OUT_MESSAGE, msg=0x7ffffffdc800)
>     at ../../dist/include/mozilla/ipc/RPCChannel.h:250
> #2  0x00002aaaaee6ea2b in mozilla::ipc::RPCChannel::CxxStackFrame::CxxStackFrame (this=0x7ffffffdc7f8, that=..., 
>     direction=mozilla::ipc::RPCChannel::OUT_MESSAGE, msg=0x7ffffffdc800)
>     at ../../dist/include/mozilla/ipc/RPCChannel.h:259
> #3  0x00002aaaaee6e9bc in mozilla::ipc::RPCChannel::Send (this=0x29af250, msg=0x1268e80)
>     at ../../../src/ipc/glue/RPCChannel.cpp:109
> #4  0x00002aaaaeef4c3c in mozilla::dom::PContentParent::SendAddPermission (this=0x29af240, permission=...)
>     at PContentParent.cpp:875
> #5  0x00002aaaaeb00bd7 in nsPermissionManager::AddInternal (this=0xde3e40, aPrincipal=0x36eb010, aType=..., 
>     aPermission=0, aID=0, aExpireType=0, aExpireTime=0, aNotifyOperation=nsPermissionManager::eNotify, 
>     aDBOperation=nsPermissionManager::eWriteToDB) at ../../../src/extensions/cookie/nsPermissionManager.cpp:666
> #6  0x00002aaaaeb0435a in nsPermissionManager::RemoveFromPrincipal (this=0xde3e40, aPrincipal=0x36eb010, 
>     aType=0x36ead50 "browser") at ../../../src/extensions/cookie/nsPermissionManager.cpp:860
> #7  0x00002aaaaf96f86a in NS_InvokeByIndex_P (that=0xde3e40, methodIndex=6, paramCount=2, params=0x7ffffffdd2b0)
>     at ../../../../../../../src/xpcom/reflect/xptcall/src/md/unix/xptcinvoke_x86_64_unix.cpp:162
> #8  0x00002aaaae61b2a2 in Invoke (this=0x7ffffffdd270)
>     at ../../../../src/js/xpconnect/src/XPCWrappedNative.cpp:3085
> #9  Call (this=0x7ffffffdd270) at ../../../../src/js/xpconnect/src/XPCWrappedNative.cpp:2419
> #10 XPCWrappedNative::CallMethod (ccx=..., mode=XPCWrappedNative::CALL_METHOD)
>     at ../../../../src/js/xpconnect/src/XPCWrappedNative.cpp:2385
> #11 0x00002aaaae62a4db in XPC_WN_CallMethod (cx=0x1956410, argc=2, vp=0x2aaae00414c8)
>     at ../../../../src/js/xpconnect/src/XPCWrappedNativeJSOps.cpp:1417
> #12 0x00002aaab0eb1a37 in CallJSNative (cx=0x1956410, 
>     native=0x2aaaae629ca0 <XPC_WN_CallMethod(JSContext*, unsigned int, JS::Value*)>, args=...)
>     at ../../../src/js/src/jscntxtinlines.h:327
> #13 js::mjit::CallCompiler::generateNativeStub (this=0x7ffffffdeca0)
>     at ../../../src/js/src/methodjit/MonoIC.cpp:1062
> #14 0x00002aaab0eb0e72 in js::mjit::ic::NativeCall (f=..., ic=0x3579cc8)
>     at ../../../src/js/src/methodjit/MonoIC.cpp:1373
> #15 0x00002aab3c4637da in ?? ()
> #16 0x00002aab3c457b68 in ?? ()
> #17 0x0000000000000001 in ?? ()
> #18 0x00007ffffffe2a10 in ?? ()
> #19 0x0000000000000000 in ?? ()
Created attachment 733082 [details] [diff] [review]
Patch, v1

Make IPC channels more robust against use after being closed.

This patch makes three changes.  The goal of these changes is to prevent
IPC channels from crashing if you try to send a message across one after
the channel has been closed.

1. Each message_loop object gets a unique, non-repeating ID.
AsyncChannel uses this ID, instead of a pointer to the message loop, to
assert that it's being used on the correct thread.  This allows
AsyncChannel to assert that it's being used on the correct thread after
it's cleared its pointer to the thread.

2. AsyncChannel::mMonitor isn't cleared in AsyncChannel::Clear.  There's
no reason to do so; the monitor can get cleared when the channel is
destroyed.

3. AsyncChannel's mListener pointer is changed to a mozilla::WeakPtr.
This allows us to keep the mListener reference around longer, whereas
before we cleared it in AsyncChannel::Clear.

Channels try not to crash if you Send() on them after they've been
closed, but in practice they'd dereference mListener and mMonitor before
checking whether they'd been closed.

Note that weak references are not threadsafe, so if
mListener->asWeakRef() was called on a different thread than the one
that ultimately destroys AsyncChannel, we'd have a problem.  But I don't
think this happens.
Attachment #733082 - Flags: review?(bent.mozilla)
(Assignee)

Updated

6 years ago
Assignee: nobody → justin.lebar+bug
Should this be tef+ since it blocks bug 844323?
(In reply to Andrew Overholt [:overholt] from comment #4)
> Should this be tef+ since it blocks bug 844323?

Yes.
blocking-b2g: --- → tef?
blocking-b2g: tef? → tef+
Comment on attachment 733082 [details] [diff] [review]
Patch, v1

Review of attachment 733082 [details] [diff] [review]:
-----------------------------------------------------------------

r=me with this change.

::: ipc/chromium/src/base/message_loop.cc
@@ +86,5 @@
>    //DCHECK(loop) << "Ouch, did you forget to initialize me?";
>    return lazy_tls_ptr.Pointer()->Get();
>  }
>  
> +base::AtomicSequenceNumber message_loop_id_seq(base::LINKER_INITIALIZED);

Rather than using this let's try using NSPR. We know that works on all our platforms and I have no idea if there are bugs lurking in the chromium atomics (we haven't updated it in like forever).
Attachment #733082 - Flags: review?(bent.mozilla) → review+
Created attachment 737892 [details] [diff] [review]
Make IPC channels more robust against use after being closed.
Attachment #733082 - Attachment is obsolete: true
Did you mean to ask for review?
Comment on attachment 737892 [details] [diff] [review]
Make IPC channels more robust against use after being closed.

Quick hit to confirm that the NSPR change makes sense.
Attachment #737892 - Flags: feedback?(justin.lebar+bug)
Whiteboard: [madrid]
Whiteboard: [madrid] → [madrid][status: needs review jlebar]
Comment on attachment 737892 [details] [diff] [review]
Make IPC channels more robust against use after being closed.

PR_ATOMIC_INCREMENT is probably preferable to PR_AtomicIncrement.  lgtm aside from that.
Attachment #737892 - Flags: feedback?(justin.lebar+bug) → feedback+
https://hg.mozilla.org/projects/birch/rev/c97f19dc7f7f
Whiteboard: [madrid][status: needs review jlebar] → [madrid][fixed-in-birch]
Target Milestone: --- → Madrid (19apr)
https://hg.mozilla.org/mozilla-central/rev/c97f19dc7f7f
Status: NEW → RESOLVED
Last Resolved: 6 years ago
Resolution: --- → FIXED
https://hg.mozilla.org/releases/mozilla-b2g18/rev/cff28c0455ad
https://hg.mozilla.org/releases/mozilla-b2g18_v1_0_1/rev/41acb5ed40c6
status-b2g18: --- → fixed
status-b2g18-v1.0.0: --- → wontfix
status-b2g18-v1.0.1: --- → fixed
status-firefox21: --- → wontfix
status-firefox22: --- → wontfix
status-firefox23: --- → fixed
Depends on: 865766
Depends on: 1290975
You need to log in before you can comment on or make changes to this bug.