IPC reentrancy issues due to nested event loops
Categories
(Core :: IPC, defect, P2)
Tracking
()
People
(Reporter: philipp, Unassigned)
Details
(4 keywords)
Crash Data
Comment 1•7 years ago
|
||
Comment 2•7 years ago
|
||
Comment 3•7 years ago
|
||
Comment 4•7 years ago
|
||
Comment 6•7 years ago
|
||
Updated•7 years ago
|
Comment 7•7 years ago
|
||
Comment 9•7 years ago
|
||
Comment 10•7 years ago
|
||
Comment 11•7 years ago
|
||
Comment 12•7 years ago
|
||
Comment 13•7 years ago
|
||
Comment 14•7 years ago
|
||
Comment 15•7 years ago
|
||
Updated•7 years ago
|
Updated•7 years ago
|
Comment 16•7 years ago
|
||
Updated•7 years ago
|
Comment 17•6 years ago
|
||
April seems to have accidentally found someone on twitter who can reproduce this reliably: https://twitter.com/William_Leech/status/1121150565273231360 .
If rearchitecturing the event loop is not an option, are there smaller changes we can make, like moving away from synchronous (+ nested event loop) uses of modal filepickers or something?
Also, this is marked sectype-uaf and the crash address looks... not-near-null. If we can't fix the crash, can we make it a nullptr one instead of something that's potentially exploitable, or am I just misreading things? (quite possible!)
Comment 18•6 years ago
|
||
was digging down the same thread as Gijs apparently. One of those crashes is bp-85304835-ee72-4de7-974e-0f6f50190424. The crash address being not near null is OK because it's the instruction pointer for the breakpoint.
In that crash the user has April's Certainly Something extension as well as "British English Dictionary (Forked by Marco Pinto)". Given the steps there may be something involving Web Extension APIs going on here.
Comment 19•6 years ago
|
||
(In reply to :Gijs (he/him) from comment #17)
If rearchitecturing the event loop is not an option, are there smaller changes we can make, like moving away from synchronous (+ nested event loop) uses of modal filepickers or something?
Also, this is marked sectype-uaf and the crash address looks... not-near-null. If we can't fix the crash, can we make it a nullptr one instead of something that's potentially exploitable, or am I just misreading things? (quite possible!)
This is complicated. The general case of this problem is a UAF, and needs some kind of major change (which, given the current state of IPC planning, probably means refcounting all actors rather than changing the use of event loops).
For the more specific case of destroying a ContentParent while it's in use — which is the case that started this bug and the case reported on Twitter — that's not inherently security-sensitive because we release-assert about it, and there is a relatively simple way to avoid the crash (see comment #13). However, I'm concerned about the possibility of a fix for this case pointing out the existence of the larger problem.
I think I know some of what happened with the STR in the tweet:
- The drag-and-drop caused navigation to the cert file, which caused a call to
PSMContentDownloaderParent::RecvOnStopRequest(on an actor managed by theContentParentfor thefile:///process) PSMContentDownloaderParentshowed a XUL modal (which wasn't noticed by the user?), spinning a nested event loop- Clicking on the
<input type="file">nested a file open dialog inside that, which seems as if it shouldn't matter, but maybe does - Something causes the file content process to be shut down and cycle collected
The crash reports show that something caused a XUL modal, nested inside an IPC callback for something in a PContent actor tree, but the stack trace doesn't go far enough to directly indicate the cause.
I can reproduce the XUL modal and nested event loop by trying to open a certificate file; attaching with a debugger shows the PContentParent::OnMessageReceived on the stack outside the nested event loop, and I can see that the ContentParent in question belongs to the file content process. I haven't yet been able to cause the ContentParent to be destroyed in the nested event loop.
I notice that almost all reports for this bug's crash signature are on Windows (with a few on Mac that may not be going through the XUL modal path); I wonder if there's something OS-specific about this.
(Leaving needinfo while I decide what to do about this.)
Updated•6 years ago
|
Comment 20•6 years ago
|
||
I think I know why I couldn't reproduce this by manually killing the process: the piece of code I linked in comment #13 delays notifying the actor thread of link errors until it's not in the nested event loop.
What I can do is attach a debugger, get the ContentParent* value, and inject a call of ShutDownProcess(0) on it. That doesn't crash immediately, but it does cause a real use-after-free when I close the PSM modal window:
Thread 1 "firefox" received signal SIGSEGV, Segmentation fault.
0x00007f547aafa84c in mozilla::ipc::IProtocol::GetIPCChannel (this=<optimized out>) at /home/jld/src/obj.gecko-dev/obj-x86_64-pc-linux-gnu/dist/include/mozilla/ipc/ProtocolUtils.h:274
274 MessageChannel* GetIPCChannel() { return mState->GetIPCChannel(); }
(gdb) x/i $rip
=> 0x7f547aafa84c <mozilla::ipc::IPDLParamTraits<mozilla::psm::PPSMContentDownloaderParent>::Write(IPC::Message*, mozilla::ipc::IProtocol*, mozilla::psm::PPSMContentDownloaderParent* const&)+84>:
mov (%rax),%rcx
(gdb) p/x $rax
$3 = 0xe5e5e5e5e5e5e5e5
That's a use-after-free of the PSMContentDownloaderParent, not the ContentParent; it's freed via PContentParent::DeallocSubtree → ContentParent::DeallocPPSMContentDownloaderParent → dropping the last reference. The ContentParent may have other refs besides the ones released via ActorDestroy, or in principle this could happen if there doesn't happen to be a cycle collection before the modal window is closed. This also means that a patch to hold the last last ContentParent ref until the event loop is un-nested, like the one I've written and was trying to test, would just change the cases where we currently crash safely into UAFs.
This could probably be worked around in the specific case of PSMContentDownloaderParent by making it take a death grip while nesting the event loop, but there may be other instances of this problem.
Comment 21•6 years ago
|
||
Bug 1540731 should take care of this in most cases, and in particular it empirically seems to fix the PSMContentDownloaderParent UAF from the last comment, but I think there are still some actors that do weird things with their lifecycles and might have security implications from nested event loops; I'll have to check on that. In any case, this is still a live issue for all current non-Nightly releases, and bug 1540731 is an invasive change that we normally wouldn't uplift, so this will need to remain hidden for a while.
For the specific case of ContentParent, the crash should now be fixable by overriding ActorDealloc to release the self-reference. That fix isn't security-sensitive itself, but linking it to the crash signature (vs. passing it off as cleanup) might give too much visibility to the general problem with nested event loops.
Comment 22•6 years ago
|
||
I filed bug 1562769 publicly for the ContentParent fix, not mentioning the crash and marking it as just an enhancement.
Comment 23•3 years ago
|
||
It looks like there were a ton of fixes landed for this bug in 2019, and I can't see any crashes on versions newer than 68 in the last 6 months, so let's close this.
Updated•2 years ago
|
Description
•