Closed Bug 977955 Opened 6 years ago Closed 6 years ago
Module Parent may delete its subprocess before calling Message Channel::Clear, resulting in badness
4.50 KB, patch
|Details | Diff | Splinter Review|
4.56 KB, patch
|Details | Diff | Splinter Review|
1.03 KB, patch
|Details | Diff | Splinter Review|
I-think-I-found-the-cause-of-bug-974933 attempt #3: 1) PluginModuleParent contains a PluginProcessParent, mSubprocess. When PluginModuleParent is created by static ::LoadModule, It passes parent->mSubprocess->GetChannel() to parent->Open(). [A] 2) PluginModuleParent::Open(), calls PluginModuleParent->mChannel->Open(), which is a MessageChannel. 3) MessageChannel::Open creates a ProcessLink, mLink, and passes it the channel from the subprocess in (1) [B] 4) NP_Initialize returns an error from the plugin. We set mShutdown = true without actually calling NP_Shutdown. [E] There are a few other pathways that do something similar [C] [D] (maybe [F], but the callers I checked are caused by channel shutdown) 5) We decide to destroy PluginModuleParent, and call in order: - ~PluginModuleParent - mSubprocess->Destroy() - ~MessageChannel (PluginModuleParent.mChannel destructor) - MessageChannel::Clear - delete mLink - ~ProcessLink - mTransport->set_listener(); mTransport is the channel obtained from mSubprocess in (1) 6) mSubprocess->Destroy() queues a task on the io thread to run |delete this|. This task then races with us getting to MessageChannel::Clear. If it wins the race, MessageChannel.mLink now has a poisoned mTransport, and we crash. This can be reproduced in a debugger by breaking at [G] and [E]. Fudge the plugin return code at [E]* and stop the main thread at [G] so the iothread wins the race. Backtrace attached, which appears identical to the bug 974933 crashes: https://crash-stats.mozilla.com/report/index/c68aee10-a11f-4191-ab54-298b12140227 * I'm not sure about this part -- we can't call NP_Shutdown or MessageChannel::Clear() will prevent the race. Any of the abnormal-failure paths that set |mShutdown = true| without calling ::Clear might be triggering this. [A] http://dxr.mozilla.org/mozilla-central/source/dom/plugins/ipc/PluginModuleParent.cpp#97 [B] http://dxr.mozilla.org/mozilla-central/source/ipc/glue/MessageChannel.cpp#296 [C] http://dxr.mozilla.org/mozilla-central/source/dom/plugins/ipc/PluginModuleParent.cpp#1196 [D] http://dxr.mozilla.org/mozilla-central/source/dom/plugins/ipc/PluginModuleParent.cpp#111 [E] http://dxr.mozilla.org/mozilla-central/source/dom/plugins/ipc/PluginModuleParent.cpp#1228 [F] http://dxr.mozilla.org/mozilla-central/source/dom/plugins/ipc/PluginModuleParent.cpp#731 [G] http://dxr.mozilla.org/mozilla-central/source/dom/plugins/ipc/PluginProcessParent.cpp#82
\o/ Can we just call ::Clear() in the error condition? And maybe assert in ~PluginModuleParent that Clear() must already have been called?
This reliably causes the crash without any debugger tricks by having the NPTest plugin crash on NP_Initialize.
(In reply to John Schoenick [:johns] from comment #2) > Created attachment 8383965 [details] [diff] [review] > Test > > This reliably causes the crash without any debugger tricks by having the > NPTest plugin crash on NP_Initialize. For some value of reliably: - Without memory poisoning, this crash is rare. You can see the corruption in a debugger, so its possible a worker thread heap spraying could exploit this - On my linux64 machine the test crashes 90% of the time unless a debugger is attached, which messes with the thread timing and makes it only crash about one-in-five times
So mShutdown tracks if our channel has died, and the destructor calls NP_Shutdown if it has not. This is circumvented, however, by error paths that set it to true during init to indicate that NP_Shutdown shouldn't be called. All of these paths except one in LoadModule also leave the channel open erroneously, so just calling Close() when we want to shutdown without calling the plugin should fix the issue. The one case in LoadModule sets mShutdown because the object dies before calling Open(). It doesn't look like there's a sane way to assert that MessageChannel was cleaned up -- double-calling Close() is a runtime abort, and MessageChannel::Connected() doesn't guarantee we're not somewhere between closed and connected (and is private anyway). Is there an assertion we could add to the destructor, or is it worth modifying MessageChannel to add one? Given that this patch restores mShutdown to properly track the "After Open() before Close()" state I'm fairly confident this can't happen in other ways in the existing code at least.
Attachment #8385010 - Flags: review?(benjamin)
Comment on attachment 8383965 [details] [diff] [review] Test This doesn't always reproduce the issue depending on your environment and timing, but the test isn't very specific -- it ensures we handle the plugin crashing in NP_Initialize -- so it's probably worth having regardless.
Attachment #8383965 - Flags: review?(benjamin)
Comment on attachment 8385010 [details] [diff] [review] Close the channel when aborting before successful init I have an old forgetten patch in bug 814963 which is related. Could you check and that along with your changes unless you think it doesn't make sense any more? I'm a little allergic to MOZ_CRASH("message") because it doesn't annotate the message in the crash report. Please use NS_RUNTIMEABORT("message") instead. r=me with that change.
Attachment #8385010 - Flags: review?(benjamin) → review+
Comment on attachment 8383965 [details] [diff] [review] Test This test will fail with e10s and passing state via the environment is very fragile in general. I think we should either: * Have a new testplugin that always crashes on init * Have some standard location that the testplugin reads on startup and takes special actions such as crashing or hanging on start But I also think that the test for this shouldn't block landing the patch. If you really want to add a mochitest like this now, please make it a browser-chrome test so that we know we are controlling the browser-process environment and not the content process environment in the test.
Attachment #8383965 - Flags: review?(benjamin) → review-
The comment explaining the security issue should probably land later...
Looking at the history - this affects all branches, though bug 889480 added the NP_Initialize trigger. Prior to that, the only error path that sets mShutdown after Open is when the crash reporter fails to initialize -- in that scenario the corruption can still occur.
Comment on attachment 8385636 [details] [diff] [review] Cleanup PluginModule shutdown r=bsmedberg [Security approval request comment] How easily could an exploit be constructed based on the patch? The patch adds a few not-strictly-necessary new safeguards, and the actual security problem is hard to spot. If someone looking at security bugs were to play with the error paths in question, it would not be hard to discover the flaw (e.g. in an ASAN build). Exploiting it is questionable: no user-controlled code runs between the earliest the racing thread could free() our memory and when we access it, but a worker thread or threaded network activity/etc could potentially grab that memory and escalate the corruption. Do comments in the patch, the check-in comment, or tests included in the patch paint a bulls-eye on the security problem? The comment warning about lifetimes was split out and can land later. The patch is more ambiguous, but would also only land later. Which older supported branches are affected by this flaw? The flaw is much harder to trigger on ESR24, but still present in one edge case. If not all supported branches, which bug introduced the flaw? Bug 889480 did not introduce this flaw, but added more, possibly user-controllable, ways to hit it. Bug 860254 (FF30) enabled memory poisoning, causing the crash rate spike in bug 974933. Do you have backports for the affected branches? If not, how different, hard to create, and risky will they be? Backports to all branches. How likely is this patch to cause regressions; how much testing does it need? Low to medium risk of exposing other plugin-ipc issues -- which would likely be no more severe than the current top crash.
Attachment #8385636 - Flags: sec-approval?
Comment on attachment 8385636 [details] [diff] [review] Cleanup PluginModule shutdown r=bsmedberg Sec-approval+ for trunk. If you nominate for Aurora, we can safely approve it there. I'm not sure if we want to take this on Beta with less than a week to go for final builds from that branch (same with ESR24). Dan, what do you think?
Attachment #8385636 - Flags: sec-approval? → sec-approval+
I'm going to lower this to sec-high on the theory that an attacker won't be able to reliably cause a plugin to return an error from NP_Initialize(), and sec-moderate for ESR-24 because the crash-reporter is also out of the attacker's control. Since we've found this internally and given the raciness of any exploit we don't have to over-rotate getting this into Beta, but the patch looks non-risky enough to take if we want to. The patch isn't that incriminating so it should be fine.
Whiteboard: ESR-24 variant is sec-moderatish
Comment on attachment 8385636 [details] [diff] [review] Cleanup PluginModule shutdown r=bsmedberg https://hg.mozilla.org/integration/mozilla-inbound/rev/cb0eb74c0cc3
Attachment #8385636 - Flags: checkin+
landed on central https://hg.mozilla.org/mozilla-central/rev/cb0eb74c0cc3
Comment on attachment 8385636 [details] [diff] [review] Cleanup PluginModule shutdown r=bsmedberg [Nominating for beta, wontfixing for esr24 based on comment 13] [Approval Request Comment] Bug caused by (feature/regressing bug #): Bug 889480 did not introduce this flaw, but added more, possibly user-controllable, ways to hit it. Bug 860254 (FF30) enabled memory poisoning, causing the crash rate spike in bug 974933. User impact if declined: sec-high, sec-medium on ESR24 (see comment 13) Testing completed (on m-c, etc.): on m-c Risk to taking this patch (and alternatives if risky): Low to medium-low risk of exposing other issues in plugin-IPC code String or IDL/UUID changes made by this patch: None
Comment on attachment 8385636 [details] [diff] [review] Cleanup PluginModule shutdown r=bsmedberg We're too late to take on such risk to our final beta with no time to test it longer before GA ship so this will have to be a wontfix for FF28 as well and ride the trains from FF29.
Assuming I'm reading socorro right, the crash in bug 974933 vanished after the landing: https://crash-stats.mozilla.com/report/list?product=Firefox&signature=IPC%3A%3AChannel%3A%3Aset_listener%28IPC%3A%3AChannel%3A%3AListener*%29#tab-table
(In reply to Benjamin Smedberg [:bsmedberg] from comment #7) > Comment on attachment 8383965 [details] [diff] [review] > Test > > This test will fail with e10s and passing state via the environment is very > fragile in general. I think we should either: > > * Have a new testplugin that always crashes on init > * Have some standard location that the testplugin reads on startup and takes > special actions such as crashing or hanging on start > > But I also think that the test for this shouldn't block landing the patch. > If you really want to add a mochitest like this now, please make it a > browser-chrome test so that we know we are controlling the browser-process > environment and not the content process environment in the test. Agreed with this, will revisit when this bug is opened up
Whiteboard: ESR-24 variant is sec-moderatish → [adv-main29+] ESR-24 variant is sec-moderatish
You need to log in before you can comment on or make changes to this bug.