Closed Bug 1049501 Opened 5 years ago Closed 5 years ago

crash in mozilla::gmp::PGMPParent::DeallocShmems()

Categories

(Core :: IPC, defect, critical)

34 Branch
defect
Not set
critical

Tracking

()

VERIFIED FIXED
mozilla34
Tracking Status
firefox32 --- unaffected
firefox33 --- verified
firefox34 --- verified
firefox-esr24 --- unaffected
firefox-esr31 --- unaffected

People

(Reporter: adalucinet, Assigned: jesup)

References

(Blocks 1 open bug)

Details

(Keywords: crash, sec-critical)

Crash Data

Attachments

(3 files)

This bug was filed from the Socorro interface and is 
report bp-d2026a25-6aea-49d8-8047-6d8162140806.
=============================================================
More reports: https://crash-stats.mozilla.com/report/list?signature=mozilla%3A%3Agmp%3A%3APGMPParent%3A%3ADeallocShmems%28%29&product=Firefox&query_type=contains&range_unit=weeks&process_type=any&version=Firefox%3A34.0a1&hang_type=any&date=2014-08-06+12%3A00%3A00&range_value=1#tab-reports

After the changes from bug 1044408, Firefox crashes, not only the openh264 plugin.

STR:
1. Launch Nightly and wait for the plugin to install.
2. Navigate to http://mozilla.github.io/webrtc-landing/pc_test_h264.html
3. In about config create media.gmp.plugin.crash pref and set it to true

Notes:
1. Reproducible on Mac OS X 10.9.4 and Ubuntu 12.04 x32 with latest Nightly (Build ID: 20140806030201).
On linux, I only see this in Opt builds, not debug.  Also, if I put a breakpoint on line 1097 in gdb, it doesn't crash.

When it crashes, *this is 0xa5's.  At a breakpoint on line 1097, the mRefCnt is 2 (but note it doesn't crash).

What I think is going on is that perhaps the only remaining references to the GMPParent are from the mPlugin references in the Encoder/DecoderParent's/etc underneath it.  As it goes along releasing them, the GMPParent's refcount goes to 0 and it gets released itself, while still handling DeallocSubtree().

I think DeallocSubtree() needs to hold a self-reference until it exits, in case a sub-member held the last reference to it.
Group: core-security
Component: General → IPC
Keywords: sec-critical
Product: Firefox → Core
Attached file Callstack
Hmm, holding a ref here doesn't seem to be easy (we don't see the real class, in this case GMPParent) or know that we have AddRef/Release available
This may also be due to or complicated by the use of THREADSAFE_REFCOUNTING_WITH_MAIN_THREAD_DESTRUCTION
Toplevel IPDL actors assume that something external keeps the C++ actor alive from Open() until the stack of ActorDestroy() unwinds.

I recommend that you add another manual refcount from here: http://hg.mozilla.org/mozilla-central/annotate/aa1617678a90/content/media/gmp/GMPParent.cpp#l134

until here: http://hg.mozilla.org/mozilla-central/annotate/aa1617678a90/content/media/gmp/GMPParent.cpp#l536

And in ActorDestroy use some sort of event for delayed release.

In particular the sequence here that can cause problems is:
auto PGMPParent::OnChannelError() -> void
{
    DestroySubtree(AbnormalShutdown);
    DeallocSubtree();
    DeallocShmems();
}

Since ActorDestroy is called from DestroySubtree, but we still need the object to be alive all the way through the end of DeallocShmems.
Is this blocking bug 1009760? Florin mentioned that it may be.
Assignee: nobody → rjesup
(In reply to Benjamin Smedberg  [:bsmedberg] from comment #5)
> Toplevel IPDL actors assume that something external keeps the C++ actor
> alive from Open() until the stack of ActorDestroy() unwinds.

We do, or should: the reference owned by the service.  The apparent problem is that deallocating the encoders/decoders causes them to:

a) release their references to the GMPParent (fine), and 
b) post runnables to GMPThread to remove themselves from the encoder/decoder lists (ok), which 
c) then causes it to call CloseIfInactive(), which calls Shutdown().  
d) Shutdown() takes the self-reference in GMPService and sends a final runnable to GMPThread to drop it, destroying GMPParent() (rather indirectly, since GMPParent actually is a must-be-destroyed-on-mainthread class, because of it's use of top-level ipdl).

The problem is that the runnable posted in b) runs before DestroySubtree()/etc finish, because the GMPVideoEncoder does a mEncodedThread->Shutdown(), which recurses the event queue for GMPThread, allowing the runnable that triggers the Shutdown to run "early" even though it's to the same thread.  Shutdown() has a lockout to avoid trying to drop the Service reference during abnormalshutdowns, but that doesn't get set until the very end of DestroySubtree, which GMPParent::ActorDestroyed() would get called.

Ugh.  This also explains why it's racy.

Possible solutions:
a) proxy off the thread->Shutdown() call to avoid recursing the GMPThread's event queue.  Works, though it leaves the landmine.  Note that once some other issues are resolved we'd probably like to avoid proactively killing the GMPParent/sandbox immediately, which would help here.
b) use a "guard" object in the GMPParent that gets destroyed in DestroySubtree() before the encoders/etc, and sets a "being destroyed" flag to stop it from processing the Shutdown() immediately.  A bit ugly/hacky/dependent on the PGMPParent creator
c) Ensure the thread shutdown is done *before* sending the runnable to remove itself, so that the recursion can't cause CloseIfInactive() to run and trigger the world ending.
Comment on attachment 8472171 [details] [diff] [review]
Don't let the GMPThread event loop recurse while handling IPDL shutdowns

option c, more or less
Attachment #8472171 - Flags: review?(benjamin)
Attachment #8472171 - Flags: review?(benjamin) → review+
Comment on attachment 8472171 [details] [diff] [review]
Don't let the GMPThread event loop recurse while handling IPDL shutdowns

[Security approval request comment]
How easily could an exploit be constructed based on the patch? Very tough - requires being able to crash the GMP plugin and then to arrange thread timings to trigger it, and then to get the "right" info on the stack.

Do comments in the patch, the check-in comment, or tests included in the patch paint a bulls-eye on the security problem?  They make it clear it recurses the event queue in GMPThread, and points to this bug which is hidden.  Beyond that, no.

Which older supported branches are affected by this flaw? 33

If not all supported branches, which bug introduced the flaw? bug 1047442

Do you have backports for the affected branches? If not, how different, hard to create, and risky will they be? Should apply with no merge issues, and no added risk

How likely is this patch to cause regressions; how much testing does it need? Very unlikely to cause a regression - moves a thread shutdown to a bit earlier in the unwind sequence.
Attachment #8472171 - Flags: sec-approval?
Attachment #8472171 - Flags: approval-mozilla-aurora?
Comment on attachment 8472171 [details] [diff] [review]
Don't let the GMPThread event loop recurse while handling IPDL shutdowns

sec-approval+ and aurora+ as well.
Attachment #8472171 - Flags: sec-approval?
Attachment #8472171 - Flags: sec-approval+
Attachment #8472171 - Flags: approval-mozilla-aurora?
Attachment #8472171 - Flags: approval-mozilla-aurora+
Comment on attachment 8473496 [details] [diff] [review]
Don't let the GMPThread event loop recurse while handling IPDL shutdowns

Try appears green so far, with retriggers of previous landing oranges
https://tbpl.mozilla.org/?tree=Try&rev=b8bc7ab8ba23

Switched to solution a), which avoids any eventloop recursion.
Attachment #8473496 - Flags: review?(benjamin)
Comment on attachment 8473496 [details] [diff] [review]
Don't let the GMPThread event loop recurse while handling IPDL shutdowns

In this case you're passing a reference to a nsCOMPtr which will almost certainly be dead before you run the message. I'm pretty certain you need this to be a non-reference nsCOMPtr.

WrapRunnableNM is an extraordinarily dangerous template precisely because it hides ownership and longevity issues in function declarations, and I wish we didn't have it.
Attachment #8473496 - Flags: review?(benjamin) → review-
(In reply to Benjamin Smedberg  [:bsmedberg] from comment #17)

> In this case you're passing a reference to a nsCOMPtr which will almost
> certainly be dead before you run the message. I'm pretty certain you need
> this to be a non-reference nsCOMPtr.

The default in WrapRunnable is to use the type of the argument.  In this case the type is "nsCOMPtr<nsIThread>" which is the type of mEncodedThread:
  nsCOMPtr<nsIThread> mEncodedThread;
So, the constructed runnable will have an nsCOMPtr<nsIThread> a0_;, and the constructor will work out to "a0_(mEncodedThread)", which will add a reference.

Generally with nsCOMPtr<>s and nsRefPtr<>s it just adds a new reference; I would be cautious before using it with an nsRefPtr<>&.  With nsAutoPtr<> the assignment transfers ownership to the runnable and nulls the source, which also works well if sometimes surprising (I tend to comment that usage).

It does need better documentation though!

// 1 arguments --
template<typename M, typename A0> class runnable_args_nm_1 : public detail::runnable_args_base<detail::NoResult> {
 public:
  runnable_args_nm_1(M m, A0 a0) :
    m_(m), a0_(a0)  {}

  NS_IMETHOD Run() {
    m_(a0_);
    return NS_OK;
  }

 private:
  M m_;
  A0 a0_;
};





> 
> WrapRunnableNM is an extraordinarily dangerous template precisely because it
> hides ownership and longevity issues in function declarations, and I wish we
> didn't have it.
ok. That's equally dangerous but is correct in this case.

Can you please explicitly initialize the template params for clarity?

WrapRunnableNM<typeof &ShutdownEncodedThread, nsCOMPtr<nsIThread> >(&ShutdownEncodedThread, mEncodedThread);

To make the storage type explicit in the face of future change.
r=me with that change or put up a new patch if there's a problem
With bsmedberg's explicit invocation:
https://hg.mozilla.org/integration/mozilla-inbound/rev/2322a7c26246
Holding on re-landing on Aurora until I see green on inbound
Was backed out again:
https://hg.mozilla.org/integration/mozilla-inbound/rev/ee47ca35ebb4

MSVC doesn't like typeof.  now decltype(), verified on local win32 MSVC 2010 build.  Relanded:
https://hg.mozilla.org/integration/mozilla-inbound/rev/e8d9ec641ec6
https://hg.mozilla.org/mozilla-central/rev/e8d9ec641ec6
Status: NEW → RESOLVED
Closed: 5 years ago
Resolution: --- → FIXED
Flags: qe-verify+
Verified that latest Nightly (2014.08.17) and latest Aurora (2014.08.17) builds do not crash anymore using pref 'media.gmp.plugin.crash', only openh264 plugin.
Status: RESOLVED → VERIFIED
Group: core-security
You need to log in before you can comment on or make changes to this bug.