Closed Bug 933447 Opened 6 years ago Closed 6 years ago

mozRTCPeerConnection crash with GC/CC

Categories

(Core :: WebRTC, defect, critical)

defect
Not set
critical

Tracking

()

VERIFIED FIXED
mozilla28
Tracking Status
firefox26 --- unaffected
firefox27 + verified
firefox28 + verified
firefox-esr17 --- unaffected
firefox-esr24 --- unaffected
b2g18 --- unaffected
b2g-v1.2 --- unaffected

People

(Reporter: mccr8, Assigned: jib)

References

(Blocks 1 open bug)

Details

(7 keywords, Whiteboard: [webrtc])

Crash Data

Attachments

(2 files, 3 obsolete files)

See test case 1 in bug 928221.
Thanks for filing this, Andrew. 

Adam, I believe you've already starting working on this, so I'm officially assigning it to you.  If you're too busy with IETF/W3 stuff, just let me know.  Thanks.
Assignee: nobody → adam
I'm already looking at this one FWIW.
(In reply to Jan-Ivar Bruaroey [:jib] from comment #2)
> I'm already looking at this one FWIW.

Jan-Ivar, if you want to take over ownership of this bug, that's fine by me.  Just coordinate with Abr first.  Thanks.
Agreed with abr that I'd take a stab at this one.
Assignee: adam → jib
Status: NEW → ASSIGNED
OK, I removed the (arguably redundant main-to-main) dispatch and it didn't help: 

#0 mozilla::dom::CallbackObject::CallSetup::CallSetup(JS::Handle<JSObject*>, mozilla::ErrorResult&, mozilla::dom::CallbackObject::ExceptionHandling, JSCompartment*) at dom/bindings/CallbackObject.cpp:65
#1 mozilla::dom::CallbackObject::CallSetup::CallSetup(JS::Handle<JSObject*>, mozilla::ErrorResult&, mozilla::dom::CallbackObject::ExceptionHandling, JSCompartment*) at dom/bindings/CallbackObject.cpp:147
#2 mozilla::dom::PeerConnectionObserverJSImpl::OnStateChange(mozilla::dom::PCObserverStateType, mozilla::ErrorResult&, JSCompartment*) at dom/bindings/./PeerConnectionObserverBinding.cpp:2027
#3 mozilla::dom::PeerConnectionObserver::OnStateChange(mozilla::dom::PCObserverStateType, mozilla::ErrorResult&, JSCompartment*) at dom/bindings/./PeerConnectionObserverBinding.cpp:2477

PeerConnectionObserver::OnStateChange(PCObserverStateType state, ErrorResult& aRv, JSCompartment* aCompartment)
{
  return mImpl->... // mImpl == 0x0.

#4 sipcc::PeerConnectionImpl::IceStateChange_m(mozilla::dom::PCImplIceState) at media/webrtc/signaling/src/peerconnection/PeerConnectionImpl.cpp:1613

  ...
  nsRefPtr<PeerConnectionObserver> pco = mPCObserver.MayGet();
  if (!pco) {
    return NS_OK;
  }
  JSErrorResult rv;
  pco->OnStateChange(PCObserverStateType::IceState, rv);

mPCObserver	sipcc::PeerConnectionImpl::WeakConcretePtr	
  mObserver	mozilla::dom::PeerConnectionObserver *	0x0000000121208140
  mWeakPtr	nsWeakPtr	
  mRawPtr	nsWeakReference *	0x0000000121277da0

#5 mozilla::runnable_args_m_1<nsRefPtr<sipcc::PeerConnectionImpl>, tag_nsresult (sipcc::PeerConnectionImpl::*)(mozilla::dom::PCImplIceState), mozilla::dom::PCImplIceState>::Run() at media/webrtc/signaling/../../../media/mtransport/runnable_utils_generated.h:125
#6 nsThread::ProcessNextEvent(bool, bool*) at xpcom/threads/nsThread.cpp:622

So, pco->mImpl is gone, yet the WeakReferent guard object is still alive (this is all on main thread), so the guard must not be working, as I'm not keeping what I want alive. I'm pondering what bz said in Bug 928535 comment 7.
This patch depends on the patch in bug 928535 and is a lot cleaner, but still crashes - This rules out the problem being the WeakConcretePtr kludge.

Symptom is unchanged (see comment 5): mImpl = null.

mImpl is inside the DOM bindings, and breakpoints on weakPtr returning NULL never hit, so it looks like the webidl binding code doesn't handle being weak-referenced at all. If someone could confirm this that would be great.

I'm looking for someone in the DOM team to help or take this over as I'm stumped, but I'm posting here rather than in bug 928535 because of the sec crash.
Attachment #829440 - Flags: feedback?(continuation)
Flags: needinfo?(bzbarsky)
Attached file modified testcase 1
Adds a 1-15 ms delay and loops to more reliably produce problem (I sometimes had trouble with the original).
The only way mImpl can be null is if the object has already been unlinked.

It's possible for the object to still be alive even though it's been unlinked, though, because we do lazy deletion of CCed objects, as far as I can tell.  Does explicitly dropping weak refs in unlink like this help?
Flags: needinfo?(bzbarsky)
(In reply to Boris Zbarsky [:bz] from comment #8)
> Created attachment 829455 [details] [diff] [review]
> Does applying this on top of your patch help?

Yes it works! \o/
Attachment #829455 - Flags: review?(bugs)
Depends on: 928535
Comment on attachment 829440 [details] [diff] [review]
Using new JS webidl weakPtr; Removed weakConcretePtr kludge + redundant runnables; Still crashes

Thanks for figuring this out, Boris!  Weak references are the worst.
Attachment #829440 - Flags: feedback?(continuation)
Attachment #829455 - Flags: review?(bugs) → review+
So Olli had some thoughts on IRC about making do_QueryReferent return null for referents with a refcount of 0, I think.  That would provide a less whack-a-molish fix for the problem.
Yeah, we definitely need a general solution here.
I filed bug 936728 for a general solution.
Cleaner fix, and no longer removes any runnables.
Attachment #829440 - Attachment is obsolete: true
Attachment #829650 - Flags: review?(adam)
Attachment #829650 - Flags: review?(continuation)
Comment on attachment 829650 [details] [diff] [review]
Uses new JS webidl weakPtr (removed weakConcretePtr kludge)

Review of attachment 829650 [details] [diff] [review]:
-----------------------------------------------------------------

Looks fine to me, though I'm not familiar with the PeerConnection stuff in particular.
Attachment #829650 - Flags: review?(continuation) → review+
Comment on attachment 829650 [details] [diff] [review]
Uses new JS webidl weakPtr (removed weakConcretePtr kludge)

Review of attachment 829650 [details] [diff] [review]:
-----------------------------------------------------------------

Looks good to me, assuming the magic function that I don't understand makes sense to Andrew.

::: media/webrtc/signaling/src/peerconnection/PeerConnectionImpl.cpp
@@ +977,5 @@
>  #endif
>    return NS_OK;
>  }
>  
> +// Helper - specialized for the single type needed here

This really needs a better comment. My understanding is that this is a temporary fix, to be pulled out when Bug 936728 lands.

Please expand the comment to minimally explain what this code does, indicate it's temporary, and cite Bug 936728.

Also, I don't understand what this does at all. I have not reviewed it for correctness. I'm assuming that Andrew's review covered this part of the code in particular.
Attachment #829650 - Flags: review?(adam) → review+
> I'm assuming that Andrew's review covered this part of the code in particular.
Yes.
(In reply to Adam Roach [:abr] from comment #19)
> Comment on attachment 829650 [details] [diff] [review]
> ::: media/webrtc/signaling/src/peerconnection/PeerConnectionImpl.cpp
> > +// Helper - specialized for the single type needed here
> 
> This really needs a better comment. My understanding is that this is a
> temporary fix, to be pulled out when Bug 936728 lands.

Alas, the do_QueryObjectReferent() helper is not temporary till Bug 936728.

Here's what each of the 9 weak-ref calls would look like without it: 

 nsCOMPtr<nsISupportsWeakReference> tmp = do_QueryReferent(mPCObserver);
 nsRefPtr<nsSupportsWeakReference> tmp2 = do_QueryObject(tmp);
 nsRefPtr<PeerConnectionObserver> pco = static_cast<PeerConnectionObserver*>(&*tmp2);
 if (!pco) {
   return;
 }

because nsWeakPtr deals in XPCOM interfaces, while webidl deals in concrete objects (Andrew, if this is excessive please let me know. Perhaps tmp2 can be optimized out, but it's still pretty ugly).

do_QueryObjectReferent() or something like should probably be turned into a central function or template somewhere, but right now I'm looking for Aurora uplift.

I'll add this in the code-comment.
Added comment and merged in bz's diff-patch.

Carrying forward reviews from smaug, mccr8 and abr (patch notes pieces are reviewed by one or more).

Ready to go once Bug 928535 lands.

Andrew, feel free to move codegen changes to Bug 928535 if you prefer.
Attachment #829455 - Attachment is obsolete: true
Attachment #829650 - Attachment is obsolete: true
Attachment #833010 - Flags: review+
Target Milestone: --- → mozilla28
Patch still applies (with an offset), works and local mochitests are clean.

Fresh try, just in case - https://tbpl.mozilla.org/?tree=Try&rev=420089a45d89

Otherwise this is ready to go.
Try looks good (mediarecorder orange is not new)
Keywords: checkin-needed
Needs sec-approval since sec-crit and affecting Aurora too, no?
Keywords: checkin-needed
Comment on attachment 833010 [details] [diff] [review]
Merged patch (Uses new JS webidl weakPtr + codegen fix). r=smaug, mccr8, abr

[Security approval request comment]
How easily could an exploit be constructed based on the patch?

I'm going to go with low-to-medium, since the known case consistently ends up in a null pointer, and the window is relatively narrow (GC delayed delete). However, a GC'ed object is involved, which makes it hard to be certain.

A script like the test-case, can create new PeerConnections in a loop without holding them, which creates a situation where browser-chrome follows a weak reference to call a function of a garbage collected object that has been unlinked yet still hasn't been deleted. When this member function runs, it crashes when it tries to use its unlinked member reference to another object (null).

However, I can't rule out that there aren't ways to provoke different uses of this GC'ed object that I haven't thought of that might produce heap_after_use.

Do comments in the patch, the check-in comment, or tests included in the patch paint a bulls-eye on the security problem?

No.

Which older supported branches are affected by this flaw?

mozilla27

If not all supported branches, which bug introduced the flaw?

Bug 882145

Do you have backports for the affected branches? If not, how different, hard to create, and risky will they be?

Same patch should work on Aurora, except the fix depends on Bug 928535 which must be backported first.

How likely is this patch to cause regressions; how much testing does it need?

Not likely. It relies on the new functionality in Bug 928535 but is a lot cleaner than existing code.
Attachment #833010 - Flags: sec-approval?
Comment on attachment 833010 [details] [diff] [review]
Merged patch (Uses new JS webidl weakPtr + codegen fix). r=smaug, mccr8, abr

[Approval Request Comment]
Bug caused by (feature/regressing bug #): Bug 882145
User impact if declined: Intermittent nullptr crash, not ruling out heap_after_use
Testing completed (on m-c, etc.): Successful try. Test-case no longer crashes.
Risk to taking this patch (and alternatives if risky): Low
String or IDL/UUID changes made by this patch: None

*** REQUIRES BACKPORT OF BUG 928535 FIRST ***
Attachment #833010 - Flags: approval-mozilla-aurora?
Comment on attachment 833010 [details] [diff] [review]
Merged patch (Uses new JS webidl weakPtr + codegen fix). r=smaug, mccr8, abr

sec-approval+ for trunk.

After it goes in there and things are green, please put on Aurora. I'm giving approval there too.
Attachment #833010 - Flags: sec-approval?
Attachment #833010 - Flags: sec-approval+
Attachment #833010 - Flags: approval-mozilla-aurora?
Attachment #833010 - Flags: approval-mozilla-aurora+
(In reply to Jan-Ivar Bruaroey [:jib] from comment #26)
> Which older supported branches are affected by this flaw?
> 
> mozilla27
> 
> If not all supported branches, which bug introduced the flaw?
> 
> Bug 882145

Those seem contradictory... what about Firefox 26? Seems like a big set of patches for 2nd half beta though.
Flags: needinfo?(jib)
Blocks: 882145
(In reply to Daniel Veditz [:dveditz] from comment #29)
> > Bug 882145
> 
> Those seem contradictory... what about Firefox 26? Seems like a big set of
> patches for 2nd half beta though.

Yes you're right. Bug 882145 landed just before end of 26, which is probably what confused us. Probably also why Bug 928221 was never backported to 26 like it should have been. :-(
Flags: needinfo?(jib)
PeerConnection is disabled on b2g1.2 so we don't have to worry about that long-lived branch.

The patch isn't as big as it seems when you realize some of it is backing out parts of bug 928221 in favor of the small patch in bug 928535. Fixing the three bugs in one combined patch shouldn't be too risky for Beta.
(In reply to Daniel Veditz [:dveditz] from comment #29)
> (In reply to Jan-Ivar Bruaroey [:jib] from comment #26)
> > Bug 882145
> 
> Those seem contradictory... what about Firefox 26?

Whoops, make that Bug 917328.

These bugs were caused by Bug 917328 which landed 10-14. So Beta is fine and Bug 928221 is fine. We just need Aurora as already requested.

So sorry about the false alarm!
Blocks: 917328
No longer blocks: 882145
Depends on: 928221
No longer depends on: 902003
(In reply to Al Billings [:abillings] from comment #28)
> Comment on attachment 833010 [details] [diff] [review]
> Merged patch (Uses new JS webidl weakPtr + codegen fix). r=smaug, mccr8, abr
> 
> sec-approval+ for trunk.
> 
> After it goes in there and things are green, please put on Aurora. I'm
> giving approval there too.

Thanks, does this Aurora approval extend to Bug 928535 as well, or must a separate request be written there? It is not a sec bug.
Keywords: checkin-needed
abillings gave aurora approval to the merged patch, so that implies approval for bug 928535, but that probably should be made explicit as we'll be landing that as well.

Please ask for approval on that bug, and state "so we can land dependent bugs on Aurora" for a reason; and keep the request very simple.
https://hg.mozilla.org/releases/mozilla-aurora/rev/9e046ac17cd3
(with a=bajaj for aurora as well by IRC)
Target Milestone: mozilla28 → mozilla27
https://hg.mozilla.org/mozilla-central/rev/5bd84d8e2239
Status: ASSIGNED → RESOLVED
Closed: 6 years ago
Resolution: --- → FIXED
Target Milestone: mozilla27 → mozilla28
Yeah, I should be explicit. If I'm not, it is probably just my own oversight. Ideally, we explicitly approve each bug for clarity's sake. I see it is all taken care of though.
Using bug file, I see an immediate crash on FF28, 2013-11-13.

On today's builds of FF27 and FF28, I don't see that same crash, but I get many of these:

"[57313] ###!!! ASSERTION: You can't dereference a NULL nsRefPtr with operator*().: 'mRawPtr != 0', file /Users/mwobensmith/mozilla_central/media/webrtc/signaling/../../../xpcom/base/nsAutoPtr.h, line 1072"

On at least one occasion, it eventually crashed. 

Any idea?
Flags: needinfo?(jib)
What's the stack trace of the assert and the crash?
Flags: needinfo?(mwobensmith)
I can't seem to repro the crash, and I didn't get the stack when it did happen.

However, if you run the latest debug build of 27/28 with the bug file, you'll see the assert.
Flags: needinfo?(mwobensmith)
(In reply to Matt Wobensmith from comment #41)
> However, if you run the latest debug build of 27/28 with the bug file,
> you'll see the assert.

I see the assert and the one I'm seeing is harmless and caused by this line:

http://mxr.mozilla.org/mozilla-central/source/media/webrtc/signaling/src/peerconnection/PeerConnectionImpl.cpp#1005
>   nsRefPtr<PeerConnectionObserver> tmp3 = static_cast<PeerConnectionObserver*>(&*tmp2);

To explain: It is ok for WeakPtrs to be null sometimes (it's their purpose), but in order to cast it to the right type, this line dereferences the nullptr and then takes the address of it, which works but is obviously not ideal, since it causes these NS_PRECONDITION assertions.

Thanks for spotting the assertions! I'll file myself a bug to find a way less noisy way to solve this, though they should be harmless.
Flags: needinfo?(jib)
I filed bug 944854 on the "NULL nsRefPtr" assertion.
Thanks Jan. Based on that, we can mark this bug fixed/verified.
Status: RESOLVED → VERIFIED
Group: core-security
You need to log in before you can comment on or make changes to this bug.