Closed Bug 928221 Opened 11 years ago Closed 11 years ago

mozRTCPeerConnection crash with GC/CC

Categories

(Core :: WebRTC, defect)

defect
Not set
critical

Tracking

()

RESOLVED FIXED
mozilla27
Tracking Status
firefox25 --- unaffected
firefox26 --- unaffected
firefox27 + fixed
firefox-esr17 --- unaffected
firefox-esr24 --- unaffected
b2g18 --- unaffected
b2g-v1.2 --- unaffected

People

(Reporter: jruderman, Assigned: jib)

References

Details

(7 keywords, Whiteboard: [webrtc][fix for testcase 2 only])

Crash Data

Attachments

(6 files, 4 obsolete files)

1. Create a new profile with:
     user_pref("media.peerconnection.enabled", true);
2. Install https://www.squarefree.com/extensions/domFuzzLite3.xpi
3. Load the testcase
Attached file testcase 1
This testcase causes a null deref.
Attached file testcase 2
This testcase causes a use-after-free.
Hit MOZ_CRASH(PeerConnectionObserver not thread-safe) at ./PeerConnectionObserverBinding.cpp:2031
(In reply to Jesse Ruderman from comment #1)
> Created attachment 818821 [details]
> testcase 1
> 
> This testcase causes a null deref.

I've tried this repeatedly on an inbound debug build on Linux with no problems.  Will try on mac
(In reply to Jesse Ruderman from comment #2)
> Created attachment 818823 [details]
> stack for testcase 1 (gdb)

Note this null deref is while handling IceStateChange_m()
(In reply to Jesse Ruderman from comment #4)
> Created attachment 818825 [details]
> stacks for testcase 2 (ASan)

This is crashing in mozilla::runnable_args_m_1<nsRefPtr<sipcc::PeerConnectionImpl>, tag_nsresult (sipcc::PeerConnectionImpl::*)(mozilla::dom::PCImplIceState), mozilla::dom::PCImplIceState>::Run()

The object it's trying to reference was allocated in PCObserverBinding

Appears related to ICE again.  Both of these may be timing-related.
(In reply to Randell Jesup [:jesup] from comment #8)
> (In reply to Jesse Ruderman from comment #4)
> > Created attachment 818825 [details]
> > stacks for testcase 2 (ASan)
> 
> This is crashing in
> mozilla::runnable_args_m_1<nsRefPtr<sipcc::PeerConnectionImpl>, tag_nsresult
> (sipcc::PeerConnectionImpl::*)(mozilla::dom::PCImplIceState),
> mozilla::dom::PCImplIceState>::Run()

And this is also calling PeerConnectionImpl::IceStateChange_m()  (per  "stack for testcase 2 (mdsw)"), so these seem to be the same failure.
Whiteboard: [webrtc]
I have now been able to repro this on an inbound debug build (fresh pull) on Linux, with the just-landed bug 902003 patch (and it crashed dereferencing using mPCObserver.MayGet() (PeerConnectionImpl.cpp:1634), which was added as part of the webidl conversion, so I believe the invariant given for using a raw pointer here in PeerConnectionImpl.h:505 is wrong or incomplete).

Jesse: what was the pull you were using?  I suspect it didn't include bug 902003 (landed on inbound at 5:13pm PDT), but bug 902003 may make this problem easier to find - or I just got lucky this time.

Before the WebIDL conversion monday, mPCObserver was an nsWeakPtr.  It's now a WeakReminder wrapping a raw pointer

If this is indeed due to the WebIDL conversion that landed Monday, then this is regression in 27.  :-)

Jib, please take point on investigating this.
Assignee: nobody → jib
Depends on: 902003
Flags: needinfo?(jruderman)
I believe I know what the problem is and I'm working on a fix which I hope to have ready later today. I hope we can avoid backing stuff out as the changes are significant, other patches have landed that rely on it, and I know of several patch-queues that would be in rebase limbo.

Relying on PeerconnectionImpl.close() to clear the raw pointer to the PCObserver was my mistake, as it gets called explicitly only when the page is torn down, and implicitly by ~PeerConnectionImpl.

When content JS drops all references to the mozPeerConnection, nothing holds the PCObserver alive, so if you cycle-collect then the observer goes away before PeerConnectionImpl, because our runnables probably keep the latter alive longer. These runnables then hit the observer before ~PeerConnectionImpl has been called, and we  get null-refs in the PeerConnectionObserverBindings.

I'm working on a fix that will restore behavior similar to the weak-pointers we used to have here. I think I figured out a way to use them with concrete objects. That should bring us back to safety.
Added a new WeakConcretePtr class which uses an nsWeakPtr to guard a raw pointer, and uses that to detect observer being gone.

This weak-ref part now behaves much more like things were before bug 917328, which seems safer - i.e. Compare to https://bugzilla.mozilla.org/page.cgi?id=splinter.html&bug=917328&attachment=816677

I was able to reproduce, and this seems to fix it, but please re-test! Thanks.
Attachment #819070 - Flags: review?(rjesup)
Hmm, except now signaling_unittest crashes :-( Probably because I comment out the meat of WeakConcretePtr for the unittests, making it a regular pointer. Will have to find a way to include it.
Comment on attachment 819070 [details] [diff] [review]
nsWeakPtr on guard object detects PeerConnectionObserver going away

Review of attachment 819070 [details] [diff] [review]:
-----------------------------------------------------------------

::: media/webrtc/signaling/src/peerconnection/PeerConnectionImpl.h
@@ +488,1 @@
>    // We hold a raw pointer to PeerConnectionObserver (no WeakRefs to concretes!)

This comment block needs to be updated.
The signaling_unittest crashes appeared to be unrelated, as they happen on my system even before this patch. - Regardless, I had already gone ahead and updated the patch to use WeakPtrs in unittest too, for less difference.

Also fixed some nits and renamed the class from abr's suggestion.

Try from first version looks green - https://tbpl.mozilla.org/?tree=Try&rev=3c54ba413bda
Attachment #819070 - Attachment is obsolete: true
Attachment #819070 - Flags: review?(rjesup)
Attachment #819169 - Flags: review?(rjesup)
Attachment #819169 - Flags: review?(adam)
- Added a backpointer in the pc._guard object to prevent pc from going away.
- Updated comment.
Attachment #819169 - Attachment is obsolete: true
Attachment #819169 - Flags: review?(rjesup)
Attachment #819169 - Flags: review?(adam)
Attachment #819206 - Flags: review?(rjesup)
Attachment #819206 - Flags: review?(adam)
I meant pco._guard and pco.
Attachment #819206 - Flags: review?(rjesup) → review+
Comment on attachment 819206 [details] [diff] [review]
nsWeakPtr on guard object detects PeerConnectionObserver going away (3)

Review of attachment 819206 [details] [diff] [review]:
-----------------------------------------------------------------

Looks good to me.
Attachment #819206 - Flags: review?(adam) → review+
(In reply to :Ehsan Akhgari (needinfo? me!) from comment #21)
> I backed this out because it calls a GetWeakReferent function which doesn't
> exist anywhere:

Thanks, GetWeakReferent is a new webidl attribute in the patch itself and it compiles locally and worked on try in Comment 19, but I didn't alter the contract uuid of the PeerConnectionObserver interface. Maybe that's the problem? I'm trying that here.

If this doesn't work, it might need a clobber due to bug 928195 again.
Attachment #819206 - Attachment is obsolete: true
Attachment #819390 - Flags: review+
Same patch, just making clear in description that it has been reviewed.
Carrying forward r+ from abr and jesup.
Attachment #819390 - Attachment is obsolete: true
Attachment #819391 - Flags: review+
(In reply to :Ehsan Akhgari (needinfo? me!) from comment #21)
> I backed this out because it calls a GetWeakReferent function which doesn't
> exist anywhere:
> 
> https://tbpl.mozilla.org/php/getParsedLog.php?id=29366148&tree=Mozilla-
> Inbound
> https://hg.mozilla.org/integration/mozilla-inbound/rev/77f70432fdcd
(Sorry, I had leftover crud in the comment field)
Relanded with CLOBBER in https://hg.mozilla.org/integration/mozilla-inbound/rev/cc9ce7c8ab99

build system messed up webidl dependencies again (previous push and two following were green, then one went red and ehsan backed it out).
If the CID needs to up updated when the webidl changes, then we should revise and reland. 

(I think I'd flagged on a review - perhaps bug 902003 - that you should check if CIDs need to be revised if the webidl changes - I don't think so, but perhaps that's wrong.)
https://hg.mozilla.org/mozilla-central/rev/dc2b71e57211
Status: NEW → RESOLVED
Closed: 11 years ago
Keywords: checkin-needed
Resolution: --- → FIXED
Keywords: verifyme
jesse,

I tried to run testcase 2 under ASan and I get:

JavaScript error: https://bug928221.bugzilla.mozilla.org/attachment.cgi?id=818821&t=xAWnbBNhMv, line 20: fuzzPriv is not defined

Can you advise how to run?
You have to install the domFuzzLite addon.  See step 2 of comment 0.
Just so we're clear here, this bug report appears to deal with two different bugs that can be evoked using somewhat similar tests. But the stacks are very different and don't seem to bear any relation to each other.

The patch that we landed seems to be working for "testcase 2." But there's a different flaw here, and it's evoked by "testcase 1."

Given the rather substantial differences in the stacks, I'm pretty sure these should have been filed as two different bug numbers.
I filed bug 928221 for testcase 1.
Status: REOPENED → RESOLVED
Closed: 11 years ago11 years ago
Resolution: --- → FIXED
Whiteboard: [webrtc] → [webrtc][fix for testcase 2 only]
Flags: needinfo?(jruderman)
:jib says this is a regression from bug 882145 as is bug 933447, which means Firefox 26 should be affected as well. The symptoms in this bug is more troubling than those Jesse saw in testcase1, and the patch is more revealing. We probably want to get this in on Beta as well. And if we take this one maybe we should take bug 933447 as well.
Even though this affects Firefox 26 on Desktop/Android, PeerConnection is disabled on b2g1.2 so we don't have to worry about that long-lived branch.
(In reply to Daniel Veditz [:dveditz] from comment #35)
> :jib says this is a regression from bug 882145 as is bug 933447, which means
> Firefox 26 should be affected as well.

My mistake. Regressions are from Bug 917328 which landed 10-14. I confused the bugs.

Everything is fine here. Sorry about the false alarm.
Blocks: 917328
No longer blocks: 882145
No longer depends on: 902003
Blocks: 933447
I can't reproduce any crash with either test case with builds of FF27 from 2013-10-27 and before.

Jesse, if you can verify that your tests no longer crash, I'd appreciate it. Thanks.
I noticed an odd assertion on mozilla-central and filed bug 944854.
Group: core-security
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: