Closed Bug 928221 Opened 7 years ago Closed 7 years ago
RTCPeer Connection crash with GC/CC
278 bytes, text/html
5.34 KB, text/plain
258 bytes, text/html
7.38 KB, text/plain
3.16 KB, text/plain
12.38 KB, patch
|Details | Diff | Splinter Review|
1. Create a new profile with: user_pref("media.peerconnection.enabled", true); 2. Install https://www.squarefree.com/extensions/domFuzzLite3.xpi 3. Load the testcase
This testcase causes a null deref.
This testcase causes a use-after-free.
Hit MOZ_CRASH(PeerConnectionObserver not thread-safe) at ./PeerConnectionObserverBinding.cpp:2031
(In reply to Jesse Ruderman from comment #1) > Created attachment 818821 [details] > testcase 1 > > This testcase causes a null deref. I've tried this repeatedly on an inbound debug build on Linux with no problems. Will try on mac
(In reply to Jesse Ruderman from comment #2) > Created attachment 818823 [details] > stack for testcase 1 (gdb) Note this null deref is while handling IceStateChange_m()
(In reply to Jesse Ruderman from comment #4) > Created attachment 818825 [details] > stacks for testcase 2 (ASan) This is crashing in mozilla::runnable_args_m_1<nsRefPtr<sipcc::PeerConnectionImpl>, tag_nsresult (sipcc::PeerConnectionImpl::*)(mozilla::dom::PCImplIceState), mozilla::dom::PCImplIceState>::Run() The object it's trying to reference was allocated in PCObserverBinding Appears related to ICE again. Both of these may be timing-related.
(In reply to Randell Jesup [:jesup] from comment #8) > (In reply to Jesse Ruderman from comment #4) > > Created attachment 818825 [details] > > stacks for testcase 2 (ASan) > > This is crashing in > mozilla::runnable_args_m_1<nsRefPtr<sipcc::PeerConnectionImpl>, tag_nsresult > (sipcc::PeerConnectionImpl::*)(mozilla::dom::PCImplIceState), > mozilla::dom::PCImplIceState>::Run() And this is also calling PeerConnectionImpl::IceStateChange_m() (per "stack for testcase 2 (mdsw)"), so these seem to be the same failure.
I have now been able to repro this on an inbound debug build (fresh pull) on Linux, with the just-landed bug 902003 patch (and it crashed dereferencing using mPCObserver.MayGet() (PeerConnectionImpl.cpp:1634), which was added as part of the webidl conversion, so I believe the invariant given for using a raw pointer here in PeerConnectionImpl.h:505 is wrong or incomplete). Jesse: what was the pull you were using? I suspect it didn't include bug 902003 (landed on inbound at 5:13pm PDT), but bug 902003 may make this problem easier to find - or I just got lucky this time. Before the WebIDL conversion monday, mPCObserver was an nsWeakPtr. It's now a WeakReminder wrapping a raw pointer If this is indeed due to the WebIDL conversion that landed Monday, then this is regression in 27. :-) Jib, please take point on investigating this.
Assignee: nobody → jib
Depends on: 902003
I believe I know what the problem is and I'm working on a fix which I hope to have ready later today. I hope we can avoid backing stuff out as the changes are significant, other patches have landed that rely on it, and I know of several patch-queues that would be in rebase limbo. Relying on PeerconnectionImpl.close() to clear the raw pointer to the PCObserver was my mistake, as it gets called explicitly only when the page is torn down, and implicitly by ~PeerConnectionImpl. When content JS drops all references to the mozPeerConnection, nothing holds the PCObserver alive, so if you cycle-collect then the observer goes away before PeerConnectionImpl, because our runnables probably keep the latter alive longer. These runnables then hit the observer before ~PeerConnectionImpl has been called, and we get null-refs in the PeerConnectionObserverBindings. I'm working on a fix that will restore behavior similar to the weak-pointers we used to have here. I think I figured out a way to use them with concrete objects. That should bring us back to safety.
Added a new WeakConcretePtr class which uses an nsWeakPtr to guard a raw pointer, and uses that to detect observer being gone. This weak-ref part now behaves much more like things were before bug 917328, which seems safer - i.e. Compare to https://bugzilla.mozilla.org/page.cgi?id=splinter.html&bug=917328&attachment=816677 I was able to reproduce, and this seems to fix it, but please re-test! Thanks.
Hmm, except now signaling_unittest crashes :-( Probably because I comment out the meat of WeakConcretePtr for the unittests, making it a regular pointer. Will have to find a way to include it.
Comment on attachment 819070 [details] [diff] [review] nsWeakPtr on guard object detects PeerConnectionObserver going away Review of attachment 819070 [details] [diff] [review]: ----------------------------------------------------------------- ::: media/webrtc/signaling/src/peerconnection/PeerConnectionImpl.h @@ +488,1 @@ > // We hold a raw pointer to PeerConnectionObserver (no WeakRefs to concretes!) This comment block needs to be updated.
The signaling_unittest crashes appeared to be unrelated, as they happen on my system even before this patch. - Regardless, I had already gone ahead and updated the patch to use WeakPtrs in unittest too, for less difference. Also fixed some nits and renamed the class from abr's suggestion. Try from first version looks green - https://tbpl.mozilla.org/?tree=Try&rev=3c54ba413bda
- Added a backpointer in the pc._guard object to prevent pc from going away. - Updated comment.
I meant pco._guard and pco.
Attachment #819206 - Flags: review?(rjesup) → review+
Comment on attachment 819206 [details] [diff] [review] nsWeakPtr on guard object detects PeerConnectionObserver going away (3) Review of attachment 819206 [details] [diff] [review]: ----------------------------------------------------------------- Looks good to me.
Attachment #819206 - Flags: review?(adam) → review+
Try looks good - https://tbpl.mozilla.org/?tree=Try&rev=e1ffa8dc134b
Target Milestone: --- → mozilla27
I backed this out because it calls a GetWeakReferent function which doesn't exist anywhere: https://tbpl.mozilla.org/php/getParsedLog.php?id=29366148&tree=Mozilla-Inbound https://hg.mozilla.org/integration/mozilla-inbound/rev/77f70432fdcd
(In reply to :Ehsan Akhgari (needinfo? me!) from comment #21) > I backed this out because it calls a GetWeakReferent function which doesn't > exist anywhere: Thanks, GetWeakReferent is a new webidl attribute in the patch itself and it compiles locally and worked on try in Comment 19, but I didn't alter the contract uuid of the PeerConnectionObserver interface. Maybe that's the problem? I'm trying that here. If this doesn't work, it might need a clobber due to bug 928195 again.
Same patch, just making clear in description that it has been reviewed. Carrying forward r+ from abr and jesup.
(In reply to :Ehsan Akhgari (needinfo? me!) from comment #21) > I backed this out because it calls a GetWeakReferent function which doesn't > exist anywhere: > > https://tbpl.mozilla.org/php/getParsedLog.php?id=29366148&tree=Mozilla- > Inbound > https://hg.mozilla.org/integration/mozilla-inbound/rev/77f70432fdcd
(Sorry, I had leftover crud in the comment field)
Relanded with CLOBBER in https://hg.mozilla.org/integration/mozilla-inbound/rev/cc9ce7c8ab99 build system messed up webidl dependencies again (previous push and two following were green, then one went red and ehsan backed it out).
If the CID needs to up updated when the webidl changes, then we should revise and reland. (I think I'd flagged on a review - perhaps bug 902003 - that you should check if CIDs need to be revised if the webidl changes - I don't think so, but perhaps that's wrong.)
Nightly build 20131024030204 crashes in testcase 1 see more crash reports here: https://crash-stats.mozilla.com/report/list?product=Firefox&signature=mozilla%3A%3Adom%3A%3ACallbackObject%3A%3ACallSetup%3A%3ACallSetup%28JS%3A%3AHandle%3CJSObject*%3E%2C+mozilla%3A%3AErrorResult%26%2C+mozilla%3A%3Adom%3A%3ACallbackObject%3A%3AExceptionHandling%2C+JSCompartment*%29
Status: RESOLVED → REOPENED
OS: Mac OS X → All
Hardware: x86_64 → All
Resolution: FIXED → ---
You have to install the domFuzzLite addon. See step 2 of comment 0.
Just so we're clear here, this bug report appears to deal with two different bugs that can be evoked using somewhat similar tests. But the stacks are very different and don't seem to bear any relation to each other. The patch that we landed seems to be working for "testcase 2." But there's a different flaw here, and it's evoked by "testcase 1." Given the rather substantial differences in the stacks, I'm pretty sure these should have been filed as two different bug numbers.
I filed bug 928221 for testcase 1.
Status: REOPENED → RESOLVED
Closed: 7 years ago → 7 years ago
Resolution: --- → FIXED
Whiteboard: [webrtc] → [webrtc][fix for testcase 2 only]
You mean Bug 933447
:jib says this is a regression from bug 882145 as is bug 933447, which means Firefox 26 should be affected as well. The symptoms in this bug is more troubling than those Jesse saw in testcase1, and the patch is more revealing. We probably want to get this in on Beta as well. And if we take this one maybe we should take bug 933447 as well.
Even though this affects Firefox 26 on Desktop/Android, PeerConnection is disabled on b2g1.2 so we don't have to worry about that long-lived branch.
(In reply to Daniel Veditz [:dveditz] from comment #35) > :jib says this is a regression from bug 882145 as is bug 933447, which means > Firefox 26 should be affected as well. My mistake. Regressions are from Bug 917328 which landed 10-14. I confused the bugs. Everything is fine here. Sorry about the false alarm.
I can't reproduce any crash with either test case with builds of FF27 from 2013-10-27 and before. Jesse, if you can verify that your tests no longer crash, I'd appreciate it. Thanks.
I noticed an odd assertion on mozilla-central and filed bug 944854.
You need to log in before you can comment on or make changes to this bug.