Closed Bug 1217373 Opened 4 years ago Closed 4 years ago

[Presentation WebAPI] Avoid B2G crash due to potential excessive releases in PresentationSessionTransport

Categories

(Firefox OS Graveyard :: General, defect)

ARM
Gonk (Firefox OS)
defect
Not set

Tracking

(blocking-b2g:2.5+, firefox45 fixed, b2g-v2.5 fixed)

RESOLVED FIXED
FxOS-S11 (13Nov)
blocking-b2g 2.5+
Tracking Status
firefox45 --- fixed
b2g-v2.5 --- fixed

People

(Reporter: selin, Assigned: selin)

References

(Blocks 1 open bug)

Details

(Whiteboard: [ft:conndevices][partner-blocker])

Attachments

(2 files, 1 obsolete file)

Attached file backtrace.txt
When it comes to establish a new presentation connection to a receiving device (Nexus) while an existing one is still connected, somehow the B2G process on the receiving device crashes due to the following cause:

1. |PresentationSessionTransport| has some attributes [1] which may hold references to |PresentationSessionTransport| as listeners and thus result in reference cycles.

2. When one of these objects (for example, A) is about to release, it calls the destructor of the |PresentationSessionTransport| object. Then the |PresentationSessionTransport| object follows the reference cycle and calls A's destructor again. That eventually results in an attempt to destroy a lock which has already been destroyed. [2]

[1] https://dxr.mozilla.org/mozilla-central/source/dom/presentation/PresentationSessionTransport.h#86-97
[2] https://dxr.mozilla.org/mozilla-central/source/xpcom/glue/Mutex.h#60
Whiteboard: [ft:conndevices]
Attached patch Patch, v1 (obsolete) — Splinter Review
Update |PresentationSessionTransport|'s destructor to break potential reference cycles.
Assignee: nobody → selin
Status: NEW → ASSIGNED
Attachment #8677406 - Flags: review?(josh)
Is PresentationSessionTransport's destructor being called manually? I don't understand how this can happen otherwise, if it's participating in a reference cycle.
(In reply to Josh Matthews [:jdm] from comment #2)
> Is PresentationSessionTransport's destructor being called manually? I don't
> understand how this can happen otherwise, if it's participating in a
> reference cycle.

|NS_IMPL_RELEASE| macro appears to trigger the destructor [1].

For example, |PresentationSessionTransport| object has a member |mInputStreamPump| [2], which also hold the |PresentationSessionTransport| object as its listener. And when |nsInputStreamPump::OnStateStop| [3] is called (according to the callstack [4]), somehow these objects, possibly also involved with some other members in [2], probably indirectly reference to the same lock. So b2g crashes when the lock is about to be destroyed at the second time (according to the attached backtrace.txt) in a series of nested object release calls.

So the patch is to avoid the nested destroying calls by using local smart pointers to temporarily hold these objects and then unset the member variables in |PresentationSessionTransport|'s destructor.

[1] https://dxr.mozilla.org/mozilla-central/source/xpcom/glue/nsISupportsImpl.h#642
[2] https://dxr.mozilla.org/mozilla-central/source/dom/presentation/PresentationSessionTransport.h#86-97
[3] https://dxr.mozilla.org/mozilla-central/source/netwerk/base/nsInputStreamPump.cpp#672
[4] The callstack:
#0  nsAsyncStreamCopier::~nsAsyncStreamCopier (this=this@entry=0xa51f1f10, __in_chrg=<optimized out>) at /home/rexboy/B2G-nexus/gecko/netwerk/base/nsAsyncStreamCopier.cpp:78
#1  0xb27d2dc2 in nsAsyncStreamCopier::Release (this=0xa51f1f10) at /home/rexboy/B2G-nexus/gecko/netwerk/base/nsAsyncStreamCopier.cpp:140
#2  0xb2724212 in nsCOMPtr<nsIStackFrame>::~nsCOMPtr (this=0xa51f4500, __in_chrg=<optimized out>) at ../../../../dist/include/nsCOMPtr.h:405
#3  0xb390c184 in mozilla::dom::PresentationSessionTransport::~PresentationSessionTransport (this=0xa51f44c0, __in_chrg=<optimized out>)
    at /home/rexboy/B2G-nexus/gecko/dom/presentation/PresentationSessionTransport.cpp:79
#4  0xb390c24a in mozilla::dom::PresentationSessionTransport::Release (this=0xa51f44c0) at /home/rexboy/B2G-nexus/gecko/dom/presentation/PresentationSessionTransport.cpp:64
#5  0xb2724284 in nsCOMPtr<nsIStackFrame>::assign_assuming_AddRef (this=0xa97bc41c, aNewPtr=<optimized out>) at ../../../../dist/include/nsCOMPtr.h:380
#6  0xb27e6e0c in assign_with_AddRef (aRawPtr=0x0, this=0xa97bc41c) at ../../dist/include/nsCOMPtr.h:1066
#7  operator= (aRhs=0x0, this=0xa97bc41c) at ../../dist/include/nsCOMPtr.h:578
#8  nsInputStreamPump::OnStateStop (this=this@entry=0xa97bc400) at /home/rexboy/B2G-nexus/gecko/netwerk/base/nsInputStreamPump.cpp:720
#9  0xb27ea356 in nsInputStreamPump::OnInputStreamReady (this=0xa97bc400, stream=<optimized out>) at /home/rexboy/B2G-nexus/gecko/netwerk/base/nsInputStreamPump.cpp:436
#10 0xb276d8c8 in nsInputStreamReadyEvent::Run (this=0xa4bff920) at /home/rexboy/B2G-nexus/gecko/xpcom/io/nsStreamUtils.cpp:91
#11 0xb277e870 in nsThread::ProcessNextEvent (this=0xb606e420, aMayWait=<optimized out>, aResult=0xbee365e7) at /home/rexboy/B2G-nexus/gecko/xpcom/threads/nsThread.cpp:972
#12 0xb279ad46 in NS_ProcessNextEvent (aThread=0xb606e420, aMayWait=aMayWait@entry=false) at /home/rexboy/B2G-nexus/gecko/xpcom/glue/nsThreadUtils.cpp:297
#13 0xb297ebb8 in mozilla::ipc::MessagePump::Run (this=0xb6038650, aDelegate=0xb6071280) at /home/rexboy/B2G-nexus/gecko/ipc/glue/MessagePump.cpp:95
#14 0xb2963dd8 in MessageLoop::RunInternal (this=this@entry=0xb6071280) at /home/rexboy/B2G-nexus/gecko/ipc/chromium/src/base/message_loop.cc:234
#15 0xb2963df2 in RunHandler (this=0xb6071280) at /home/rexboy/B2G-nexus/gecko/ipc/chromium/src/base/message_loop.cc:227
#16 MessageLoop::Run (this=0xb6071280) at /home/rexboy/B2G-nexus/gecko/ipc/chromium/src/base/message_loop.cc:201
#17 0xb392b66a in nsBaseAppShell::Run (this=0xae85f9a0) at /home/rexboy/B2G-nexus/gecko/widget/nsBaseAppShell.cpp:156
#18 0xb3e0674c in nsAppStartup::Run (this=0xae883100) at /home/rexboy/B2G-nexus/gecko/toolkit/components/startup/nsAppStartup.cpp:281
#19 0xb3e3b1b8 in XREMain::XRE_mainRun (this=this@entry=0xbee367a0) at ../../../gecko/toolkit/xre/nsAppRunner.cpp:4298
#20 0xb3e3b5a2 in XREMain::XRE_main (this=this@entry=0xbee367a0, argc=argc@entry=1, argv=argv@entry=0xb6003320, aAppData=aAppData@entry=0xb6f76b88 <_ZL8sAppData>)
    at ../../../gecko/toolkit/xre/nsAppRunner.cpp:4391
#21 0xb3e3b786 in XRE_main (argc=1, argv=0xb6003320, aAppData=0xb6f76b88 <_ZL8sAppData>, aFlags=<optimized out>) at ../../../gecko/toolkit/xre/nsAppRunner.cpp:4493
#22 0xb6f57f26 in do_main (argc=argc@entry=1, argv=argv@entry=0xb6003320) at ../../../gecko/b2g/app/nsBrowserApp.cpp:167
#23 0xb6f58098 in b2g_main (argc=1, argv=<optimized out>) at ../../../gecko/b2g/app/nsBrowserApp.cpp:299
#24 0xb6f57da6 in RunProcesses (aReservedFds=..., argv=0xbee37a84, argc=1) at ../../../gecko/b2g/app/B2GLoader.cpp:232
#25 main (argc=1, argv=0xbee37a84) at ../../../gecko/b2g/app/B2GLoader.cpp:297
Comment on attachment 8677406 [details] [diff] [review]
Patch, v1

This is not the right fix, and I think it only works by accident. Based on the backtrace from the crash, we're crashing when proxying the final release of PresentationSessionTransport to the main thread (due to the use of SetEventSink with a non-null target in PresentationSessionTransport::InitWithChannelDescription). As seen in the callstack in the previous comment, this is a problem because we've already run the destructor during that OnStreamStop call. I still don't understand why this is occurring, however - nsTransportEventSinkProxy (which is created in SetEventSink) correctly holds a reference to the sink (PresentationSessionTransport, in this case), so there's no reason that PresentationSessionTransport's destructor should be called before the nsAsyncStreamCopier's destructor.

Are there any non-smart pointer reference counting operations being performed on PresentationSessionTransport (ie. ptr->AddRef()/ptr->Release() or NS_ADDREF(ptr)/NS_RELEASE(ptr))? The parts I've looked and you've explained don't add to me. There are definitely memory cycles, but those should be causing leaks instead of crashes. The correct way to fix the cycles is to make PresentationSessionTransport cycle-collected, instead of trying to manually break cycles in the destructor (see http://mxr.mozilla.org/mozilla-central/source/dom/network/TCPSocket.cpp?force=1#100 for how TCPSocket does it).
Attachment #8677406 - Flags: review?(josh) → review-
Whiteboard: [ft:conndevices] → [ft:conndevices][partner-blocker]
Here is the STR for this issue:

0. Fennec with presentation API video sharing patch and Firefox OS debug build + TV Gaia
1. initiate video sharing from Fennec to Firefox OS
2. reload web page on Fennec
3. initiate video sharing again
Attached patch Patch, v2Splinter Review
Make PresentationSessionTransport cycle collectable.
Attachment #8677406 - Attachment is obsolete: true
Attachment #8680515 - Flags: review?(josh)
Comment on attachment 8680515 [details] [diff] [review]
Patch, v2

Review of attachment 8680515 [details] [diff] [review]:
-----------------------------------------------------------------

Does this fix the crash?
Attachment #8680515 - Flags: review?(josh) → review+
(In reply to Josh Matthews [:jdm] from comment #7)
> Comment on attachment 8680515 [details] [diff] [review]
> Patch, v2
> 
> Review of attachment 8680515 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> Does this fix the crash?

Yeah. We've verified that the crash no longer happens on the real device where the issue was observed.
https://hg.mozilla.org/mozilla-central/rev/169c7abf2a23
Status: ASSIGNED → RESOLVED
Closed: 4 years ago
Resolution: --- → FIXED
Target Milestone: --- → FxOS-S11 (13Nov)
blocking-b2g: --- → 2.5+
removing the b2g 2.5 flag since this commit has been reverted due to an incorrect merge, sorry for the confusion
You need to log in before you can comment on or make changes to this bug.