Closed Bug 1226200 Opened 4 years ago Closed 4 years ago

Crash when Hello link-clicker join a room with gUM running in content process

Categories

(Core :: WebRTC, defect, P1, blocker)

defect

Tracking

()

RESOLVED FIXED
mozilla46
Tracking Status
e10s m8+ ---
firefox45 --- fixed
firefox46 --- fixed

People

(Reporter: mikedeboer, Assigned: jesup)

References

Details

Attachments

(3 files, 1 obsolete file)

(Summary prone to reformulation, I know ;) )

We're in the process of making about:loopconversation - the Hello conversation window - run in the content process to make sure that browser sharing will work when e10s is enabled by default.

However, when I apply patch https://bugzilla.mozilla.org/attachment.cgi?id=8689072 from bug 1154277, open a room and join it by myself in a different browser (instance) it crashes Fx consistently with attached crash signature when the link clicker seems almost ready to join (e.g. already past the gUM prompt).

Does the media team have the opportunity to look at this, perhaps? Please let me know if you need anything else, besides the patch mentioned above, to reproduce the crash.
(I'm running a debug build, so this is the crash signature as it appeared on the command line:

Assertion failure: aManagees.Count() == 1, at ../../dist/include/mozilla/ipc/ProtocolUtils.h:344
#01: mozilla::net::PTCPSocketParent::OnMessageReceived(IPC::Message const&)[/Users/mikedeboer/Projects/fx-team/obj-x86_64-apple-darwin14.5.0/dist/NightlyDebug.app/Contents/MacOS/XUL +0x73e8b9]
#02: mozilla::dom::PContentParent::OnMessageReceived(IPC::Message const&)[/Users/mikedeboer/Projects/fx-team/obj-x86_64-apple-darwin14.5.0/dist/NightlyDebug.app/Contents/MacOS/XUL +0x89a32b]
#03: mozilla::ipc::MessageChannel::DispatchAsyncMessage(IPC::Message const&)[/Users/mikedeboer/Projects/fx-team/obj-x86_64-apple-darwin14.5.0/dist/NightlyDebug.app/Contents/MacOS/XUL +0x4f68cb]
#04: mozilla::ipc::MessageChannel::DispatchMessage(IPC::Message const&)[/Users/mikedeboer/Projects/fx-team/obj-x86_64-apple-darwin14.5.0/dist/NightlyDebug.app/Contents/MacOS/XUL +0x4f5442]
#05: mozilla::ipc::MessageChannel::OnMaybeDequeueOne()[/Users/mikedeboer/Projects/fx-team/obj-x86_64-apple-darwin14.5.0/dist/NightlyDebug.app/Contents/MacOS/XUL +0x4ef9a4]
#06: MessageLoop::DeferOrRunPendingTask(MessageLoop::PendingTask const&)[/Users/mikedeboer/Projects/fx-team/obj-x86_64-apple-darwin14.5.0/dist/NightlyDebug.app/Contents/MacOS/XUL +0x4be90c]
#07: MessageLoop::DoWork()[/Users/mikedeboer/Projects/fx-team/obj-x86_64-apple-darwin14.5.0/dist/NightlyDebug.app/Contents/MacOS/XUL +0x4bec4b]
#08: mozilla::ipc::DoWorkRunnable::Run()[/Users/mikedeboer/Projects/fx-team/obj-x86_64-apple-darwin14.5.0/dist/NightlyDebug.app/Contents/MacOS/XUL +0x4fa4f5]
#09: nsThread::ProcessNextEvent(bool, bool*)[/Users/mikedeboer/Projects/fx-team/obj-x86_64-apple-darwin14.5.0/dist/NightlyDebug.app/Contents/MacOS/XUL +0xe93a7]
#10: NS_ProcessPendingEvents(nsIThread*, unsigned int)[/Users/mikedeboer/Projects/fx-team/obj-x86_64-apple-darwin14.5.0/dist/NightlyDebug.app/Contents/MacOS/XUL +0x12957f]
#11: nsBaseAppShell::NativeEventCallback()[/Users/mikedeboer/Projects/fx-team/obj-x86_64-apple-darwin14.5.0/dist/NightlyDebug.app/Contents/MacOS/XUL +0x2a492e4]
#12: nsAppShell::ProcessGeckoEvents(void*)[/Users/mikedeboer/Projects/fx-team/obj-x86_64-apple-darwin14.5.0/dist/NightlyDebug.app/Contents/MacOS/XUL +0x2ab02ce]
#13: __CFRUNLOOP_IS_CALLING_OUT_TO_A_SOURCE0_PERFORM_FUNCTION__[/System/Library/Frameworks/CoreFoundation.framework/Versions/A/CoreFoundation +0x80a01]
#14: __CFRunLoopDoSources0[/System/Library/Frameworks/CoreFoundation.framework/Versions/A/CoreFoundation +0x72b8d]
#15: __CFRunLoopRun[/System/Library/Frameworks/CoreFoundation.framework/Versions/A/CoreFoundation +0x721bf]
#16: CFRunLoopRunSpecific[/System/Library/Frameworks/CoreFoundation.framework/Versions/A/CoreFoundation +0x71bd8]
#17: RunCurrentEventLoopInMode[/System/Library/Frameworks/Carbon.framework/Versions/A/Frameworks/HIToolbox.framework/Versions/A/HIToolbox +0x3256f]
#18: ReceiveNextEventCommon[/System/Library/Frameworks/Carbon.framework/Versions/A/Frameworks/HIToolbox.framework/Versions/A/HIToolbox +0x322ea]
#19: _BlockUntilNextEventMatchingListInModeWithFilter[/System/Library/Frameworks/Carbon.framework/Versions/A/Frameworks/HIToolbox.framework/Versions/A/HIToolbox +0x3212b]
#20: _DPSNextEvent[/System/Library/Frameworks/AppKit.framework/Versions/C/AppKit +0x918ab]
#21: -[NSApplication nextEventMatchingMask:untilDate:inMode:dequeue:][/System/Library/Frameworks/AppKit.framework/Versions/C/AppKit +0x90e58]
#22: -[GeckoNSApplication nextEventMatchingMask:untilDate:inMode:dequeue:][/Users/mikedeboer/Projects/fx-team/obj-x86_64-apple-darwin14.5.0/dist/NightlyDebug.app/Contents/MacOS/XUL +0x2aaf5e6]
#23: -[NSApplication run][/System/Library/Frameworks/AppKit.framework/Versions/C/AppKit +0x86af3]
#24: nsAppShell::Run()[/Users/mikedeboer/Projects/fx-team/obj-x86_64-apple-darwin14.5.0/dist/NightlyDebug.app/Contents/MacOS/XUL +0x2ab09f5]
#25: nsAppStartup::Run()[/Users/mikedeboer/Projects/fx-team/obj-x86_64-apple-darwin14.5.0/dist/NightlyDebug.app/Contents/MacOS/XUL +0x3612599]
#26: XREMain::XRE_mainRun()[/Users/mikedeboer/Projects/fx-team/obj-x86_64-apple-darwin14.5.0/dist/NightlyDebug.app/Contents/MacOS/XUL +0x3686ae0]
#27: XREMain::XRE_main(int, char**, nsXREAppData const*)[/Users/mikedeboer/Projects/fx-team/obj-x86_64-apple-darwin14.5.0/dist/NightlyDebug.app/Contents/MacOS/XUL +0x3687368]
#28: XRE_main[/Users/mikedeboer/Projects/fx-team/obj-x86_64-apple-darwin14.5.0/dist/NightlyDebug.app/Contents/MacOS/XUL +0x36878dd]
#29: main[/Users/mikedeboer/Projects/fx-team/./obj-x86_64-apple-darwin14.5.0/dist/NightlyDebug.app/Contents/MacOS/firefox-bin +0x214f]
[Child 4856] WARNING: didn't get an id (ipc/rcv) timed out 10004003
: file /Users/mikedeboer/Projects/fx-team/ipc/glue/SharedMemoryBasic_mach.mm, line 619
[Child 4856] ###!!! ABORT: corrupted actor state: file ./PCameras.cpp, line 34
[Child 4856] ###!!! ABORT: Aborting on channel error.: file /Users/mikedeboer/Projects/fx-team/ipc/glue/MessageChannel.cpp, line 1765
./run: line 2:  4851 Segmentation fault: 11  ./obj-x86_64-apple-darwin14.5.0/dist/NightlyDebug.app/Contents/MacOS/firefox-bin -p dev -jsconsole
#01: mozilla::media::LambdaRunnable<mozilla::camera::CamerasChild::Shutdown()::$_8>::Run()[/Users/mikedeboer/Projects/fx-team/obj-x86_64-apple-darwin14.5.0/dist/NightlyDebug.app/Contents/MacOS/XUL +0x23ce30d]
#01: mozilla::ipc::ProcessLink::OnChannelError()[/Users/mikedeboer/Projects/fx-team/obj-x86_64-apple-darwin14.5.0/dist/NightlyDebug.app/Contents/MacOS/XUL +0x4f99a7]
#02: nsThread::ProcessNextEvent(bool, bool*)[/Users/mikedeboer/Projects/fx-team/obj-x86_64-apple-darwin14.5.0/dist/NightlyDebug.app/Contents/MacOS/XUL +0xe93a7]
#02: IPC::Channel::ChannelImpl::OnFileCanReadWithoutBlocking(int)[/Users/mikedeboer/Projects/fx-team/obj-x86_64-apple-darwin14.5.0/dist/NightlyDebug.app/Contents/MacOS/XUL +0x4d0ea9]
#03: NS_ProcessNextEvent(nsIThread*, bool)[/Users/mikedeboer/Projects/fx-team/obj-x86_64-apple-darwin14.5.0/dist/NightlyDebug.app/Contents/MacOS/XUL +0x129673]
[Child 4856] WARNING: didn't get an id (ipc/rcv) timed out 10004003
: file /Users/mikedeboer/Projects/fx-team/ipc/glue/SharedMemoryBasic_mach.mm, line 619
[Child 4856] WARNING: sending port failed (ipc/send) invalid destination port 10000003
: file /Users/mikedeboer/Projects/fx-team/ipc/glue/SharedMemoryBasic_mach.mm, line 613
Assertion failure: mDestroyed, at /Users/mikedeboer/Projects/fx-team/gfx/layers/IPDLActor.h:42
#03: event_base_loop[/Users/mikedeboer/Projects/fx-team/obj-x86_64-apple-darwin14.5.0/dist/NightlyDebug.app/Contents/MacOS/XUL +0x4da689]
#04: mozilla::ipc::MessagePumpForNonMainThreads::Run(base::MessagePump::Delegate*)[/Users/mikedeboer/Projects/fx-team/obj-x86_64-apple-darwin14.5.0/dist/NightlyDebug.app/Contents/MacOS/XUL +0x4fa9e3]
#01: mozilla::layers::CompositableClient::DestroyIPDLActor(mozilla::layers::PCompositableChild*)[/Users/mikedeboer/Projects/fx-team/obj-x86_64-apple-darwin14.5.0/dist/NightlyDebug.app/Contents/MacOS/XUL +0xde4e8f]
#04: base::MessagePumpLibevent::Run(base::MessagePump::Delegate*)[/Users/mikedeboer/Projects/fx-team/obj-x86_64-apple-darwin14.5.0/dist/NightlyDebug.app/Contents/MacOS/XUL +0x4c0095]
#05: MessageLoop::Run()[/Users/mikedeboer/Projects/fx-team/obj-x86_64-apple-darwin14.5.0/dist/NightlyDebug.app/Contents/MacOS/XUL +0x4be0dc]
#02: mozilla::layers::PImageBridgeChild::DeallocSubtree()[/Users/mikedeboer/Projects/fx-team/obj-x86_64-apple-darwin14.5.0/dist/NightlyDebug.app/Contents/MacOS/XUL +0x5d9e9b]
#05: MessageLoop::Run()[/Users/mikedeboer/Projects/fx-team/obj-x86_64-apple-darwin14.5.0/dist/NightlyDebug.app/Contents/MacOS/XUL +0x4be0dc]
mikedeboer:~/Projects/fx-team$ #06: nsThread::ThreadFunc(void*)[/Users/mikedeboer/Projects/fx-team/obj-x86_64-apple-darwin14.5.0/dist/NightlyDebug.app/Contents/MacOS/XUL +0xe7339]
#03: mozilla::layers::PImageBridgeChild::OnChannelError()[/Users/mikedeboer/Projects/fx-team/obj-x86_64-apple-darwin14.5.0/dist/NightlyDebug.app/Contents/MacOS/XUL +0x5da26b]
#06: base::Thread::ThreadMain()[/Users/mikedeboer/Projects/fx-team/obj-x86_64-apple-darwin14.5.0/dist/NightlyDebug.app/Contents/MacOS/XUL +0x4c873d]
#07: _pt_root[/Users/mikedeboer/Projects/fx-team/obj-x86_64-apple-darwin14.5.0/dist/NightlyDebug.app/Contents/MacOS/libnss3.dylib +0x33bee9]
#04: mozilla::ipc::MessageChannel::NotifyMaybeChannelError()[/Users/mikedeboer/Projects/fx-team/obj-x86_64-apple-darwin14.5.0/dist/NightlyDebug.app/Contents/MacOS/XUL +0x4f7664]
#07: ThreadFunc(void*)[/Users/mikedeboer/Projects/fx-team/obj-x86_64-apple-darwin14.5.0/dist/NightlyDebug.app/Contents/MacOS/XUL +0x4c884a]
#08: _pthread_body[/usr/lib/system/libsystem_pthread.dylib +0x405a]
#05: MessageLoop::DeferOrRunPendingTask(MessageLoop::PendingTask const&)[/Users/mikedeboer/Projects/fx-team/obj-x86_64-apple-darwin14.5.0/dist/NightlyDebug.app/Contents/MacOS/XUL +0x4be90c]
#08: _pthread_body[/usr/lib/system/libsystem_pthread.dylib +0x405a]
#09: _pthread_body[/usr/lib/system/libsystem_pthread.dylib +0x3fd7]
[Child 4856] ###!!! ABORT: corrupted actor state: file ./PCameras.cpp, line 34
Hit MOZ_CRASH() at /Users/mikedeboer/Projects/fx-team/memory/mozalloc/mozalloc_abort.cpp:33
Maire, this crash is blocking Hello to run properly in e10s mode. Is it possible for someone to take a look at this issue relatively soon?

Thanks!
Flags: needinfo?(mreavy)
Attached file crash
Moved crashdump to an attachment
Crash Signature: Process: firefox-bin [4851] Path: /Users/USER/*/NightlyDebug.app/Contents/MacOS/firefox-bin Identifier: org.mozilla.nightlydebug Version: 45.0a1 (4515.11.19) Code Type: X86-64 (Native…
Looking at it now.  I'll find the right owner today.
Rank: 10
tracking-e10s: --- → ?
Flags: needinfo?(mreavy)
Mike - there are two dumps here; one that was in the Crash Signature (moved to attachment), which shows a MainThread nullptr crash in TCPSocketParent::RecvOpenBind (i.e. TCP TURN in use or being tried), and one above in IPC (Assertion failure: aManagees.Count() == 1).  These might be related, but it would help to know exactly where each comes from.

I'll spin a mac build with that change
Flags: needinfo?(mdeboer)
Ah yes, I should've explained! The one in the crash signature field came from the OSX crash reporter dialog (native one) and the one in comment 1 is what I got on the command line.
Flags: needinfo?(mdeboer)
Assignee: nobody → rjesup
Were you able to reproduce this issue, Randell, or do you need more info from me?
Status: NEW → ASSIGNED
Flags: needinfo?(rjesup)
(In reply to Mike de Boer [:mikedeboer] from comment #7)
> Were you able to reproduce this issue, Randell, or do you need more info
> from me?

I have been able to repro, and have it caught in a debugger.
Flags: needinfo?(rjesup)
Blocks: loop-e10s
\o/ ;-)
Attachment #8694358 - Flags: review?(wmccloskey)
Blocks: 1154277
Josh, my only concern here is with the other places where we use LoneManagedOrNull. This regressed here:
https://hg.mozilla.org/mozilla-central/rev/19c5de45a704

I wonder if we should fix all the TCP socket callsites? It sounds like Hello is the only desktop user of TCPSocket, and it maybe only cares about this one callsite. But we might have others down the road.
Flags: needinfo?(josh)
Yes, we absolutely should. I imagine this can bite b2g as well in the right situations.
Flags: needinfo?(josh)
Comment on attachment 8694358 [details] [diff] [review]
Don't assume a TCPSocket has only one managee

Review of attachment 8694358 [details] [diff] [review]:
-----------------------------------------------------------------

Randell, can you fix the rest of these? Maybe just add a helper function somewhere that doesn't assert. It could even be in ProtocolUtils.h.
Attachment #8694358 - Flags: review?(wmccloskey)
Randell, when I apply the patch(es) and start a conversation with browser sharing, the crash is indeed gone :-)

However, the screen share stream is also terminated, so effectively instead of crashing it now stops sharing when the remote party joins the call.
Flags: needinfo?(rjesup)
(In reply to Mike de Boer [:mikedeboer] from comment #14)
> However, the screen share stream is also terminated, so effectively instead
> of crashing it now stops sharing when the remote party joins the call.

Just a thought. There's been some bugs reported by QA along these sorts of lines - where screen share doesn't show up when someone else joins a room. Might be worth trying commenting out the auto-screen share and hooking it into a button manually for a quick verification.
Mike: With this patch to stop it killing sharing in e10s mode, this patch queue works for me (I can share tabs in e10s mode no problem).  Without this patch, I see what I think you were seeing (sharing stops ev5Cen before the remote connects, and an error in the Browser Console.  Note: this just hacks the code out of there; you'll need to do whatever else might be needed to cleanly remove the Block-sharing-in-e10s stuff
patchset that works for me is:

unbitrotted bug 1154277 patch (https://bugzilla.mozilla.org/attachment.cgi?id=8694250) -- note, I unbitrotted it, but you should re-do it or check my unbitrot carefully.

The patch from this bug (8694358)

The don't-block-sharing patch (8700807)
Flags: needinfo?(standard8)
Flags: needinfo?(rjesup)
Flags: needinfo?(mdeboer)
FYI, with these patches, on ending calls in debug builds I'm seeing 

Assertion failure: outputTrack->GetEnd() == GraphTimeToStreamTimeWithBlocking(interval.mStart) (Samples missing), at /home/jesup/src/mozilla/inbound/dom/media/TrackUnionStream.cpp:268

That would be a separate issue in MSG I believe.  Not sure why it would be happening in Loop...
Flags: needinfo?(roc)
Flags: needinfo?(padenot)
as requested, made SingleManagedOrNull() that doesn't assert, and renamed LongManagedOrNull to LoneManagedOrNullAsserts, and converted all the TCPSocket uses to SingleManagedOrNull()
Attachment #8700866 - Flags: review?(wmccloskey)
Attachment #8694358 - Attachment is obsolete: true
Comment on attachment 8700866 [details] [diff] [review]
Don't assume a TCPSocket has only one managee

Review of attachment 8700866 [details] [diff] [review]:
-----------------------------------------------------------------

::: dom/network/TCPSocketParent.cpp
@@ +224,5 @@
>    bool     inBrowser = false;
>    const PContentParent *content = Manager()->Manager();
> +  // We don't use LoneManagedOrNullAsserts(content->ManagedPBrowserParent())
> +  // here, because with Hello-in-Content causes 2 managees
> +  // (Bug 1226200 and bug 1154277)

I don't think this comment adds much value.
Attachment #8700866 - Flags: review?(wmccloskey) → review+
https://hg.mozilla.org/mozilla-central/rev/16b693debb2b
Status: ASSIGNED → RESOLVED
Closed: 4 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla46
I believe Mike has tested it and he's currently happy with the fix.
Flags: needinfo?(standard8)
Flags: needinfo?(mdeboer)
(In reply to Randell Jesup [:jesup] from comment #18)
> FYI, with these patches, on ending calls in debug builds I'm seeing 
> 
> Assertion failure: outputTrack->GetEnd() ==
> GraphTimeToStreamTimeWithBlocking(interval.mStart) (Samples missing), at
> /home/jesup/src/mozilla/inbound/dom/media/TrackUnionStream.cpp:268
> 
> That would be a separate issue in MSG I believe.  Not sure why it would be
> happening in Loop...

It's better to file this one separately I think.
Flags: needinfo?(padenot)
Let's see what happens when Mike really finishes his patch
Randell, can you request uplift for this patch to Fx 45?
Flags: needinfo?(rjesup)
Comment on attachment 8700866 [details] [diff] [review]
Don't assume a TCPSocket has only one managee

Approval Request Comment
[Feature/regressing bug #]: N/A

[User impact if declined]: Inability to sandbox Hello in the content process; also inability to share camera/mic with other in-content applications.

[Describe test coverage new/current, TreeHerder]: TCP TURN isn't testable in automation.

[Risks and why]: Low risk.  Per BillM, the assumption here that there was a single manager is only really true for B2G, so asserting it isn't good.  The function (SingleManagedOrNull) is identical to LoneManagedOrNullAsserts, except for the assertion.
[String/UUID change made/needed]:
Flags: needinfo?(rjesup)
Attachment #8700866 - Flags: approval-mozilla-aurora?
Comment on attachment 8700866 [details] [diff] [review]
Don't assume a TCPSocket has only one managee

Fix a crash, taking it.
Attachment #8700866 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
You need to log in before you can comment on or make changes to this bug.