Crash when Hello link-clicker join a room with gUM running in content process

RESOLVED FIXED in Firefox 45

Status

()

P1
blocker
Rank:
10
RESOLVED FIXED
3 years ago
3 years ago

People

(Reporter: mikedeboer, Assigned: jesup)

Tracking

Trunk
mozilla46
Points:
---
Dependency tree / graph

Firefox Tracking Flags

(e10sm8+, firefox45 fixed, firefox46 fixed)

Details

Attachments

(3 attachments, 1 obsolete attachment)

(Reporter)

Description

3 years ago
(Summary prone to reformulation, I know ;) )

We're in the process of making about:loopconversation - the Hello conversation window - run in the content process to make sure that browser sharing will work when e10s is enabled by default.

However, when I apply patch https://bugzilla.mozilla.org/attachment.cgi?id=8689072 from bug 1154277, open a room and join it by myself in a different browser (instance) it crashes Fx consistently with attached crash signature when the link clicker seems almost ready to join (e.g. already past the gUM prompt).

Does the media team have the opportunity to look at this, perhaps? Please let me know if you need anything else, besides the patch mentioned above, to reproduce the crash.
(Reporter)

Comment 1

3 years ago
(I'm running a debug build, so this is the crash signature as it appeared on the command line:

Assertion failure: aManagees.Count() == 1, at ../../dist/include/mozilla/ipc/ProtocolUtils.h:344
#01: mozilla::net::PTCPSocketParent::OnMessageReceived(IPC::Message const&)[/Users/mikedeboer/Projects/fx-team/obj-x86_64-apple-darwin14.5.0/dist/NightlyDebug.app/Contents/MacOS/XUL +0x73e8b9]
#02: mozilla::dom::PContentParent::OnMessageReceived(IPC::Message const&)[/Users/mikedeboer/Projects/fx-team/obj-x86_64-apple-darwin14.5.0/dist/NightlyDebug.app/Contents/MacOS/XUL +0x89a32b]
#03: mozilla::ipc::MessageChannel::DispatchAsyncMessage(IPC::Message const&)[/Users/mikedeboer/Projects/fx-team/obj-x86_64-apple-darwin14.5.0/dist/NightlyDebug.app/Contents/MacOS/XUL +0x4f68cb]
#04: mozilla::ipc::MessageChannel::DispatchMessage(IPC::Message const&)[/Users/mikedeboer/Projects/fx-team/obj-x86_64-apple-darwin14.5.0/dist/NightlyDebug.app/Contents/MacOS/XUL +0x4f5442]
#05: mozilla::ipc::MessageChannel::OnMaybeDequeueOne()[/Users/mikedeboer/Projects/fx-team/obj-x86_64-apple-darwin14.5.0/dist/NightlyDebug.app/Contents/MacOS/XUL +0x4ef9a4]
#06: MessageLoop::DeferOrRunPendingTask(MessageLoop::PendingTask const&)[/Users/mikedeboer/Projects/fx-team/obj-x86_64-apple-darwin14.5.0/dist/NightlyDebug.app/Contents/MacOS/XUL +0x4be90c]
#07: MessageLoop::DoWork()[/Users/mikedeboer/Projects/fx-team/obj-x86_64-apple-darwin14.5.0/dist/NightlyDebug.app/Contents/MacOS/XUL +0x4bec4b]
#08: mozilla::ipc::DoWorkRunnable::Run()[/Users/mikedeboer/Projects/fx-team/obj-x86_64-apple-darwin14.5.0/dist/NightlyDebug.app/Contents/MacOS/XUL +0x4fa4f5]
#09: nsThread::ProcessNextEvent(bool, bool*)[/Users/mikedeboer/Projects/fx-team/obj-x86_64-apple-darwin14.5.0/dist/NightlyDebug.app/Contents/MacOS/XUL +0xe93a7]
#10: NS_ProcessPendingEvents(nsIThread*, unsigned int)[/Users/mikedeboer/Projects/fx-team/obj-x86_64-apple-darwin14.5.0/dist/NightlyDebug.app/Contents/MacOS/XUL +0x12957f]
#11: nsBaseAppShell::NativeEventCallback()[/Users/mikedeboer/Projects/fx-team/obj-x86_64-apple-darwin14.5.0/dist/NightlyDebug.app/Contents/MacOS/XUL +0x2a492e4]
#12: nsAppShell::ProcessGeckoEvents(void*)[/Users/mikedeboer/Projects/fx-team/obj-x86_64-apple-darwin14.5.0/dist/NightlyDebug.app/Contents/MacOS/XUL +0x2ab02ce]
#13: __CFRUNLOOP_IS_CALLING_OUT_TO_A_SOURCE0_PERFORM_FUNCTION__[/System/Library/Frameworks/CoreFoundation.framework/Versions/A/CoreFoundation +0x80a01]
#14: __CFRunLoopDoSources0[/System/Library/Frameworks/CoreFoundation.framework/Versions/A/CoreFoundation +0x72b8d]
#15: __CFRunLoopRun[/System/Library/Frameworks/CoreFoundation.framework/Versions/A/CoreFoundation +0x721bf]
#16: CFRunLoopRunSpecific[/System/Library/Frameworks/CoreFoundation.framework/Versions/A/CoreFoundation +0x71bd8]
#17: RunCurrentEventLoopInMode[/System/Library/Frameworks/Carbon.framework/Versions/A/Frameworks/HIToolbox.framework/Versions/A/HIToolbox +0x3256f]
#18: ReceiveNextEventCommon[/System/Library/Frameworks/Carbon.framework/Versions/A/Frameworks/HIToolbox.framework/Versions/A/HIToolbox +0x322ea]
#19: _BlockUntilNextEventMatchingListInModeWithFilter[/System/Library/Frameworks/Carbon.framework/Versions/A/Frameworks/HIToolbox.framework/Versions/A/HIToolbox +0x3212b]
#20: _DPSNextEvent[/System/Library/Frameworks/AppKit.framework/Versions/C/AppKit +0x918ab]
#21: -[NSApplication nextEventMatchingMask:untilDate:inMode:dequeue:][/System/Library/Frameworks/AppKit.framework/Versions/C/AppKit +0x90e58]
#22: -[GeckoNSApplication nextEventMatchingMask:untilDate:inMode:dequeue:][/Users/mikedeboer/Projects/fx-team/obj-x86_64-apple-darwin14.5.0/dist/NightlyDebug.app/Contents/MacOS/XUL +0x2aaf5e6]
#23: -[NSApplication run][/System/Library/Frameworks/AppKit.framework/Versions/C/AppKit +0x86af3]
#24: nsAppShell::Run()[/Users/mikedeboer/Projects/fx-team/obj-x86_64-apple-darwin14.5.0/dist/NightlyDebug.app/Contents/MacOS/XUL +0x2ab09f5]
#25: nsAppStartup::Run()[/Users/mikedeboer/Projects/fx-team/obj-x86_64-apple-darwin14.5.0/dist/NightlyDebug.app/Contents/MacOS/XUL +0x3612599]
#26: XREMain::XRE_mainRun()[/Users/mikedeboer/Projects/fx-team/obj-x86_64-apple-darwin14.5.0/dist/NightlyDebug.app/Contents/MacOS/XUL +0x3686ae0]
#27: XREMain::XRE_main(int, char**, nsXREAppData const*)[/Users/mikedeboer/Projects/fx-team/obj-x86_64-apple-darwin14.5.0/dist/NightlyDebug.app/Contents/MacOS/XUL +0x3687368]
#28: XRE_main[/Users/mikedeboer/Projects/fx-team/obj-x86_64-apple-darwin14.5.0/dist/NightlyDebug.app/Contents/MacOS/XUL +0x36878dd]
#29: main[/Users/mikedeboer/Projects/fx-team/./obj-x86_64-apple-darwin14.5.0/dist/NightlyDebug.app/Contents/MacOS/firefox-bin +0x214f]
[Child 4856] WARNING: didn't get an id (ipc/rcv) timed out 10004003
: file /Users/mikedeboer/Projects/fx-team/ipc/glue/SharedMemoryBasic_mach.mm, line 619
[Child 4856] ###!!! ABORT: corrupted actor state: file ./PCameras.cpp, line 34
[Child 4856] ###!!! ABORT: Aborting on channel error.: file /Users/mikedeboer/Projects/fx-team/ipc/glue/MessageChannel.cpp, line 1765
./run: line 2:  4851 Segmentation fault: 11  ./obj-x86_64-apple-darwin14.5.0/dist/NightlyDebug.app/Contents/MacOS/firefox-bin -p dev -jsconsole
#01: mozilla::media::LambdaRunnable<mozilla::camera::CamerasChild::Shutdown()::$_8>::Run()[/Users/mikedeboer/Projects/fx-team/obj-x86_64-apple-darwin14.5.0/dist/NightlyDebug.app/Contents/MacOS/XUL +0x23ce30d]
#01: mozilla::ipc::ProcessLink::OnChannelError()[/Users/mikedeboer/Projects/fx-team/obj-x86_64-apple-darwin14.5.0/dist/NightlyDebug.app/Contents/MacOS/XUL +0x4f99a7]
#02: nsThread::ProcessNextEvent(bool, bool*)[/Users/mikedeboer/Projects/fx-team/obj-x86_64-apple-darwin14.5.0/dist/NightlyDebug.app/Contents/MacOS/XUL +0xe93a7]
#02: IPC::Channel::ChannelImpl::OnFileCanReadWithoutBlocking(int)[/Users/mikedeboer/Projects/fx-team/obj-x86_64-apple-darwin14.5.0/dist/NightlyDebug.app/Contents/MacOS/XUL +0x4d0ea9]
#03: NS_ProcessNextEvent(nsIThread*, bool)[/Users/mikedeboer/Projects/fx-team/obj-x86_64-apple-darwin14.5.0/dist/NightlyDebug.app/Contents/MacOS/XUL +0x129673]
[Child 4856] WARNING: didn't get an id (ipc/rcv) timed out 10004003
: file /Users/mikedeboer/Projects/fx-team/ipc/glue/SharedMemoryBasic_mach.mm, line 619
[Child 4856] WARNING: sending port failed (ipc/send) invalid destination port 10000003
: file /Users/mikedeboer/Projects/fx-team/ipc/glue/SharedMemoryBasic_mach.mm, line 613
Assertion failure: mDestroyed, at /Users/mikedeboer/Projects/fx-team/gfx/layers/IPDLActor.h:42
#03: event_base_loop[/Users/mikedeboer/Projects/fx-team/obj-x86_64-apple-darwin14.5.0/dist/NightlyDebug.app/Contents/MacOS/XUL +0x4da689]
#04: mozilla::ipc::MessagePumpForNonMainThreads::Run(base::MessagePump::Delegate*)[/Users/mikedeboer/Projects/fx-team/obj-x86_64-apple-darwin14.5.0/dist/NightlyDebug.app/Contents/MacOS/XUL +0x4fa9e3]
#01: mozilla::layers::CompositableClient::DestroyIPDLActor(mozilla::layers::PCompositableChild*)[/Users/mikedeboer/Projects/fx-team/obj-x86_64-apple-darwin14.5.0/dist/NightlyDebug.app/Contents/MacOS/XUL +0xde4e8f]
#04: base::MessagePumpLibevent::Run(base::MessagePump::Delegate*)[/Users/mikedeboer/Projects/fx-team/obj-x86_64-apple-darwin14.5.0/dist/NightlyDebug.app/Contents/MacOS/XUL +0x4c0095]
#05: MessageLoop::Run()[/Users/mikedeboer/Projects/fx-team/obj-x86_64-apple-darwin14.5.0/dist/NightlyDebug.app/Contents/MacOS/XUL +0x4be0dc]
#02: mozilla::layers::PImageBridgeChild::DeallocSubtree()[/Users/mikedeboer/Projects/fx-team/obj-x86_64-apple-darwin14.5.0/dist/NightlyDebug.app/Contents/MacOS/XUL +0x5d9e9b]
#05: MessageLoop::Run()[/Users/mikedeboer/Projects/fx-team/obj-x86_64-apple-darwin14.5.0/dist/NightlyDebug.app/Contents/MacOS/XUL +0x4be0dc]
mikedeboer:~/Projects/fx-team$ #06: nsThread::ThreadFunc(void*)[/Users/mikedeboer/Projects/fx-team/obj-x86_64-apple-darwin14.5.0/dist/NightlyDebug.app/Contents/MacOS/XUL +0xe7339]
#03: mozilla::layers::PImageBridgeChild::OnChannelError()[/Users/mikedeboer/Projects/fx-team/obj-x86_64-apple-darwin14.5.0/dist/NightlyDebug.app/Contents/MacOS/XUL +0x5da26b]
#06: base::Thread::ThreadMain()[/Users/mikedeboer/Projects/fx-team/obj-x86_64-apple-darwin14.5.0/dist/NightlyDebug.app/Contents/MacOS/XUL +0x4c873d]
#07: _pt_root[/Users/mikedeboer/Projects/fx-team/obj-x86_64-apple-darwin14.5.0/dist/NightlyDebug.app/Contents/MacOS/libnss3.dylib +0x33bee9]
#04: mozilla::ipc::MessageChannel::NotifyMaybeChannelError()[/Users/mikedeboer/Projects/fx-team/obj-x86_64-apple-darwin14.5.0/dist/NightlyDebug.app/Contents/MacOS/XUL +0x4f7664]
#07: ThreadFunc(void*)[/Users/mikedeboer/Projects/fx-team/obj-x86_64-apple-darwin14.5.0/dist/NightlyDebug.app/Contents/MacOS/XUL +0x4c884a]
#08: _pthread_body[/usr/lib/system/libsystem_pthread.dylib +0x405a]
#05: MessageLoop::DeferOrRunPendingTask(MessageLoop::PendingTask const&)[/Users/mikedeboer/Projects/fx-team/obj-x86_64-apple-darwin14.5.0/dist/NightlyDebug.app/Contents/MacOS/XUL +0x4be90c]
#08: _pthread_body[/usr/lib/system/libsystem_pthread.dylib +0x405a]
#09: _pthread_body[/usr/lib/system/libsystem_pthread.dylib +0x3fd7]
[Child 4856] ###!!! ABORT: corrupted actor state: file ./PCameras.cpp, line 34
Hit MOZ_CRASH() at /Users/mikedeboer/Projects/fx-team/memory/mozalloc/mozalloc_abort.cpp:33
(Reporter)

Comment 2

3 years ago
Maire, this crash is blocking Hello to run properly in e10s mode. Is it possible for someone to take a look at this issue relatively soon?

Thanks!
Flags: needinfo?(mreavy)
(Assignee)

Comment 3

3 years ago
Created attachment 8690085 [details]
crash

Moved crashdump to an attachment
(Assignee)

Updated

3 years ago
Crash Signature: mozilla::dom::workers::WorkerPrivate::WaitForWorkerEvents(unsigned int) + 122 4 XUL 0x000000010451a9d9 mozilla::dom::workers::WorkerPrivate::DoRunLoop(JSContext*) + 377 5 XUL 0x00000001044ce549 …
Looking at it now.  I'll find the right owner today.
Rank: 10
tracking-e10s: --- → ?
Flags: needinfo?(mreavy)
(Assignee)

Comment 5

3 years ago
Mike - there are two dumps here; one that was in the Crash Signature (moved to attachment), which shows a MainThread nullptr crash in TCPSocketParent::RecvOpenBind (i.e. TCP TURN in use or being tried), and one above in IPC (Assertion failure: aManagees.Count() == 1).  These might be related, but it would help to know exactly where each comes from.

I'll spin a mac build with that change
Flags: needinfo?(mdeboer)
(Reporter)

Comment 6

3 years ago
Ah yes, I should've explained! The one in the crash signature field came from the OSX crash reporter dialog (native one) and the one in comment 1 is what I got on the command line.
Flags: needinfo?(mdeboer)
Assignee: nobody → rjesup
tracking-e10s: ? → m8+
(Reporter)

Comment 7

3 years ago
Were you able to reproduce this issue, Randell, or do you need more info from me?
Status: NEW → ASSIGNED
Flags: needinfo?(rjesup)
(Assignee)

Comment 8

3 years ago
(In reply to Mike de Boer [:mikedeboer] from comment #7)
> Were you able to reproduce this issue, Randell, or do you need more info
> from me?

I have been able to repro, and have it caught in a debugger.
Flags: needinfo?(rjesup)

Updated

3 years ago
Blocks: 1154273
(Reporter)

Comment 9

3 years ago
\o/ ;-)
(Assignee)

Comment 10

3 years ago
Created attachment 8694358 [details] [diff] [review]
Don't assume a TCPSocket has only one managee
Attachment #8694358 - Flags: review?(wmccloskey)
(Assignee)

Updated

3 years ago
Blocks: 1154277
Josh, my only concern here is with the other places where we use LoneManagedOrNull. This regressed here:
https://hg.mozilla.org/mozilla-central/rev/19c5de45a704

I wonder if we should fix all the TCP socket callsites? It sounds like Hello is the only desktop user of TCPSocket, and it maybe only cares about this one callsite. But we might have others down the road.
Flags: needinfo?(josh)
Yes, we absolutely should. I imagine this can bite b2g as well in the right situations.
Flags: needinfo?(josh)
Comment on attachment 8694358 [details] [diff] [review]
Don't assume a TCPSocket has only one managee

Review of attachment 8694358 [details] [diff] [review]:
-----------------------------------------------------------------

Randell, can you fix the rest of these? Maybe just add a helper function somewhere that doesn't assert. It could even be in ProtocolUtils.h.
Attachment #8694358 - Flags: review?(wmccloskey)
(Reporter)

Comment 14

3 years ago
Randell, when I apply the patch(es) and start a conversation with browser sharing, the crash is indeed gone :-)

However, the screen share stream is also terminated, so effectively instead of crashing it now stops sharing when the remote party joins the call.
Flags: needinfo?(rjesup)
(In reply to Mike de Boer [:mikedeboer] from comment #14)
> However, the screen share stream is also terminated, so effectively instead
> of crashing it now stops sharing when the remote party joins the call.

Just a thought. There's been some bugs reported by QA along these sorts of lines - where screen share doesn't show up when someone else joins a room. Might be worth trying commenting out the auto-screen share and hooking it into a button manually for a quick verification.
(Assignee)

Comment 16

3 years ago
Created attachment 8700807 [details] [diff] [review]
don't block sharing in e10s NOT FOR CHECKIN

Mike: With this patch to stop it killing sharing in e10s mode, this patch queue works for me (I can share tabs in e10s mode no problem).  Without this patch, I see what I think you were seeing (sharing stops ev5Cen before the remote connects, and an error in the Browser Console.  Note: this just hacks the code out of there; you'll need to do whatever else might be needed to cleanly remove the Block-sharing-in-e10s stuff
(Assignee)

Comment 17

3 years ago
patchset that works for me is:

unbitrotted bug 1154277 patch (https://bugzilla.mozilla.org/attachment.cgi?id=8694250) -- note, I unbitrotted it, but you should re-do it or check my unbitrot carefully.

The patch from this bug (8694358)

The don't-block-sharing patch (8700807)
Flags: needinfo?(standard8)
Flags: needinfo?(rjesup)
Flags: needinfo?(mdeboer)
(Assignee)

Comment 18

3 years ago
FYI, with these patches, on ending calls in debug builds I'm seeing 

Assertion failure: outputTrack->GetEnd() == GraphTimeToStreamTimeWithBlocking(interval.mStart) (Samples missing), at /home/jesup/src/mozilla/inbound/dom/media/TrackUnionStream.cpp:268

That would be a separate issue in MSG I believe.  Not sure why it would be happening in Loop...
Flags: needinfo?(roc)
Flags: needinfo?(padenot)
(Assignee)

Comment 19

3 years ago
Created attachment 8700866 [details] [diff] [review]
Don't assume a TCPSocket has only one managee

as requested, made SingleManagedOrNull() that doesn't assert, and renamed LongManagedOrNull to LoneManagedOrNullAsserts, and converted all the TCPSocket uses to SingleManagedOrNull()
Attachment #8700866 - Flags: review?(wmccloskey)
(Assignee)

Updated

3 years ago
Attachment #8694358 - Attachment is obsolete: true
Comment on attachment 8700866 [details] [diff] [review]
Don't assume a TCPSocket has only one managee

Review of attachment 8700866 [details] [diff] [review]:
-----------------------------------------------------------------

::: dom/network/TCPSocketParent.cpp
@@ +224,5 @@
>    bool     inBrowser = false;
>    const PContentParent *content = Manager()->Manager();
> +  // We don't use LoneManagedOrNullAsserts(content->ManagedPBrowserParent())
> +  // here, because with Hello-in-Content causes 2 managees
> +  // (Bug 1226200 and bug 1154277)

I don't think this comment adds much value.
Attachment #8700866 - Flags: review?(wmccloskey) → review+

Comment 22

3 years ago
bugherder
https://hg.mozilla.org/mozilla-central/rev/16b693debb2b
Status: ASSIGNED → RESOLVED
Last Resolved: 3 years ago
status-firefox46: --- → fixed
Resolution: --- → FIXED
Target Milestone: --- → mozilla46
I believe Mike has tested it and he's currently happy with the fix.
Flags: needinfo?(standard8)
Flags: needinfo?(mdeboer)
(In reply to Randell Jesup [:jesup] from comment #18)
> FYI, with these patches, on ending calls in debug builds I'm seeing 
> 
> Assertion failure: outputTrack->GetEnd() ==
> GraphTimeToStreamTimeWithBlocking(interval.mStart) (Samples missing), at
> /home/jesup/src/mozilla/inbound/dom/media/TrackUnionStream.cpp:268
> 
> That would be a separate issue in MSG I believe.  Not sure why it would be
> happening in Loop...

It's better to file this one separately I think.
Flags: needinfo?(padenot)
(Assignee)

Comment 25

3 years ago
Let's see what happens when Mike really finishes his patch
(Reporter)

Comment 26

3 years ago
Randell, can you request uplift for this patch to Fx 45?
Flags: needinfo?(rjesup)
(Assignee)

Comment 27

3 years ago
Comment on attachment 8700866 [details] [diff] [review]
Don't assume a TCPSocket has only one managee

Approval Request Comment
[Feature/regressing bug #]: N/A

[User impact if declined]: Inability to sandbox Hello in the content process; also inability to share camera/mic with other in-content applications.

[Describe test coverage new/current, TreeHerder]: TCP TURN isn't testable in automation.

[Risks and why]: Low risk.  Per BillM, the assumption here that there was a single manager is only really true for B2G, so asserting it isn't good.  The function (SingleManagedOrNull) is identical to LoneManagedOrNullAsserts, except for the assertion.
[String/UUID change made/needed]:
Flags: needinfo?(rjesup)
Attachment #8700866 - Flags: approval-mozilla-aurora?
Comment on attachment 8700866 [details] [diff] [review]
Don't assume a TCPSocket has only one managee

Fix a crash, taking it.
Attachment #8700866 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+

Comment 29

3 years ago
bugherderuplift
https://hg.mozilla.org/releases/mozilla-aurora/rev/75038d8f9703
status-firefox45: affected → fixed
You need to log in before you can comment on or make changes to this bug.