ABORT lock_impl_posix from MessageLoop::PostTask_Helper when shutting down both clearkey and openh264 GMPs

Status: NEW
Assignee: Unassigned
Product: Core
Component: Audio/Video: GMP
Priority: P3
Severity: normal
Rank: 25
Reported: 2 years ago
Modified: a month ago

People

(Reporter: gerald, Unassigned)

Tracking

Firefox Tracking Flags

(Not tracked)

Details

(crash signature)

Attachments

(1 attachment)

(Reporter)

Description

2 years ago
Created attachment 8623412 [details]
shutdown-clearkey-and-openh264.txt

After encountering some issues in bug 1173631 comment 5 and bug 1173634 comment 3, I actually was able to reproduce the crash from a fresh m-c debug build (on Mac 10.10.3).
Not sure which component this belongs to, as it happens only when using both EME Clearkey and WebRTC openh264 GMPs.

STR:
1. Launch Firefox with |NSPR_LOG_MODULES=GMP:5 ./mach run|
2. Go to http://people.mozilla.org/~cpearce/mse-clearkey/ , start video
3. Open a 2nd tab, go to https://mozilla.github.io/webrtc-landing/pc_test.html
4. Check 'Require H.264 video'
5. Click 'Start' (and optionally authorize access to camera and/or mic)
6. (Optional) Switch to 1st tab
7. Close Firefox (Cmd-Q on Mac)

Some logging near the crash site:
2113368832[10039a070]: GMPService::Observe topic='xpcom-shutdown-threads' data=''
2113368832[10039a070]: GMPService::ShutdownGMPThread
[82184] ###!!! ABORT: file ipc/chromium/src/base/lock_impl_posix.cc, line 42
[82184] WARNING: '!mMainThread', file xpcom/threads/nsThreadManager.cpp, line 299
[82184] WARNING: 'NS_FAILED(rv)', file xpcom/glue/nsThreadUtils.cpp, line 174
[82184] WARNING: 'NS_FAILED(rv)', file dom/workers/ServiceWorkerManager.cpp, line 397
#01: MessageLoop::PostTask_Helper(tracked_objects::Location const&, Task*, int, bool)[XUL +0x46d559]
#02: mozilla::ipc::MessageChannel::OnMessageReceivedFromLink(IPC::Message const&)[XUL +0x499c5e]

Attached full log, with patch from bug 1173634 as it gives a bit more information around plugin-shutdown time.

This is intermittent; I get it in roughly 10-20% of my test runs.
I never got it when using only one of the two pages, even with multiple copies of the same page open.
(Reporter)

Comment 1

2 years ago
Interestingly I've reproduced the same crash with just clearkey, by artificially making async-shutdown never complete:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=5c631beb3f3e

My gut feeling at the moment is that, because the GMP never really completes its shutdown, it keeps using resources (like the MessageChannel) after they have been destroyed in the later stages of Firefox's shutdown.

Comment 2

2 years ago
Build ID 	20151208100201
User Agent 	Mozilla/5.0 (Macintosh; Intel Mac OS X 10.10; rv:43.0) Gecko/20100101 Firefox/43.0

Hi Gerald,

I tried to reproduce this bug on FF 43 several times on Mac OS X 10.10 and managed to get a crash once. Crash signature: [@ shutdownhang | libsystem_kernel.dylib@0x16166 ]
Do you think this is related to the other bugs you mentioned, 1173631 and 1173634? I am asking because I want to set the component.
Thank you
Crash Signature: [@ shutdownhang | libsystem_kernel.dylib@0x16166 ]
Flags: needinfo?(gsquelart)
Let's stick this in A/V:GMP for now.
Component: Untriaged → Audio/Video: GMP
(Reporter)

Comment 4

2 years ago
Agreed with A/V:GMP.

I've just tried to reproduce it, but couldn't get a crash (with or without e10s). Trunk build, Mac OS X 10.11.2.

Interestingly the last GMP shutdown-related log line is:
{clearkey:{11fc07800:"SAEHLK=No (more) async-shutdown required"},gmpopenh264:{11fc06000:"SA=Sent CloseActive, content children to close: 1"},-:{-:"012345=Async shutdown complete"}}
So even though 'gmpopenh264' still has a content child open, the overall async shutdown is considered complete. This seems wrong.
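
To make the expected invariant concrete, here is a minimal sketch in Python (not Gecko code). It assumes a hypothetical per-plugin record, `PluginShutdownState`, holding the number of content children still open; neither that name nor `async_shutdown_complete` exists in the tree. The point is only that the overall "complete" state should be derived from every plugin having zero open children.

```python
from dataclasses import dataclass


@dataclass
class PluginShutdownState:
    """Hypothetical per-plugin record; not a real Gecko type."""
    name: str
    content_children_open: int  # content children that have not closed yet


def async_shutdown_complete(plugins):
    """Return True only when no plugin still has an open content child."""
    return all(p.content_children_open == 0 for p in plugins)


# State mirroring the log line above: clearkey needs no more async shutdown,
# while gmpopenh264 has sent CloseActive and still waits on one content child.
states = [
    PluginShutdownState("clearkey", 0),
    PluginShutdownState("gmpopenh264", 1),
]
print(async_shutdown_complete(states))
```

With the state from the log line, this check would not report completion until the remaining gmpopenh264 child has closed.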

I'll come back to it full time later on.
Flags: needinfo?(gsquelart)
Not sure how to read that log line or where it comes from.  The original report's log seems to be missing (just a line of text in the attachment).

The @shutdownhang crash (comment 2) is, I presume, the shutdown-hang killer being invoked.

Also: I presume the original report needs two pages because two origins mean two GMP child processes get started, which may open the window for failure. xpcom-shutdown-threads requires that you have finished using threads by the time your observer returns; after that, important services start to tear down. Even an async response from a child process arriving after you return from the observer might cause problems, I think.

I suggest that the GMP shutdown code might benefit from using shutdown blockers from AsyncShutdown to block shutdown at the xpcom-shutdown-threads stage until all GMP children are known to be shut down. See bug 1166293 and bug 1237794 for examples. We're planning to move to this for MSG as well, since the shutdown for that is a hairy mess and causes nasty intermittents.
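
The blocker idea can be sketched roughly as follows. This is a toy Python model, not the actual AsyncShutdown API: `ShutdownBarrier`, `add_blocker`, `remove_blocker` and `shut_down_children` are hypothetical names, and it blocks on a condition variable where the real mechanism would do something else. It only illustrates the ordering: each child registers a blocker, and the shutdown phase does not proceed until every blocker has been removed (or a timeout expires).

```python
import threading


class ShutdownBarrier:
    """Toy model of a shutdown phase that waits for registered blockers."""

    def __init__(self):
        self._cond = threading.Condition()
        self._blockers = set()

    def add_blocker(self, name):
        with self._cond:
            self._blockers.add(name)

    def remove_blocker(self, name):
        with self._cond:
            self._blockers.discard(name)
            self._cond.notify_all()

    def wait(self, timeout=None):
        """Block until every blocker is removed; return False on timeout."""
        with self._cond:
            return self._cond.wait_for(lambda: not self._blockers, timeout)


def shut_down_children(barrier, child_names):
    """Register one blocker per child, then let each child finish on its own thread."""
    for name in child_names:
        barrier.add_blocker(name)
    workers = [
        threading.Thread(target=barrier.remove_blocker, args=(name,))
        for name in child_names
    ]
    for worker in workers:
        worker.start()
    return workers


# Usage: the shutdown phase proceeds only after both children have reported in.
barrier = ShutdownBarrier()
workers = shut_down_children(barrier, ["clearkey", "gmpopenh264"])
finished = barrier.wait(timeout=5.0)
for worker in workers:
    worker.join()
```

The timeout stands in for the shutdown-hang killer: a child that never removes its blocker makes `wait` return False instead of letting later teardown race with a still-active child.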

Updated

2 years ago
Rank: 25
Priority: -- → P2
Mass change P2->P3 to align with new Mozilla triage process.
Priority: P2 → P3