Closed Bug 1282577 Opened 5 years ago Closed 5 years ago

GMP crash in mozalloc_abort | NS_DebugBreak | mozilla::gmp::PGMPVideoDecoderChild::Write

Categories

(Core :: Audio/Video: Playback, defect, P1)

x86
Windows 10
defect

Tracking

()

RESOLVED FIXED
mozilla50
Tracking Status
firefox47 --- affected
firefox48 --- fixed
firefox49 --- fixed
firefox50 --- fixed

People

(Reporter: cpeterson, Assigned: cpearce)

References

Details

(Keywords: crash)

Crash Data

Attachments

(1 file)

@ cpearce: this looks like a new crash signature that started climbing last week and only appears to affect Firefox 47 clients. Perhaps this is related to Firefox 47 users running e10s for the first time?


This bug was filed from the Socorro interface and is 
report bp-bd9d80f7-0c87-4481-91df-465512160621.
=============================================================

 0 	mozglue.dll 	mozalloc_abort(char const* const) 	memory/mozalloc/mozalloc_abort.cpp:33
1 	xul.dll 	NS_DebugBreak 	xpcom/base/nsDebugImpl.cpp:434
2 	xul.dll 	mozilla::gmp::PGMPVideoDecoderChild::Write(mozilla::gmp::PGMPVideoDecoderChild*, IPC::Message*, bool) 	obj-firefox/ipc/ipdl/PGMPVideoDecoderChild.cpp:880
3 	xul.dll 	mozilla::gmp::PGMPVideoDecoderChild::Send__delete__(mozilla::gmp::PGMPVideoDecoderChild*) 	obj-firefox/ipc/ipdl/PGMPVideoDecoderChild.cpp:52
4 	xul.dll 	mozilla::gmp::GMPVideoDecoderChild::RecvDecodingComplete() 	dom/media/gmp/GMPVideoDecoderChild.cpp:211
5 	xul.dll 	MessageLoop::DoWork() 	ipc/chromium/src/base/message_loop.cc:459
6 	xul.dll 	base::MessagePumpDefault::Run(base::MessagePump::Delegate*) 	ipc/chromium/src/base/message_pump_default.cc:34
7 	xul.dll 	MessageLoop::RunHandler() 	ipc/chromium/src/base/message_loop.cc:227
8 	xul.dll 	MessageLoop::Run() 	ipc/chromium/src/base/message_loop.cc:201
9 	xul.dll 	XRE_InitChildProcess 	toolkit/xre/nsEmbedFunctions.cpp:637
10 	plugin-container.exe 	content_process_main(int, char** const) 	ipc/contentproc/plugin-container.cpp:237
11 	plugin-container.exe 	wmain 	toolkit/xre/nsWindowsWMain.cpp:131
12 	plugin-container.exe 	__tmainCRTStartup 	f:/dd/vctools/crt/crtw32/startup/crt0.c:255
13 	kernel32.dll 	BaseThreadInitThunk 	
14 	ntdll.dll 	__RtlUserThreadStart 	
15 	ntdll.dll 	_RtlUserThreadStart
Flags: needinfo?(cpearce)
I built the same revision and looked through the generated IPDL code. This crash appears to be a runtime abort because the PGMPVideoDecoder child actor is receiving a DecodingComplete message after it's already been shutdown.
Assignee: nobody → cpearce
Flags: needinfo?(cpearce)
We must be hitting the #ifndef SHMEM_ALLOC_IN_CHILD block in GMPVideoDecoderChild::Alloc() that causes us to dispatch a task to call GMPVideoDecoderChild::RecvDecodingComplete():

https://hg.mozilla.org/releases/mozilla-release/annotate/7f5abf95991b/dom/media/gmp/GMPVideoDecoderChild.cpp#l229

That was added in bug 1162358.

We must be getting multiple allocs doing intr calls at once. I think that could happen in the WidevineVideoDecoder if a Decode message comes in, and goes into a ReturnOutput(), tries to alloc a frame and has to spin on an intr message response, and another Decode message comes in and does the same, so GMPVideoDecoderChild::mNeedShmemIntrCount will be 2, and then a DecodingComplete comes in, and when the events on the stack in GMPVideoDecoderChild::Alloc() finish they both end up dispatching a task to re-call GMPVideoDecoderChild::RecvDecodingComplete().

intr, or spinning event loops in general for that matter, is evil.
This is a regression from bug 1162358.

We must be hitting the #ifndef SHMEM_ALLOC_IN_CHILD block in
GMPVideoDecoderChild::Alloc() with multiple allocs doing intr calls at once.
If this happens when a DecodingComplete() comes in, we'll end up sending one
task to re-call RecvDecodingComplete for every Alloc() blocked on an intr
response.  This would result in us ending up trying to Send__delete__() in
RecvDecodingComplete() twice.  Which causes the runtime abort we're seeing
here.

I think that could happen in the WidevineVideoDecoder if a Decode message comes
in, and goes into a ReturnOutput(), tries to alloc a frame and has to spin on
an intr message response, and another Decode message comes in and does the
same, so GMPVideoDecoderChild::mNeedShmemIntrCount will be 2, and then a
DecodingComplete comes in, and when two tasks on the stack in
GMPVideoDecoderChild::Alloc() finish they both end up dispatching a task to
re-call GMPVideoDecoderChild::RecvDecodingComplete(). So we end up trying to
Send__delete__() in RecvDecodingComplete() twice.

I expect the same problem exists in the GMPVideoEncoder too.

intr, or spinning event loops in general for that matter, is evil.

Review commit: https://reviewboard.mozilla.org/r/63442/diff/#index_header
See other reviews: https://reviewboard.mozilla.org/r/63442/
Attachment #8769617 - Flags: review?(rjesup)
Attachment #8769617 - Flags: review?(rjesup) → review+
Comment on attachment 8769617 [details]
Bug 1282577 - Guard against multiple intr messages causes us to multi-delete GMP actors. .

https://reviewboard.mozilla.org/r/63442/#review60432
Pushed by cpearce@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/2bfcec45187a
Guard against multiple intr messages causes us to multi-delete GMP actors. r=jesup.
Comment on attachment 8769617 [details]
Bug 1282577 - Guard against multiple intr messages causes us to multi-delete GMP actors. .

Approval Request Comment
[Feature/regressing bug #]: Widevine EME support
[User impact if declined]: The CDM process will crash, interrupting EME viewing pleasure. Firefox content/main process won't crash, just the CDM process. This will affect multiple large EME video sites, and is a significant cause of errors on those sites.
[Describe test coverage new/current, TreeHerder]: We have lots of EME mochitests.
[Risks and why]: Should be fairly low; the patch is quite simple.
[String/UUID change made/needed]: None.
Attachment #8769617 - Flags: approval-mozilla-beta?
Attachment #8769617 - Flags: approval-mozilla-aurora?
Note: I think the patch here also fixes bug 1274963, which is the MacOSX variant of this CDM crash.
Also looks like this fixes bug 1173632 which affects Primetime EME and we've been trying to solve for ages. :)
Blocks: 1173632
Hi Chris,
From the crash report, I don't see any crashes for 48, do you think we need to uplift this patch to 48 beta?
Flags: needinfo?(cpearce)
https://hg.mozilla.org/mozilla-central/rev/2bfcec45187a
Status: NEW → RESOLVED
Closed: 5 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla50
(In reply to Gerry Chang [:gchang] from comment #9)
> Hi Chris,
> From the crash report, I don't see any crashes for 48, do you think we need
> to uplift this patch to 48 beta?

Yes, I think we should uplift to 48 beta. 

There are crashes here:
https://crash-stats.mozilla.com/signature/?date=%3E%3D2016-06-01&signature=mozalloc_abort%20%7C%20NS_DebugBreak%20%7C%20mozilla%3A%3Agmp%3A%3APGMPVideoDecoderChild%3A%3AWrite&_columns=date&_columns=product&_columns=version&_columns=build_id&_columns=platform&_columns=reason&_columns=address&_sort=-version&_sort=-date&page=1

This is the signature search with an extra search filter of "date >= 2016-06-01".

I think we're seeing more crashes in 47 and not as many in 48, as Netfix are testing Widevine in 47 and not in Firefox 48. So Widevine isn't being used as much in 48, so it's not failing as much.

I'd guess the other Firefox versions hitting this are mostly Amazon Prime Video users.
Flags: needinfo?(cpearce)
Comment on attachment 8769617 [details]
Bug 1282577 - Guard against multiple intr messages causes us to multi-delete GMP actors. .

Looks like this will fix some Widevine and other issues. Please uplift to aurora and beta
Attachment #8769617 - Flags: approval-mozilla-beta?
Attachment #8769617 - Flags: approval-mozilla-beta+
Attachment #8769617 - Flags: approval-mozilla-aurora?
Attachment #8769617 - Flags: approval-mozilla-aurora+
Duplicate of this bug: 1274963
You need to log in before you can comment on or make changes to this bug.