Closed Bug 1282577 Opened 8 years ago Closed 8 years ago

GMP crash in mozalloc_abort | NS_DebugBreak | mozilla::gmp::PGMPVideoDecoderChild::Write

Categories

(Core :: Audio/Video: Playback, defect, P1)

x86
Windows 10
defect

Tracking

()

RESOLVED FIXED
mozilla50
Tracking Status
firefox47 --- affected
firefox48 --- fixed
firefox49 --- fixed
firefox50 --- fixed

People

(Reporter: cpeterson, Assigned: cpearce)

References

Details

(Keywords: crash)

Crash Data

Attachments

(1 file)

@ cpearce: this looks like a new crash signature that started climbing last week and only appears to affect Firefox 47 clients. Perhaps this is related to Firefox 47 users running e10s for the first time? This bug was filed from the Socorro interface and is report bp-bd9d80f7-0c87-4481-91df-465512160621. ============================================================= 0 mozglue.dll mozalloc_abort(char const* const) memory/mozalloc/mozalloc_abort.cpp:33 1 xul.dll NS_DebugBreak xpcom/base/nsDebugImpl.cpp:434 2 xul.dll mozilla::gmp::PGMPVideoDecoderChild::Write(mozilla::gmp::PGMPVideoDecoderChild*, IPC::Message*, bool) obj-firefox/ipc/ipdl/PGMPVideoDecoderChild.cpp:880 3 xul.dll mozilla::gmp::PGMPVideoDecoderChild::Send__delete__(mozilla::gmp::PGMPVideoDecoderChild*) obj-firefox/ipc/ipdl/PGMPVideoDecoderChild.cpp:52 4 xul.dll mozilla::gmp::GMPVideoDecoderChild::RecvDecodingComplete() dom/media/gmp/GMPVideoDecoderChild.cpp:211 5 xul.dll MessageLoop::DoWork() ipc/chromium/src/base/message_loop.cc:459 6 xul.dll base::MessagePumpDefault::Run(base::MessagePump::Delegate*) ipc/chromium/src/base/message_pump_default.cc:34 7 xul.dll MessageLoop::RunHandler() ipc/chromium/src/base/message_loop.cc:227 8 xul.dll MessageLoop::Run() ipc/chromium/src/base/message_loop.cc:201 9 xul.dll XRE_InitChildProcess toolkit/xre/nsEmbedFunctions.cpp:637 10 plugin-container.exe content_process_main(int, char** const) ipc/contentproc/plugin-container.cpp:237 11 plugin-container.exe wmain toolkit/xre/nsWindowsWMain.cpp:131 12 plugin-container.exe __tmainCRTStartup f:/dd/vctools/crt/crtw32/startup/crt0.c:255 13 kernel32.dll BaseThreadInitThunk 14 ntdll.dll __RtlUserThreadStart 15 ntdll.dll _RtlUserThreadStart
Flags: needinfo?(cpearce)
I built the same revision and looked through the generated IPDL code. This crash appears to be a runtime abort because the PGMPVideoDecoder child actor is receiving a DecodingComplete message after it's already been shutdown.
Assignee: nobody → cpearce
Flags: needinfo?(cpearce)
We must be hitting the #ifndef SHMEM_ALLOC_IN_CHILD block in GMPVideoDecoderChild::Alloc() that causes us to dispatch a task to call GMPVideoDecoderChild::RecvDecodingComplete(): https://hg.mozilla.org/releases/mozilla-release/annotate/7f5abf95991b/dom/media/gmp/GMPVideoDecoderChild.cpp#l229 That was added in bug 1162358. We must be getting multiple allocs doing intr calls at once. I think that could happen in the WidevineVideoDecoder if a Decode message comes in, and goes into a ReturnOutput(), tries to alloc a frame and has to spin on an intr message response, and another Decode message comes in and does the same, so GMPVideoDecoderChild::mNeedShmemIntrCount will be 2, and then a DecodingComplete comes in, and when the events on the stack in GMPVideoDecoderChild::Alloc() finish they both end up dispatching a task to re-call GMPVideoDecoderChild::RecvDecodingComplete(). intr, or spinning event loops in general for that matter, is evil.
This is a regression from bug 1162358. We must be hitting the #ifndef SHMEM_ALLOC_IN_CHILD block in GMPVideoDecoderChild::Alloc() with multiple allocs doing intr calls at once. If this happens when a DecodingComplete() comes in, we'll end up sending one task to re-call RecvDecodingComplete for every Alloc() blocked on an intr response. This would result in us ending up trying to Send__delete__() in RecvDecodingComplete() twice. Which causes the runtime abort we're seeing here. I think that could happen in the WidevineVideoDecoder if a Decode message comes in, and goes into a ReturnOutput(), tries to alloc a frame and has to spin on an intr message response, and another Decode message comes in and does the same, so GMPVideoDecoderChild::mNeedShmemIntrCount will be 2, and then a DecodingComplete comes in, and when two tasks on the stack in GMPVideoDecoderChild::Alloc() finish they both end up dispatching a task to re-call GMPVideoDecoderChild::RecvDecodingComplete(). So we end up trying to Send__delete__() in RecvDecodingComplete() twice. I expect the same problem exists in the GMPVideoEncoder too. intr, or spinning event loops in general for that matter, is evil. Review commit: https://reviewboard.mozilla.org/r/63442/diff/#index_header See other reviews: https://reviewboard.mozilla.org/r/63442/
Attachment #8769617 - Flags: review?(rjesup)
Attachment #8769617 - Flags: review?(rjesup) → review+
Comment on attachment 8769617 [details] Bug 1282577 - Guard against multiple intr messages causes us to multi-delete GMP actors. . https://reviewboard.mozilla.org/r/63442/#review60432
Pushed by cpearce@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/2bfcec45187a Guard against multiple intr messages causes us to multi-delete GMP actors. r=jesup.
Comment on attachment 8769617 [details] Bug 1282577 - Guard against multiple intr messages causes us to multi-delete GMP actors. . Approval Request Comment [Feature/regressing bug #]: Widevine EME support [User impact if declined]: The CDM process will crash, interrupting EME viewing pleasure. Firefox content/main process won't crash, just the CDM process. This will affect multiple large EME video sites, and is a significant cause of errors on those sites. [Describe test coverage new/current, TreeHerder]: We have lots of EME mochitests. [Risks and why]: Should be fairly low; the patch is quite simple. [String/UUID change made/needed]: None.
Attachment #8769617 - Flags: approval-mozilla-beta?
Attachment #8769617 - Flags: approval-mozilla-aurora?
Note: I think the patch here also fixes bug 1274963, which is the MacOSX variant of this CDM crash.
Also looks like this fixes bug 1173632 which affects Primetime EME and we've been trying to solve for ages. :)
Blocks: 1173632
Hi Chris, From the crash report, I don't see any crashes for 48, do you think we need to uplift this patch to 48 beta?
Flags: needinfo?(cpearce)
Status: NEW → RESOLVED
Closed: 8 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla50
(In reply to Gerry Chang [:gchang] from comment #9) > Hi Chris, > From the crash report, I don't see any crashes for 48, do you think we need > to uplift this patch to 48 beta? Yes, I think we should uplift to 48 beta. There are crashes here: https://crash-stats.mozilla.com/signature/?date=%3E%3D2016-06-01&signature=mozalloc_abort%20%7C%20NS_DebugBreak%20%7C%20mozilla%3A%3Agmp%3A%3APGMPVideoDecoderChild%3A%3AWrite&_columns=date&_columns=product&_columns=version&_columns=build_id&_columns=platform&_columns=reason&_columns=address&_sort=-version&_sort=-date&page=1 This is the signature search with an extra search filter of "date >= 2016-06-01". I think we're seeing more crashes in 47 and not as many in 48, as Netfix are testing Widevine in 47 and not in Firefox 48. So Widevine isn't being used as much in 48, so it's not failing as much. I'd guess the other Firefox versions hitting this are mostly Amazon Prime Video users.
Flags: needinfo?(cpearce)
Comment on attachment 8769617 [details] Bug 1282577 - Guard against multiple intr messages causes us to multi-delete GMP actors. . Looks like this will fix some Widevine and other issues. Please uplift to aurora and beta
Attachment #8769617 - Flags: approval-mozilla-beta?
Attachment #8769617 - Flags: approval-mozilla-beta+
Attachment #8769617 - Flags: approval-mozilla-aurora?
Attachment #8769617 - Flags: approval-mozilla-aurora+
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: