Closed Bug 1041525 Opened 6 years ago Closed 5 years ago

Possible deadlock in Crashreporter::OOPInit() from GMP plugin initialization

Categories

(Core :: WebRTC, defect)

defect
Not set
Points:
8

Tracking

()

VERIFIED FIXED
mozilla34
Iteration:
34.1
Tracking Status
firefox32 --- unaffected
firefox33 + verified
firefox34 --- verified

People

(Reporter: gfritzsche, Assigned: gfritzsche)

References

(Blocks 1 open bug)

Details

Attachments

(2 files, 5 obsolete files)

Main thread stack: https://pastebin.mozilla.org/5586355
GMP thread stack: https://pastebin.mozilla.org/5586330

> <jesup> gfritzsche: looks like it's a deadlock: OOPInit SyncRunnable's to MainThread, and MainThread is SyncRunnableing to GMPThread
> <jesup> cpearce had a patch to proxy SyncLaunch to MainThread, but bsmedberg didn't want to do that, and updated things to keep SyncLaunch on GMPThread. Forget the bug number
Flags: firefox-backlog+
Blocks: 1038961, OpenH264
Whiteboard: [leave-open][webrtc-uplift]
tested on Linux m-c opt --enable-crashreporter build that failed before in a possible SyncRunnable-to-mainthread deadlock
Attachment #8459583 - Flags: review?(georg.fritzsche)
Attachment #8459583 - Attachment is obsolete: true
Attachment #8459583 - Flags: review?(georg.fritzsche)
Attachment #8459587 - Flags: review?(georg.fritzsche)
Attachment #8459587 - Flags: review?(georg.fritzsche) → review+
Attachment #8459587 - Attachment is obsolete: true
Attachment #8459592 - Attachment is obsolete: true
Attachment #8459601 - Flags: review?(georg.fritzsche)
Attachment #8459601 - Flags: review?(georg.fritzsche) → review+
Iteration: --- → 33.3
Status: NEW → ASSIGNED
QA Whiteboard: [qa?]
QA Whiteboard: [qa?] → [qa-]
Comment on attachment 8459601 [details] [diff] [review]
Disable GMP OOPInit() during SyncLaunch() until crashreporting fully hooked up

NOTE: request for Aurora/33, not 32, under the assumption this patch doesn't get included in the uplift (which seems likely)

Approval Request Comment
[Feature/regressing bug #]: GMP code as a whole (exact bug not relevant)

[User impact if declined]: OpenH264 will not work (crash/lockup) in opt builds with CrashReporter turned on

[Describe test coverage new/current, TBPL]: Dummy OpenH264 plugin and test in-tree; only runnable by hand until we get some help from the build peers this week.  (Bug 1037125 and follow-ons)  Note that this only fails in crashreporter-enabled builds!

[Risks and why]: minimal risk; specific disabling to the GMP plugin, where there are a few small missing pieces in getting GMP crash reporting up. (see bug 1041226, which may be related)

[String/UUID change made/needed]: none
Attachment #8459601 - Flags: approval-mozilla-aurora?
(In reply to Wes Kocher (:KWierso) from comment #8)
> https://hg.mozilla.org/mozilla-central/rev/fa68d4a514cd

This didn't make the cut for 33, for the record.
Iteration: 33.3 → 34.1
Attached patch some api proposals (obsolete) — Splinter Review
Some unfinished WIP for API changes
Deadlock are a pain. Tracking it.
Attachment #8459601 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
Attachment #8459601 - Flags: checkin?
Comment on attachment 8459601 [details] [diff] [review]
Disable GMP OOPInit() during SyncLaunch() until crashreporting fully hooked up

https://hg.mozilla.org/releases/mozilla-aurora/rev/e263af5237ff
Attachment #8459601 - Flags: checkin? → checkin+
I can't reproduce this and the stacks in pastebin are gone already.
I'm not sure if we have any useful next steps for now.
Maybe jesup still has the stacks somewhere?
Flags: needinfo?(rjesup)
IIRC, new profile; download, go to http://mozilla.github.com/webrtc-landing/pc_test.html, click h264, start.

Needed to be an opt build with --enable-crashreporter.
Flags: needinfo?(rjesup)
Whiteboard: [leave-open][webrtc-uplift] → [leave-open][openh264-uplift]
Thanks, i tried with that today.
I can try some variations tomorrow and speculative reading of the code.

If that doesn't work i propose we re-enable this later tomorrow together with bug 1041226 fixed. Then we either see it's gone or should get crash-/hangreports.
(This is given the time-frame here we're working with for re-enabling)
adalucinet got a hang with an Aurora build from before workaround landed and produced a crash report with proper stacks:
bp-51371aef-307c-4c20-b72a-829752140723
QA Contact: drno
Reliably reproduced this on OS X & m-c rev 9383ccd71289 with --disable-debug, --enable-optimize, --enable-crashreporter.

The problem this ran into is two SyncRunnables blocking each other:
* main thread: http://hg.mozilla.org/mozilla-central/annotate/06ac51c2b8a8/media/webrtc/signaling/src/media-conduit/WebrtcGmpVideoCodec.cpp#l131
* gmp thread: http://hg.mozilla.org/mozilla-central/annotate/06ac51c2b8a8/toolkit/crashreporter/nsExceptionHandler.cpp#l2557

Suggested simple and - per spot-checking - working fix: for the main-thread dispatch, use mGMPThread->Dispatch(..., NS_DISPATCH_SYNC) so it keeps spinning the event loop.
Attached patch Dispatch using NS_DISPATCH_SYNC (obsolete) — Splinter Review
Comment on attachment 8459998 [details] [diff] [review]
some api proposals

This wasn't meant for this bug.
Attachment #8459998 - Attachment is obsolete: true
Blocks: 1042348
Comment on attachment 8461627 [details] [diff] [review]
Dispatch using NS_DISPATCH_SYNC

I can explain the related bug 1042348 now and local testing looks good.
Attachment #8461627 - Flags: review?(rjesup)
Forgot putting the re-enabling part up as well.
Attachment #8461627 - Attachment is obsolete: true
Attachment #8461627 - Flags: review?(rjesup)
Attachment #8461643 - Flags: review?(rjesup)
Blocks: 1043531
Comment on attachment 8461643 [details] [diff] [review]
Dispatch using NS_DISPATCH_SYNC

Review of attachment 8461643 [details] [diff] [review]:
-----------------------------------------------------------------

r+ to land just the GeckoChildProcessHost.cpp part (I landed fixes for the Dispatch on another bug already)
Attachment #8461643 - Flags: review?(rjesup) → review+
https://hg.mozilla.org/integration/mozilla-inbound/rev/2ee16696f61e
Whiteboard: [leave-open][openh264-uplift] → [openh264-uplift]
https://hg.mozilla.org/mozilla-central/rev/2ee16696f61e
Status: ASSIGNED → RESOLVED
Closed: 5 years ago
Resolution: --- → FIXED
Target Milestone: mozilla33 → mozilla34
Comment on attachment 8461643 [details] [diff] [review]
Dispatch using NS_DISPATCH_SYNC

Approval Request Comment
[Feature/regressing bug #]: N/A

[User impact if declined]: No GMP crashreporter in Aurora/33

[Describe test coverage new/current, TBPL]: Marked test for GMP plugins for uplift; this gets called on every plug invocation

[Risks and why]: Failures if bug 1041232 isn't landed on Aurora first

[String/UUID change made/needed]: None
Attachment #8461643 - Flags: approval-mozilla-aurora?
NOTE: it appears that we need to land bug 1041232 and this bug on Aurora to stop Aurora (especially Mac) crashes
Attachment #8461643 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
Duplicate of this bug: 1044249
Nils, could you please verify this is fixed? It's probably sufficient to verify via testing that bug 1044249 no longer reproduces.
Flags: needinfo?(drno)
As we did not have exact procedure to reproduce the crashes I placed lots of manual H264 calls on pc_tes.html with FF 33 and 34 on Mac and Windows and I was not able to reproduce any crash any more.
Thanks for your help, Nils.
Status: RESOLVED → VERIFIED
You need to log in before you can comment on or make changes to this bug.