Bug 1332109 (Closed): opened 7 years ago, closed 7 years ago

Sandboxed GMPs can't use a postmortem debugger (even with MOZ_CRASHREPORTER_DISABLE)

Categories

(Core :: Security: Process Sandboxing, defect, P3)

defect

Tracking

RESOLVED WORKSFORME

People

(Reporter: cpearce, Assigned: bryce)

References

(Blocks 1 open bug)

Details

If you set MOZ_CRASHREPORTER_DISABLE=1, the crash reporter still handles crashes in the GMP process and (on Windows) suppresses Visual Studio's JIT debugger dialog, which would otherwise prompt you to attach a debugger.

We want a mechanism to attach Visual Studio to the GMP process when it crashes so that GMP/CDM vendors can debug their plugins when they crash.

Currently the Widevine CDM crashes intermittently when we load http://www.w3c-test.org/encrypted-media/drm-mp4-playback-temporary-multisession.html, and the Widevine engineers are struggling to attach a debugger in time to debug it.
Bryce, when we spoke about this bug, I was operating under the assumption that Breakpad was still installing itself despite the environment variable. I did some poking and it seems I was wrong: with MOZ_CRASHREPORTER_DISABLE=1, when the CDM crashes we indeed don't go through Breakpad. (Try pressing Send Crash Report and then check about:crashes; nothing shows up.) So the environment variable is actually kinda working!

So the question becomes, more generally: why don't we get a postmortem debugger for the CDM process? The answer is probably tied more to our multi-process/IPC setup than to Breakpad, so I'm not going to be of much help there.

In the meantime, is it feasible for the CDM engineers to try to catch the crash with gflags and/or "windbg -o"?
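dmajor, thanks for that! I'm going to spend some more time familiarising myself with WinDbg to verify what I'm seeing, but I think it's quite feasible to use 'windbg -o' to catch the CDM crash. Having a look, I see an access violation inside widevinecdm (the large offsets suggest the frames are only labelled relative to the nearest exported symbols, GetCdmVersion and InitializeCdmModule_4, presumably because we have no symbols for the CDM):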

> Access violation - code c0000005

> 00 01d1fb4c 0f48a66f widevinecdm!GetCdmVersion+0xf149b
> 01 01d1fb94 0f496153 widevinecdm!GetCdmVersion+0x11651f
> 02 01d1fbb0 0f494c94 widevinecdm!GetCdmVersion+0x122003
> 03 01d1fbc4 0f49cddf widevinecdm!GetCdmVersion+0x120b44
> 04 01d1fbe4 0f4af15a widevinecdm!GetCdmVersion+0x128c8f
> 05 01d1fbfc 0f470d52 widevinecdm!InitializeCdmModule_4+0x4dda
> 06 01d1fc3c 0f471152 widevinecdm!GetCdmVersion+0xfcc02
> 07 01d1fcb8 0f46ef25 widevinecdm!GetCdmVersion+0xfd002
> 08 01d1fce4 0f5525f2 widevinecdm!GetCdmVersion+0xfadd5
> 09 01d1fd1c 0f55271a widevinecdm!InitializeCdmModule_4+0xa8272
> 0a 01d1fd28 761a62c4 widevinecdm!InitializeCdmModule_4+0xa839a
> 0b 01d1fd3c 77080fd9 KERNEL32!BaseThreadInitThunk+0x24
> 0c 01d1fd84 77080fa4 ntdll!__RtlUserThreadStart+0x2f
> 0d 01d1fd94 00000000 ntdll!_RtlUserThreadStart+0x1b
TL;DR: this can be worked around with MOZ_DISABLE_GMP_SANDBOX=1.

I stepped through ntdll!RtlDispatchException and everything generally looked OK until late in KERNELBASE!UnhandledExceptionFilter, where the wizardry became too deep for me to follow. I did notice a reference to JOB_OBJECT_LIMIT_DIE_ON_UNHANDLED_EXCEPTION, which pointed me at the sandbox code, but simply clearing that flag wasn't enough: it gave me an ugly process-termination dialog but still no postmortem debugger. (Note: I had to make KERNELBASE!BasepIsDebugPortPresent lie, i.e. report that no debugger was attached, to reach some of these paths.)
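Some background on the flag mentioned above: JOB_OBJECT_LIMIT_DIE_ON_UNHANDLED_EXCEPTION is one bit of a job object's LimitFlags (JOBOBJECT_EXTENDED_LIMIT_INFORMATION::BasicLimitInformation.LimitFlags). Per the Windows documentation, it forces SEM_NOGPFAULTERRORBOX on every process in the job, so an unhandled exception with no debugger attached normally just terminates the process. The experiment described above, clearing only that bit while leaving the sandbox's other job limits alone, amounts to the bit manipulation below. This is a minimal portable sketch: the helper names are hypothetical, the constant values are copied from winnt.h so it builds off Windows, and on Windows the flags would be read with QueryInformationJobObject and written back with SetInformationJobObject. As the comment says, this change alone did not bring the postmortem debugger back.

```cpp
#include <cstdint>

// Values copied from winnt.h; defined locally so the sketch builds anywhere.
constexpr std::uint32_t kLimitDieOnUnhandledException = 0x00000400;  // JOB_OBJECT_LIMIT_DIE_ON_UNHANDLED_EXCEPTION
constexpr std::uint32_t kLimitKillOnJobClose = 0x00002000;           // JOB_OBJECT_LIMIT_KILL_ON_JOB_CLOSE

// Hypothetical helper: does this LimitFlags value include the
// die-on-unhandled-exception restriction?
constexpr bool DiesOnUnhandledException(std::uint32_t limitFlags) {
  return (limitFlags & kLimitDieOnUnhandledException) != 0;
}

// Hypothetical helper: clear only that bit, keeping every other job limit
// (memory caps, process-count limits, kill-on-close, ...) as it was.
constexpr std::uint32_t WithoutDieOnUnhandledException(std::uint32_t limitFlags) {
  return limitFlags & ~kLimitDieOnUnhandledException;
}

// Compile-time example: a job with both limits set keeps kill-on-close.
static_assert(WithoutDieOnUnhandledException(kLimitDieOnUnhandledException |
                                             kLimitKillOnJobClose) ==
                  kLimitKillOnJobClose,
              "only the die-on-unhandled-exception bit is cleared");
```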

We could try to narrow this down further to the minimal set of sandbox restrictions that's causing the issue, but it's not clear that it's a priority now that we know that MOZ_DISABLE_GMP_SANDBOX=1 works. Shall we call it good enough?
Component: Breakpad Integration → Security: Process Sandboxing
Product: Toolkit → Core
Summary: MOZ_CRASHREPORTER_DISABLE doesn't disable GMP crash reporting → Sandboxed GMPs can't use a postmortem debugger (even with MOZ_CRASHREPORTER_DISABLE)
I set MOZ_CRASHREPORTER_DISABLE=1 and MOZ_DISABLE_GMP_SANDBOX=1, but when the CDM crashes I'm not prompted to attach a postmortem debugger in either the Win64 or the Win32 Nightly. Is there something else I need to set?
With just MOZ_DISABLE_GMP_SANDBOX=1 and a local debug build (debug opt), I get a JIT debugger dialog for the plugin container. However, when running a downloaded Nightly or a local build with --disable-debug, with both MOZ_DISABLE_GMP_SANDBOX=1 and MOZ_CRASHREPORTER_DISABLE=1 set, I don't get the JIT dialog.

So my current thinking is that the debug-ness of a build also impacts this in some way. I'm going to see if I can locate what's going on here, but time box it.
Off the top of my head, the only thing debug-ness should affect is whether the crash reporter is on or off by default: https://dxr.mozilla.org/mozilla-central/rev/6dccae211ae5fec6a1c1244b878ce0b93860154f/toolkit/crashreporter/nsExceptionHandler.cpp#1587-1601

I had been testing with Developer Edition. I'll see if there's something different about Nightly...
The latest Win64 Nightly popped me into WinDbg:

C:\Program Files\Nightly>set MOZ_CRASHREPORTER_DISABLE=1
C:\Program Files\Nightly>set MOZ_DISABLE_GMP_SANDBOX=1
C:\Program Files\Nightly>firefox.exe -no-remote -profile d:\fooprofile http://www.w3c-test.org/encrypted-media/drm-mp4-playback-temporary-multisession.html
[wait for widevine then reload a few times]
dmajor looked at my machine and determined that Visual Studio wasn't actually registered properly as the postmortem debugger. Re-registering it with elevated privileges made it work, so I'm closing this bug.
Status: NEW → RESOLVED
Closed: 7 years ago
Resolution: --- → WORKSFORME
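For context on what "registered" means here: Windows reads the postmortem (JIT) debugger command from the Debugger value under HKLM\SOFTWARE\Microsoft\Windows NT\CurrentVersion\AeDebug (32-bit processes on 64-bit Windows use the matching key under HKLM\SOFTWARE\Wow6432Node). Writing under HKLM requires administrator rights, which is consistent with re-registering from an elevated context fixing it. When a process crashes, the %ld placeholders in that command are filled in with the crashing process ID and an event handle, as in "vsjitdebugger.exe" -p %ld -e %ld. The helper below is hypothetical and only illustrates that substitution, which can be handy for seeing what command line a given registration would produce; it is not a Windows API, and it deliberately ignores other placeholders that newer templates may contain.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical helper: expand an AeDebug-style "Debugger" command template.
// The first "%ld" becomes the crashing process ID and the second the event
// handle value; any further tokens, and other placeholders such as "%p",
// are copied through unchanged by this sketch.
std::string ExpandAeDebugCommand(const std::string& tmpl, std::uint32_t pid,
                                 std::uintptr_t eventHandle) {
  const std::string token = "%ld";
  const std::string values[] = {std::to_string(pid),
                                std::to_string(eventHandle)};
  const std::size_t valueCount = 2;

  std::string out;
  std::size_t pos = 0;   // next unread position in tmpl
  std::size_t used = 0;  // how many values have been substituted
  while (used < valueCount) {
    const std::size_t hit = tmpl.find(token, pos);
    if (hit == std::string::npos) break;
    out.append(tmpl, pos, hit - pos);  // literal text before the token
    out += values[used++];
    pos = hit + token.size();
  }
  out.append(tmpl, pos, std::string::npos);  // remaining literal text
  return out;
}
```

For example, expanding "vsjitdebugger.exe" -p %ld -e %ld with process ID 1234 and handle value 56 yields "vsjitdebugger.exe" -p 1234 -e 56.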
We may want to keep this around to record the fact that postmortem debugging doesn't work unless you disable the sandbox. I could see it becoming a problem if a developer needed to debug an issue that was _caused_ by the sandbox. (Though I'm not sure there's really anything we can do about it, other than disabling sandbox features with finer granularity.)
(In reply to Bryce Van Dyk (:SingingTree) from comment #5)
> So my current thinking is that the debug-ness of a build also impacts this
> in some way. I'm going to see if I can locate what's going on here, but time
> box it.

FYI, the crash reporter is disabled by default in debug builds:

https://dxr.mozilla.org/mozilla-central/rev/fbdfcecf0c774d2221f11aed5a504d5591c774e0/toolkit/crashreporter/nsExceptionHandler.cpp#1598

This means we don't even install the Breakpad exception handler unless you set the MOZ_CRASHREPORTER environment variable before launching Firefox.
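The enable/disable behaviour described in this thread can be summarised as a small decision function. This is a hypothetical model for illustration, not the code in nsExceptionHandler.cpp: it assumes a variable counts as set when it has any non-empty value, and that each build type consults only the variable that flips its default (MOZ_CRASHREPORTER in debug builds, MOZ_CRASHREPORTER_DISABLE otherwise). The linked source is authoritative. The model also covers Breakpad only; getting a postmortem debugger additionally required MOZ_DISABLE_GMP_SANDBOX=1.

```cpp
#include <optional>
#include <string>

// Firefox decides this at compile time (DEBUG); modelled as a value here.
enum class BuildType { Debug, NonDebug };

// Hypothetical model: is the Breakpad exception handler installed?
// std::nullopt means the environment variable is unset.
bool InstallsExceptionHandler(BuildType build,
                              const std::optional<std::string>& mozCrashreporter,
                              const std::optional<std::string>& mozCrashreporterDisable) {
  // Assumption: any non-empty value counts as "set".
  auto isSet = [](const std::optional<std::string>& v) {
    return v.has_value() && !v->empty();
  };
  if (build == BuildType::Debug) {
    // Debug builds: off by default, opt in with MOZ_CRASHREPORTER.
    return isSet(mozCrashreporter);
  }
  // Other builds: on by default, opt out with MOZ_CRASHREPORTER_DISABLE.
  return !isSet(mozCrashreporterDisable);
}
```

Under this model a local debug build never installs the handler unless MOZ_CRASHREPORTER is set, which fits the observation that a debug build only needed MOZ_DISABLE_GMP_SANDBOX=1 to show the JIT dialog, whereas Nightly and Developer Edition also needed MOZ_CRASHREPORTER_DISABLE=1.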