Allow driver crash detection at media/webgl callsites

RESOLVED FIXED

Status

()

Core
Graphics
RESOLVED FIXED
3 years ago
3 months ago

People

(Reporter: dvander, Assigned: dvander)

Tracking

(Blocks: 1 bug)

Firefox Tracking Flags

(Not tracked)

Details

Attachments

(10 attachments, 3 obsolete attachments)

4.63 KB, patch
mattwoodrow
: review+
Details | Diff | Splinter Review
10.53 KB, patch
mattwoodrow
: review+
Details | Diff | Splinter Review
5.68 KB, patch
mattwoodrow
: review+
Details | Diff | Splinter Review
9.20 KB, patch
mattwoodrow
: review+
Details | Diff | Splinter Review
5.08 KB, patch
mattwoodrow
: review+
Details | Diff | Splinter Review
6.35 KB, patch
mattwoodrow
: review+
Details | Diff | Splinter Review
3.35 KB, patch
mattwoodrow
: review+
Details | Diff | Splinter Review
25.23 KB, patch
mattwoodrow
: review+
Details | Diff | Splinter Review
6.85 KB, patch
jgilbert
: review+
mattwoodrow
: review+
Details | Diff | Splinter Review
861 bytes, patch
mattwoodrow
: review+
Details | Diff | Splinter Review
The whole DriverInitCrashDetection thing is kinda hardcoded for layers acceleration. This bug is for pulling it apart and using it for media and WebGL device creation.
Blocks: 1174189
No longer blocks: 1183305
Created attachment 8642708 [details] [diff] [review]
part 1, remove unneeded static var

Splitting this into reviewable chunks. Goal is to move d3d11/layers stuff out of the crash detection logic and into its own class.

First up, this static var just isn't needed, it's covered by the status pref.
Attachment #8642708 - Flags: review?(matt.woodrow)
Created attachment 8642709 [details] [diff] [review]
part 2, rename to DriverCrashGuard
Attachment #8642709 - Flags: review?(matt.woodrow)
Created attachment 8642710 [details] [diff] [review]
part 3, make initialization lazy

This is because I want to derive DriverCrashGuard into subclasses, but then we can't call virtual functions from the constructor. Easier to just split the constructor out.
Attachment #8642710 - Flags: review?(matt.woodrow)
Created attachment 8642711 [details] [diff] [review]
part 4, split out layers logic

This introduces a D3D11LayersCrashGuard class and moves a bunch of the environment logic into that.
Attachment #8642711 - Flags: review?(matt.woodrow)
Created attachment 8642727 [details] [diff] [review]
part 5, factor out status/lockfile

This lets classes implementing DriverCrashGuard choose the lockfile name and status pref names, so they don't collide with other classes.
Attachment #8642727 - Flags: review?(matt.woodrow)
Adding the DXVA/WebGL parts are a little harder since they're in the content process, and we can't access the filesystem to create a lock file there. We'll have to send a sync message.
Attachment #8642708 - Flags: review?(matt.woodrow) → review+
Attachment #8642709 - Flags: review?(matt.woodrow) → review+
Comment on attachment 8642710 [details] [diff] [review]
part 3, make initialization lazy

Review of attachment 8642710 [details] [diff] [review]:
-----------------------------------------------------------------

::: gfx/src/DriverCrashGuard.h
@@ +28,5 @@
>    Recovered
>  };
>  
> +// DriverCrashGuard is used to detect crashes at graphics driver callsites.
> +// 

Trailing whitespace!
Attachment #8642710 - Flags: review?(matt.woodrow) → review+
Attachment #8642711 - Flags: review?(matt.woodrow) → review+
Attachment #8642727 - Flags: review?(matt.woodrow) → review+
Created attachment 8643573 [details] [diff] [review]
part 5, move telemetry entirely into d3d11 crash guards

Sorry for the review spam, the approach I was going for turned out not to work. Making this all safe for use in ContentProcesses was trickier than I expected.

This patch is just another cleanup, moves all the RecordTelemetry() calls into D3D11LayersCrashGuard.
Attachment #8642727 - Attachment is obsolete: true
Attachment #8643573 - Flags: review?(matt.woodrow)
Created attachment 8643576 [details] [diff] [review]
part 6, simplify guard file naming

This patch simplifies guard file and pref naming by tying it to an enumeration. This enumeration will be used in IPC in the next patch.
Attachment #8643576 - Flags: review?(matt.woodrow)
Created attachment 8643586 [details] [diff] [review]
part 7, e10s support

This is the tricky part. The new model is a DriverCrashGuard can be in three states: a parent guard, a child guard, and a "proxy" in the parent that the child communicates with. (Similar to an IPDL actor, but I decided they were too heavyweight here.)

A parent guard will create a lockfile and set a pref, then delete the file and unset the pref in its destructor. Whenever a parent or proxy guard is created and either of these tombstones exist, the guard assumes a crash occurred.

Child guards will ask the parent process to determine the crash state. This allocates a proxy guard in the ContentParent. Only one proxy guard can be active in a ContentParent at a time. If the ContentParent is destroyed with an active proxy guard, we assume a crash occurred.

As of this patch, a content-side guard will always send at least one sync message to the parent process. The first sync message is to activate the guard, and a second message can be sent to complete the guard. There is no shortcut out if the environment hasn't changed, unlike with the lockfile in the parent process. That's fixable if we think that's a problem.
Attachment #8643586 - Flags: review?(matt.woodrow)
Attachment #8643573 - Flags: review?(matt.woodrow) → review+
Attachment #8643576 - Flags: review?(matt.woodrow) → review+
Created attachment 8643948 [details] [diff] [review]
part 8, dxva2d3d9 crash guard
Attachment #8643948 - Flags: review?(matt.woodrow)
Created attachment 8643959 [details] [diff] [review]
part 9, webgl crash guard

Jeff, the idea behind these patches is: there's an RAII object that guards graphics device initialization. If we crash before completing the guard, we'll recognize the crash next time the same guard is instantiated. We can then bail out early rather than crash continuously.

To avoid activating the test every single time, we only activate it if certain preferences or conditions change. Those are in WebGLCrashGuard::UpdateEnvironment(). I wasn't sure which ones we care about, so if it looks incomplete or overzealous, let me know what to change. (By default, we also check the device and driver version.)

I also wasn't sure where to actually put the guard. I threw it in GLLibraryEGL::EnsureInitialized since that looked like where we initialize ANGLE.
Attachment #8643959 - Flags: review?(jgilbert)
Comment on attachment 8643959 [details] [diff] [review]
part 9, webgl crash guard

Review of attachment 8643959 [details] [diff] [review]:
-----------------------------------------------------------------

I think you want to s/WebGL/GLContext/ in this patch.
Also, don't you want to catch this for GLX, WGL, and CGL?

::: gfx/gl/GLLibraryEGL.cpp
@@ +157,5 @@
>  
> +#ifdef XP_WIN
> +    gfx::WebGLCrashGuard crashGuard;
> +    if (crashGuard.Crashed()) {
> +      return false;

This only covers EGL, not really even WebGL necessarily.

Did you really mean for this to be a crash guard for GLContext? That seems closer to what you're doing here. (And like a good place for it)
Attachment #8643959 - Flags: review?(jgilbert) → review-
(In reply to Jeff Gilbert [:jgilbert] from comment #13)
> Comment on attachment 8643959 [details] [diff] [review]
> part 9, webgl crash guard
> 
> Review of attachment 8643959 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> I think you want to s/WebGL/GLContext/ in this patch.
> Also, don't you want to catch this for GLX, WGL, and CGL?
> 
> ::: gfx/gl/GLLibraryEGL.cpp
> @@ +157,5 @@
> >  
> > +#ifdef XP_WIN
> > +    gfx::WebGLCrashGuard crashGuard;
> > +    if (crashGuard.Crashed()) {
> > +      return false;
> 
> This only covers EGL, not really even WebGL necessarily.
> 
> Did you really mean for this to be a crash guard for GLContext? That seems
> closer to what you're doing here. (And like a good place for it)

The goal is to cover this crash stack: https://bug1174189.bmoattachments.org/attachment.cgi?id=8621619

So I'm fine with it being EGL (or WGL) specific. But if there's a pinch point where we can stick this to cover all WebGL contexts, that works too.
Flags: needinfo?(jgilbert)
I really think you want GLContext, not just WebGL. This would also cover SkiaGL and OGL layers.

If you really want only WebGL, it's mediated in CreateHeadlessGL() in WebGLContext.cpp.

I advise leaving this for GLContext, as that's more general, and I think the best place for this.
Flags: needinfo?(jgilbert)
Created attachment 8644705 [details] [diff] [review]
part 7, e10s support

This version is a little better, I hope. Now we have three kinds of crash guards:

Parent guard: On new profiles or when the graphics environment changes, parent guards create a zero-byte tombstone file on disk. This gets cleaned up when the guard's destructor runs. If it's still present on startup, we know that something within the guard's scope crashed.

Child guard: Sends a message to the parent process, creating a "proxy" guard in the ContentParent. If the guard was known to crash and the environment has not changed, the proxy returns a "crash" state to the child. Otherwise, the child may proceed. If the ContentParent shuts down while a guard is active, we consider that a crash.

The difference between this and the previous solution is that now we always monitor the child process for crashes within guards, since it is fairly cheap to do so. Guards in the parent are still only activated on environment changes since they incur main-thread disk I/O.

In addition this version prefixes all the cache prefs with the name of the guard.
Attachment #8643586 - Attachment is obsolete: true
Attachment #8643586 - Flags: review?(matt.woodrow)
Attachment #8644705 - Flags: review?(matt.woodrow)
Created attachment 8644708 [details] [diff] [review]
part 9, guard GLContext creation
Attachment #8643959 - Attachment is obsolete: true
Attachment #8644708 - Flags: review?(jgilbert)
Attachment #8643948 - Flags: review?(matt.woodrow) → review+
Comment on attachment 8644705 [details] [diff] [review]
part 7, e10s support

Review of attachment 8644705 [details] [diff] [review]:
-----------------------------------------------------------------

Wouldn't hurt to add the MOZ_ASSERT(XRE_IsParentProcess()) calls to all functions that are commented as being parent process only to avoid needing to refer to the header.

::: gfx/src/DriverCrashGuard.cpp
@@ +300,5 @@
>  
> +std::string
> +DriverCrashGuard::GetFullPrefName(const char* aPref)
> +{
> +  return std::string("gfx.crash-guard.") + 

Trailing whitespace!
Attachment #8644705 - Flags: review?(matt.woodrow) → review+
Comment on attachment 8644708 [details] [diff] [review]
part 9, guard GLContext creation

Review of attachment 8644708 [details] [diff] [review]:
-----------------------------------------------------------------

r=me for the DriverCrashGuard/ContentParent changes.

::: gfx/src/DriverCrashGuard.cpp
@@ +454,5 @@
> +bool
> +GLContextCrashGuard::UpdateEnvironment()
> +{
> +  static bool checked = false;
> +  static bool changed = false;

We can only use statics here because all the prefs we check aren't live, right?

Might be worth documenting that, I doubt it's worth trying to enforce it.
Attachment #8644708 - Flags: review+
Keywords: leave-open
Created attachment 8647732 [details] [diff] [review]
fix bogus assert
Attachment #8647732 - Flags: review?(matt.woodrow)
Attachment #8647732 - Attachment description: fix bougs assert → fix bogus assert
Attachment #8647732 - Flags: review?(matt.woodrow) → review+
Comment on attachment 8644708 [details] [diff] [review]
part 9, guard GLContext creation

Review of attachment 8644708 [details] [diff] [review]:
-----------------------------------------------------------------

Really sorry about overlooking this!

::: gfx/src/DriverCrashGuard.cpp
@@ +454,5 @@
> +bool
> +GLContextCrashGuard::UpdateEnvironment()
> +{
> +  static bool checked = false;
> +  static bool changed = false;

Also statics are not threadsafe, so you should probably grab a (static) mutex for this.
Attachment #8644708 - Flags: review?(jgilbert) → review+
Crash was, mGfxInfo didn't get initialized on all paths. Fixed + relanded.
Flags: needinfo?(dvander)
Status: ASSIGNED → RESOLVED
Last Resolved: 3 years ago
Resolution: --- → FIXED
Removing leave-open keyword from resolved bugs, per :sylvestre.
Keywords: leave-open
You need to log in before you can comment on or make changes to this bug.