Closed
Bug 1190281
Opened 9 years ago
Closed 9 years ago
Allow driver crash detection at media/webgl callsites
Categories
(Core :: Graphics, defect)
Core
Graphics
Tracking
()
RESOLVED
FIXED
People
(Reporter: dvander, Assigned: dvander)
References
Details
Attachments
(10 files, 3 obsolete files)
4.63 KB,
patch
|
mattwoodrow
:
review+
|
Details | Diff | Splinter Review |
10.53 KB,
patch
|
mattwoodrow
:
review+
|
Details | Diff | Splinter Review |
5.68 KB,
patch
|
mattwoodrow
:
review+
|
Details | Diff | Splinter Review |
9.20 KB,
patch
|
mattwoodrow
:
review+
|
Details | Diff | Splinter Review |
5.08 KB,
patch
|
mattwoodrow
:
review+
|
Details | Diff | Splinter Review |
6.35 KB,
patch
|
mattwoodrow
:
review+
|
Details | Diff | Splinter Review |
3.35 KB,
patch
|
mattwoodrow
:
review+
|
Details | Diff | Splinter Review |
25.23 KB,
patch
|
mattwoodrow
:
review+
|
Details | Diff | Splinter Review |
6.85 KB,
patch
|
jgilbert
:
review+
mattwoodrow
:
review+
|
Details | Diff | Splinter Review |
861 bytes,
patch
|
mattwoodrow
:
review+
|
Details | Diff | Splinter Review |
The whole DriverInitCrashDetection thing is kinda hardcoded for layers acceleration. This bug is for pulling it apart and using it for media and WebGL device creation.
Assignee | ||
Updated•9 years ago
|
Assignee | ||
Comment 1•9 years ago
|
||
Splitting this into reviewable chunks. Goal is to move d3d11/layers stuff out of the crash detection logic and into its own class.
First up, this static var just isn't needed, it's covered by the status pref.
Attachment #8642708 -
Flags: review?(matt.woodrow)
Assignee | ||
Comment 2•9 years ago
|
||
Attachment #8642709 -
Flags: review?(matt.woodrow)
Assignee | ||
Comment 3•9 years ago
|
||
This is because I want to derive DriverCrashGuard into subclasses, but then we can't call virtual functions from the constructor. Easier to just split the constructor out.
Attachment #8642710 -
Flags: review?(matt.woodrow)
Assignee | ||
Comment 4•9 years ago
|
||
This introduces a D3D11LayersCrashGuard class and moves a bunch of the environment logic into that.
Attachment #8642711 -
Flags: review?(matt.woodrow)
Assignee | ||
Comment 5•9 years ago
|
||
This lets classes implementing DriverCrashGuard choose the lockfile name and status pref names, so they don't collide with other classes.
Attachment #8642727 -
Flags: review?(matt.woodrow)
Assignee | ||
Comment 6•9 years ago
|
||
Adding the DXVA/WebGL parts are a little harder since they're in the content process, and we can't access the filesystem to create a lock file there. We'll have to send a sync message.
Updated•9 years ago
|
Attachment #8642708 -
Flags: review?(matt.woodrow) → review+
Updated•9 years ago
|
Attachment #8642709 -
Flags: review?(matt.woodrow) → review+
Comment 7•9 years ago
|
||
Comment on attachment 8642710 [details] [diff] [review]
part 3, make initialization lazy
Review of attachment 8642710 [details] [diff] [review]:
-----------------------------------------------------------------
::: gfx/src/DriverCrashGuard.h
@@ +28,5 @@
> Recovered
> };
>
> +// DriverCrashGuard is used to detect crashes at graphics driver callsites.
> +//
Trailing whitespace!
Attachment #8642710 -
Flags: review?(matt.woodrow) → review+
Updated•9 years ago
|
Attachment #8642711 -
Flags: review?(matt.woodrow) → review+
Updated•9 years ago
|
Attachment #8642727 -
Flags: review?(matt.woodrow) → review+
Assignee | ||
Comment 8•9 years ago
|
||
Sorry for the review spam, the approach I was going for turned out not to work. Making this all safe for use in ContentProcesses was trickier than I expected.
This patch is just another cleanup, moves all the RecordTelemetry() calls into D3D11LayersCrashGuard.
Attachment #8642727 -
Attachment is obsolete: true
Attachment #8643573 -
Flags: review?(matt.woodrow)
Assignee | ||
Comment 9•9 years ago
|
||
This patch simplifies guard file and pref naming by tying it to an enumeration. This enumeration will be used in IPC in the next patch.
Attachment #8643576 -
Flags: review?(matt.woodrow)
Assignee | ||
Comment 10•9 years ago
|
||
This is the tricky part. The new model is a DriverCrashGuard can be in three states: a parent guard, a child guard, and a "proxy" in the parent that the child communicates with. (Similar to an IPDL actor, but I decided they were too heavyweight here.)
A parent guard will create a lockfile and set a pref, then delete the file and unset the pref in its destructor. Whenever a parent or proxy guard is created and either of these tombstones exist, the guard assumes a crash occurred.
Child guards will ask the parent process to determine the crash state. This allocates a proxy guard in the ContentParent. Only one proxy guard can be active in a ContentParent at a time. If the ContentParent is destroyed with an active proxy guard, we assume a crash occurred.
As of this patch, a content-side guard will always send at least one sync message to the parent process. The first sync message is to activate the guard, and a second message can be sent to complete the guard. There is no shortcut out if the environment hasn't changed, unlike with the lockfile in the parent process. That's fixable if we think that's a problem.
Attachment #8643586 -
Flags: review?(matt.woodrow)
Updated•9 years ago
|
Attachment #8643573 -
Flags: review?(matt.woodrow) → review+
Updated•9 years ago
|
Attachment #8643576 -
Flags: review?(matt.woodrow) → review+
Assignee | ||
Comment 11•9 years ago
|
||
Attachment #8643948 -
Flags: review?(matt.woodrow)
Assignee | ||
Comment 12•9 years ago
|
||
Jeff, the idea behind these patches is: there's an RAII object that guards graphics device initialization. If we crash before completing the guard, we'll recognize the crash next time the same guard is instantiated. We can then bail out early rather than crash continuously.
To avoid activating the test every single time, we only activate it if certain preferences or conditions change. Those are in WebGLCrashGuard::UpdateEnvironment(). I wasn't sure which ones we care about, so if it looks incomplete or overzealous, let me know what to change. (By default, we also check the device and driver version.)
I also wasn't sure where to actually put the guard. I threw it in GLLibraryEGL::EnsureInitialized since that looked like where we initialize ANGLE.
Attachment #8643959 -
Flags: review?(jgilbert)
Comment 13•9 years ago
|
||
Comment on attachment 8643959 [details] [diff] [review]
part 9, webgl crash guard
Review of attachment 8643959 [details] [diff] [review]:
-----------------------------------------------------------------
I think you want to s/WebGL/GLContext/ in this patch.
Also, don't you want to catch this for GLX, WGL, and CGL?
::: gfx/gl/GLLibraryEGL.cpp
@@ +157,5 @@
>
> +#ifdef XP_WIN
> + gfx::WebGLCrashGuard crashGuard;
> + if (crashGuard.Crashed()) {
> + return false;
This only covers EGL, not really even WebGL necessarily.
Did you really mean for this to be a crash guard for GLContext? That seems closer to what you're doing here. (And like a good place for it)
Attachment #8643959 -
Flags: review?(jgilbert) → review-
Assignee | ||
Comment 14•9 years ago
|
||
(In reply to Jeff Gilbert [:jgilbert] from comment #13)
> Comment on attachment 8643959 [details] [diff] [review]
> part 9, webgl crash guard
>
> Review of attachment 8643959 [details] [diff] [review]:
> -----------------------------------------------------------------
>
> I think you want to s/WebGL/GLContext/ in this patch.
> Also, don't you want to catch this for GLX, WGL, and CGL?
>
> ::: gfx/gl/GLLibraryEGL.cpp
> @@ +157,5 @@
> >
> > +#ifdef XP_WIN
> > + gfx::WebGLCrashGuard crashGuard;
> > + if (crashGuard.Crashed()) {
> > + return false;
>
> This only covers EGL, not really even WebGL necessarily.
>
> Did you really mean for this to be a crash guard for GLContext? That seems
> closer to what you're doing here. (And like a good place for it)
The goal is to cover this crash stack: https://bug1174189.bmoattachments.org/attachment.cgi?id=8621619
So I'm fine with it being EGL (or WGL) specific. But if there's a pinch point where we can stick this to cover all WebGL contexts, that works too.
Flags: needinfo?(jgilbert)
Comment 15•9 years ago
|
||
I really think you want GLContext, not just WebGL. This would also cover SkiaGL and OGL layers.
If you really want only WebGL, it's mediated in CreateHeadlessGL() in WebGLContext.cpp.
I advise leaving this for GLContext, as that's more general, and I think the best place for this.
Flags: needinfo?(jgilbert)
Assignee | ||
Comment 16•9 years ago
|
||
This version is a little better, I hope. Now we have three kinds of crash guards:
Parent guard: On new profiles or when the graphics environment changes, parent guards create a zero-byte tombstone file on disk. This gets cleaned up when the guard's destructor runs. If it's still present on startup, we know that something within the guard's scope crashed.
Child guard: Sends a message to the parent process, creating a "proxy" guard in the ContentParent. If the guard was known to crash and the environment has not changed, the proxy returns a "crash" state to the child. Otherwise, the child may proceed. If the ContentParent shuts down while a guard is active, we consider that a crash.
The difference between this and the previous solution is that now we always monitor the child process for crashes within guards, since it is fairly cheap to do so. Guards in the parent are still only activated on environment changes since they incur main-thread disk I/O.
In addition this version prefixes all the cache prefs with the name of the guard.
Attachment #8643586 -
Attachment is obsolete: true
Attachment #8643586 -
Flags: review?(matt.woodrow)
Attachment #8644705 -
Flags: review?(matt.woodrow)
Assignee | ||
Comment 17•9 years ago
|
||
Attachment #8643959 -
Attachment is obsolete: true
Attachment #8644708 -
Flags: review?(jgilbert)
Updated•9 years ago
|
Attachment #8643948 -
Flags: review?(matt.woodrow) → review+
Comment 18•9 years ago
|
||
Comment on attachment 8644705 [details] [diff] [review]
part 7, e10s support
Review of attachment 8644705 [details] [diff] [review]:
-----------------------------------------------------------------
Wouldn't hurt to add the MOZ_ASSERT(XRE_IsParentProcess()) calls to all functions that are commented as being parent process only to avoid needing to refer to the header.
::: gfx/src/DriverCrashGuard.cpp
@@ +300,5 @@
>
> +std::string
> +DriverCrashGuard::GetFullPrefName(const char* aPref)
> +{
> + return std::string("gfx.crash-guard.") +
Trailing whitespace!
Attachment #8644705 -
Flags: review?(matt.woodrow) → review+
Comment 19•9 years ago
|
||
Comment on attachment 8644708 [details] [diff] [review]
part 9, guard GLContext creation
Review of attachment 8644708 [details] [diff] [review]:
-----------------------------------------------------------------
r=me for the DriverCrashGuard/ContentParent changes.
::: gfx/src/DriverCrashGuard.cpp
@@ +454,5 @@
> +bool
> +GLContextCrashGuard::UpdateEnvironment()
> +{
> + static bool checked = false;
> + static bool changed = false;
We can only use statics here because all the prefs we check aren't live, right?
Might be worth documenting that, I doubt it's worth trying to enforce it.
Attachment #8644708 -
Flags: review+
Comment 20•9 years ago
|
||
https://hg.mozilla.org/integration/mozilla-inbound/rev/e98a176b4e0b
https://hg.mozilla.org/integration/mozilla-inbound/rev/cf98685302b6
https://hg.mozilla.org/integration/mozilla-inbound/rev/50edbeb8912e
https://hg.mozilla.org/integration/mozilla-inbound/rev/e17123b5bafd
https://hg.mozilla.org/integration/mozilla-inbound/rev/7f904a453bdb
https://hg.mozilla.org/integration/mozilla-inbound/rev/c85c1d11bd72
https://hg.mozilla.org/integration/mozilla-inbound/rev/6ddbdbd66174
Assignee | ||
Updated•9 years ago
|
Keywords: leave-open
Comment 21•9 years ago
|
||
Backout for OS X bustage together with bug 1183910.
https://treeherder.mozilla.org/#/jobs?repo=mozilla-inbound&revision=d82b5a78686a
Comment 22•9 years ago
|
||
https://hg.mozilla.org/integration/mozilla-inbound/rev/01047d457f89
https://hg.mozilla.org/integration/mozilla-inbound/rev/c9f1a20953fd
https://hg.mozilla.org/integration/mozilla-inbound/rev/d6c5751892ec
https://hg.mozilla.org/integration/mozilla-inbound/rev/d7df14550b63
https://hg.mozilla.org/integration/mozilla-inbound/rev/0acb079b82c6
https://hg.mozilla.org/integration/mozilla-inbound/rev/12635aa6b1ef
https://hg.mozilla.org/integration/mozilla-inbound/rev/04da73b234d5
Comment 23•9 years ago
|
||
https://hg.mozilla.org/mozilla-central/rev/01047d457f89
https://hg.mozilla.org/mozilla-central/rev/c9f1a20953fd
https://hg.mozilla.org/mozilla-central/rev/d6c5751892ec
https://hg.mozilla.org/mozilla-central/rev/d7df14550b63
https://hg.mozilla.org/mozilla-central/rev/0acb079b82c6
https://hg.mozilla.org/mozilla-central/rev/12635aa6b1ef
https://hg.mozilla.org/mozilla-central/rev/04da73b234d5
Assignee | ||
Comment 24•9 years ago
|
||
Attachment #8647732 -
Flags: review?(matt.woodrow)
Assignee | ||
Updated•9 years ago
|
Attachment #8647732 -
Attachment description: fix bougs assert → fix bogus assert
Updated•9 years ago
|
Attachment #8647732 -
Flags: review?(matt.woodrow) → review+
Comment 25•9 years ago
|
||
Comment on attachment 8644708 [details] [diff] [review]
part 9, guard GLContext creation
Review of attachment 8644708 [details] [diff] [review]:
-----------------------------------------------------------------
Really sorry about overlooking this!
::: gfx/src/DriverCrashGuard.cpp
@@ +454,5 @@
> +bool
> +GLContextCrashGuard::UpdateEnvironment()
> +{
> + static bool checked = false;
> + static bool changed = false;
Also statics are not threadsafe, so you should probably grab a (static) mutex for this.
Attachment #8644708 -
Flags: review?(jgilbert) → review+
Comment 26•9 years ago
|
||
Comment 27•9 years ago
|
||
Comment 28•9 years ago
|
||
Part 9 backed out in https://hg.mozilla.org/integration/mozilla-inbound/rev/fe6aedfbb45a for gfx crashes like https://treeherder.mozilla.org/logviewer.html#?job_id=12902021&repo=mozilla-inbound
Flags: needinfo?(dvander)
Comment 30•9 years ago
|
||
Assignee | ||
Comment 31•9 years ago
|
||
Crash was, mGfxInfo didn't get initialized on all paths. Fixed + relanded.
Flags: needinfo?(dvander)
Comment 32•9 years ago
|
||
Assignee | ||
Updated•9 years ago
|
Status: ASSIGNED → RESOLVED
Closed: 9 years ago
Resolution: --- → FIXED
Comment 33•7 years ago
|
||
Removing leave-open keyword from resolved bugs, per :sylvestre.
Keywords: leave-open
You need to log in
before you can comment on or make changes to this bug.
Description
•