Closed
Bug 1254400
Opened 8 years ago
Closed 8 years ago
crash in mozilla::layers::CompositorD3D9::FailedToResetDevice
Categories
(Core :: Graphics, defect)
Tracking
()
People
(Reporter: milan, Assigned: mtseng)
References
Details
(Keywords: crash, Whiteboard: [gfx-noted])
Crash Data
Attachments
(3 files, 2 obsolete files)
5.64 KB,
patch
|
dvander
:
review+
|
Details | Diff | Splinter Review |
6.93 KB,
patch
|
dvander
:
review+
|
Details | Diff | Splinter Review |
6.97 KB,
patch
|
mtseng
:
review+
ritu
:
approval-mozilla-beta+
|
Details | Diff | Splinter Review |
This bug was filed from the Socorro interface and is report bp-93a03670-63ef-44aa-bb6f-ed82b2160303. ============================================================= See also bug 1245781.
Reporter | ||
Comment 1•8 years ago
|
||
There are too many of these reports to ignore, despite bug 1245781 being closed. Peter, is there somebody that could take a look? David is a good person to talk to about different device reset strategies, or if we can still end up in the "compositors might be mixed" situation.
Whiteboard: [gfx-noted]
I'll take a look and see if there's anything obvious we can do short-term. D3D9 may be a candidate for the crash guard.
Assignee: nobody → dvander
Status: NEW → ASSIGNED
Comment 3•8 years ago
|
||
I think this is us not handling the device going away and falling back to software.
Reporter | ||
Comment 4•8 years ago
|
||
Yes, we have a "Compositors might be mixed" message. David, I have Peter CC-d on this bug, if you can guide somebody from his team through this, we can maybe get more people involved.
(In reply to Jeff Muizelaar [:jrmuizel] from comment #3) > I think this is us not handling the device going away and falling back to > software. Yeah, that looks right. Instead of this MOZ_CRASH: https://dxr.mozilla.org/mozilla-central/source/gfx/layers/d3d9/CompositorD3D9.cpp#678 We should post a message back to the main thread to run nsBaseWidget::OnRenderingDeviceReset: https://dxr.mozilla.org/mozilla-central/source/widget/nsBaseWidget.cpp?from=nsBaseWidget.cpp#296 The compositor will probably render nothing for a few frames but when the task round trips back to the compositor it should fall back to software.
Assignee: dvander → nobody
Status: ASSIGNED → NEW
Reporter | ||
Comment 6•8 years ago
|
||
Thanks - Peter, let me know if you can run with this info.
Assignee | ||
Comment 8•8 years ago
|
||
I'm trying post a message to main thread for running OnRenderingDeviceReset. But it results a crash. I'll keep look into it.
Assignee | ||
Comment 9•8 years ago
|
||
Attachment #8732714 -
Flags: review?(matt.woodrow)
Updated•8 years ago
|
Attachment #8732714 -
Flags: review?(matt.woodrow) → review?(dvander)
Reporter | ||
Comment 11•8 years ago
|
||
Comment on attachment 8732714 [details] [diff] [review] Handle device reset for DirectX 9. Review of attachment 8732714 [details] [diff] [review]: ----------------------------------------------------------------- ::: gfx/thebes/gfxWindowsPlatform.h @@ +328,5 @@ > bool mIsWARP; > bool mHasDeviceReset; > bool mHasFakeDeviceReset; > bool mCompositorD3D11TextureSharingWorks; > + mozilla::Atomic<bool> mHasD3D9DeviceReset; We can't use the mHasDeviceReset for this?
(In reply to Morris Tseng [:mtseng] from comment #10) > Hi David, review ping for comment 9. Sorry for the delay - I'll test this today.
Flags: needinfo?(dvander)
Assignee | ||
Comment 13•8 years ago
|
||
(In reply to Milan Sreckovic [:milan] from comment #11) > Comment on attachment 8732714 [details] [diff] [review] > Handle device reset for DirectX 9. > > Review of attachment 8732714 [details] [diff] [review]: > ----------------------------------------------------------------- > > ::: gfx/thebes/gfxWindowsPlatform.h > @@ +328,5 @@ > > bool mIsWARP; > > bool mHasDeviceReset; > > bool mHasFakeDeviceReset; > > bool mCompositorD3D11TextureSharingWorks; > > + mozilla::Atomic<bool> mHasD3D9DeviceReset; > > We can't use the mHasDeviceReset for this? mHasDeviceReset involved d3d11 logic. I think if we use mHasDeviceReset, it might cause some side effect for d3d11. Therefore, I create an another variable for d3d9 only.
Comment on attachment 8732714 [details] [diff] [review] Handle device reset for DirectX 9. This patch will fix the crash, but if we fail to acquire a new device, the browser will stop compositing and will appear frozen (or entirely black, on Window XP). You can see that by injecting a late permanent failure into EnsureSwapChain and GetD3D9DeviceManager. We need to tell the main thread to recreate the compositor, which is what OnRenderingDeviceReset does. This would also let you remove mHasD3D9DeviceReset. Morris, you mentioned you tried that approach in comment #8 and it crashed. Do you remember what the crash was? One other note is that mDeviceManager is not atomic, which is a pre-existing bug. We should be updating and acquiring references to it within a lock. (This used to affect D3D11 too, see bug 1258174)
Attachment #8732714 -
Flags: review?(dvander) → review-
Assignee | ||
Comment 15•8 years ago
|
||
(In reply to David Anderson [:dvander] from comment #14) > Comment on attachment 8732714 [details] [diff] [review] > Handle device reset for DirectX 9. > > This patch will fix the crash, but if we fail to acquire a new device, the > browser will stop compositing and will appear frozen (or entirely black, on > Window XP). You can see that by injecting a late permanent failure into > EnsureSwapChain and GetD3D9DeviceManager. > > We need to tell the main thread to recreate the compositor, which is what > OnRenderingDeviceReset does. This would also let you remove > mHasD3D9DeviceReset. https://dxr.mozilla.org/mozilla-central/source/widget/windows/nsWindowGfx.cpp#189 Here will detect device reset and call OnRenderingDeviceReset(). That's why I add mHasD3D9DeviceReset. Do you think I should call OnRenderingDeviceReset directly at FailedToResetDevice? I think the mechanism in nsWindowGfx work quiet well and we should follow it. On my local test, I injected a permanent failure into CompositorD3D9::Ready and it will recreate a basic compositor and the compositing result is correct. I'll do more tests in EnsureSwapChain and GetD3D9DeviceManager. > > Morris, you mentioned you tried that approach in comment #8 and it crashed. > Do you remember what the crash was? Yes, that's because the first d3d9 swap chain hold window handle when we fallback to basic compositor. The swap chain didn't release. So I set mSwapChain and mDeviceManager to null when we detect device reset. > > One other note is that mDeviceManager is not atomic, which is a pre-existing > bug. We should be updating and acquiring references to it within a lock. > (This used to affect D3D11 too, see bug 1258174) Ok, I'll try to address this in my next patch. Thanks.
Flags: needinfo?(dvander)
(In reply to Morris Tseng [:mtseng] from comment #15) > Here will detect device reset and call OnRenderingDeviceReset(). That's why > I add mHasD3D9DeviceReset. Do you think I should call OnRenderingDeviceReset > directly at FailedToResetDevice? I think the mechanism in nsWindowGfx work > quiet well and we should follow it. > > On my local test, I injected a permanent failure into CompositorD3D9::Ready > and it will recreate a basic compositor and the compositing result is > correct. I'll do more tests in EnsureSwapChain and GetD3D9DeviceManager. Interesting, I didn't see the compositor get recreated, but it you're right - it should. It might be that I didn't receive a WM_PAINT. I'll check again.
Flags: needinfo?(dvander)
Sorry for the confusion - I think I see what I did wrong now. When I injected a failure into VerifyDeviceForRendering, I didn't call DestroyDevice. That function (racily) revokes the device from gfxWindowsPlatform, and FailedToResetDevice is only called if there's no device. Now when I test your patch, we do recover from the device reset. It seems delayed though - Firefox turns black until the window gains focus or something triggers a paint. That's probably okay, so just fixing the mD3D9DeviceManager race is good enough. We also shouldn't need to null out the pointer in gfxWindowsPlatform::D3D9DeviceReset since it will already be null. (Not a big deal to do it, but it's a little confusing who is responsible for clearing it, so either way maybe put a comment in that function.)
Assignee | ||
Comment 18•8 years ago
|
||
Addressed dvander's comment. Major changes: 1. Remove null out the mDeviceManager in D3D9DeviceReset. 2. Let TextureSourceD3D9 hold refptr of DeviceManagerD3D9 instead of rawptr. Because gfxWindowsPlatform might destroy device manager while device reset and the pointer of device manager in TextureSourceD3D9 would become dangling pointer. 3. Remove gfxCriticalError in TextureHostD3D9::UpdatedInternal because when device reset this operation might fail. We shouldn't crash program here because we handle the device reset.
Attachment #8732714 -
Attachment is obsolete: true
Attachment #8734666 -
Flags: review?(dvander)
Assignee | ||
Comment 19•8 years ago
|
||
Attachment #8734668 -
Flags: review?(dvander)
Reporter | ||
Comment 20•8 years ago
|
||
Comment on attachment 8734666 [details] [diff] [review] Part 1: Handle device reset for DirectX 9 v2. Review of attachment 8734666 [details] [diff] [review]: ----------------------------------------------------------------- ::: gfx/layers/d3d9/CompositorD3D9.cpp @@ +670,5 @@ > // 10 is a totally arbitrary number that we may want to increase or decrease > // depending on how things behave in the wild. > if (mFailedResetAttempts > 10) { > + mFailedResetAttempts = 0; > + gfxWindowsPlatform::GetPlatform()->D3D9DeviceReset(); Maybe add more information when this happens - gfxCriticalNote, or something like that, so that we can see if it happens in crashes. ::: gfx/layers/d3d9/TextureD3D9.cpp @@ +892,5 @@ > regionToUpdate = nullptr; > } > } > > + mTextureSource->UpdateFromTexture(mTexture, regionToUpdate); Do we not care if this fails? We should at least have a gfxWarning, or gfxCriticalNote if we do not want to assert.
Assignee | ||
Comment 21•8 years ago
|
||
(In reply to Milan Sreckovic [:milan] from comment #20) > Comment on attachment 8734666 [details] [diff] [review] > Part 1: Handle device reset for DirectX 9 v2. > > Review of attachment 8734666 [details] [diff] [review]: > ----------------------------------------------------------------- > > ::: gfx/layers/d3d9/CompositorD3D9.cpp > @@ +670,5 @@ > > // 10 is a totally arbitrary number that we may want to increase or decrease > > // depending on how things behave in the wild. > > if (mFailedResetAttempts > 10) { > > + mFailedResetAttempts = 0; > > + gfxWindowsPlatform::GetPlatform()->D3D9DeviceReset(); > > Maybe add more information when this happens - gfxCriticalNote, or something > like that, so that we can see if it happens in crashes. I'll add a gfxWarning here. > > ::: gfx/layers/d3d9/TextureD3D9.cpp > @@ +892,5 @@ > > regionToUpdate = nullptr; > > } > > } > > > > + mTextureSource->UpdateFromTexture(mTexture, regionToUpdate); > > Do we not care if this fails? We should at least have a gfxWarning, or > gfxCriticalNote if we do not want to assert. I'll add gfxWarning as well in next patch.
Assignee | ||
Comment 22•8 years ago
|
||
Addressed Milan's comment.
Attachment #8734666 -
Attachment is obsolete: true
Attachment #8734666 -
Flags: review?(dvander)
Attachment #8735357 -
Flags: review?(dvander)
Attachment #8735357 -
Flags: review?(dvander) → review+
Comment on attachment 8734668 [details] [diff] [review] Part 2: Make access to d3d9 device manager thread-safe. Review of attachment 8734668 [details] [diff] [review]: ----------------------------------------------------------------- Thanks for addressing this! ::: gfx/thebes/gfxWindowsPlatform.cpp @@ +1398,5 @@ > { > // We should only create the d3d9 device on the compositor thread > // or we don't have a compositor thread. > + RefPtr<DeviceManagerD3D9> result; > + MutexAutoLock lock(mDeviceLock); I don't think it matters much, but I like to do the initialization of mDeviceManager outside the lock, and only take the lock to do the final assignment or reference transfer.
Attachment #8734668 -
Flags: review?(dvander) → review+
Assignee | ||
Comment 24•8 years ago
|
||
https://treeherder.mozilla.org/#/jobs?repo=try&revision=b2b380303189
Assignee | ||
Comment 25•8 years ago
|
||
Try is good!
Comment 26•8 years ago
|
||
https://hg.mozilla.org/integration/mozilla-inbound/rev/f75d2232611e https://hg.mozilla.org/integration/mozilla-inbound/rev/2626d857b985
Comment 27•8 years ago
|
||
bugherder |
https://hg.mozilla.org/mozilla-central/rev/f75d2232611e https://hg.mozilla.org/mozilla-central/rev/2626d857b985
Status: NEW → RESOLVED
Closed: 8 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla48
Reporter | ||
Comment 28•8 years ago
|
||
(In reply to Morris Tseng [:mtseng] from comment #21) > > > > Do we not care if this fails? We should at least have a gfxWarning, or > > gfxCriticalNote if we do not want to assert. > I'll add gfxWarning as well in next patch. gfxWarning doesn't show up in crash reports or non-debug builds (by default.) gfxCriticalNote, on the other hand, shows up in crash reports and non-debug builds' about:support section, which is useful.
Assignee | ||
Comment 29•8 years ago
|
||
(In reply to Milan Sreckovic [:milan] from comment #28) > (In reply to Morris Tseng [:mtseng] from comment #21) > > > > > > Do we not care if this fails? We should at least have a gfxWarning, or > > > gfxCriticalNote if we do not want to assert. > > I'll add gfxWarning as well in next patch. > > gfxWarning doesn't show up in crash reports or non-debug builds (by > default.) gfxCriticalNote, on the other hand, shows up in crash reports and > non-debug builds' about:support section, which is useful. But gfxCriticalNote will crash debug build. Do we have another way which doesn't crash debug build and show useful information in about:support section?
Reporter | ||
Comment 30•8 years ago
|
||
gfxCriticalNote would not crash the debug build, so you're OK to use it. gfxCriticalError does.
Assignee | ||
Comment 31•8 years ago
|
||
(In reply to Milan Sreckovic [:milan] from comment #30) > gfxCriticalNote would not crash the debug build, so you're OK to use it. > gfxCriticalError does. Thanks for the info. Create a follow-up bug 1262361.
Comment 32•8 years ago
|
||
[Tracking Requested - why for this release]: I'm not sure if this was missed but this shows up for Beta and Release, not just Nightly, but strangely not Aurora. 7-day Ranks: #33 in Firefox 45.0.2 with 2084 crashes. #25 in Firefox 46.0b* with 732 crashes. Is it too late to uplift?
status-firefox45:
--- → affected
status-firefox46:
--- → affected
status-firefox47:
--- → ?
tracking-firefox45:
--- → ?
Flags: needinfo?(milan)
Comment 33•8 years ago
|
||
[Tracking Requested - why for this release]: [Tracking Requested - why for this release]: [Tracking Requested - why for this release]: 45 is EOL in a few days, tracking request moved to 46 & 47
tracking-firefox46:
--- → ?
tracking-firefox47:
--- → ?
Comment 34•8 years ago
|
||
Too late to make it into 46 RC2. But I will track this and keep it open for 46 in case we do a dot release and want to take it as a ridealong.
tracking-firefox48:
--- → +
Comment 35•8 years ago
|
||
this is currently the #25 crash on 47.0b2 with 0.3% of crashes.
Hello Morris, David, given that this is a top crasher in Fx47, should we consider uplifting to Beta? I would like to get your assessment on the risk involved. Thanks!
Flags: needinfo?(mtseng)
Flags: needinfo?(dvander)
Reporter | ||
Comment 37•8 years ago
|
||
I'd personally be nervous about uplifting this, but I'll let the author and reviewer pipe in.
Flags: needinfo?(milan)
(In reply to Milan Sreckovic [:milan] from comment #37) > I'd personally be nervous about uplifting this, but I'll let the author and > reviewer pipe in. We've uplifted much scarier reset-handling code (for D3D11, mainly). This doesn't seem to have caused any problems so far, and AFAICT (I've read crash-stats wrong before) there don't seem to be any crashes in Firefox 48/49. So I'd be okay with uplifting this.
Flags: needinfo?(dvander)
(In reply to David Anderson [:dvander] from comment #38) That is, this signature seems to have disappeared in 48/49.
Assignee | ||
Comment 40•8 years ago
|
||
David, since bug 1258174 didn't uplift to beta. We may only uplift part 1 of this bug to beta. Do you think this is acceptable?
Flags: needinfo?(mtseng) → needinfo?(dvander)
(In reply to Morris Tseng [:mtseng] from comment #40) > David, since bug 1258174 didn't uplift to beta. We may only uplift part 1 of > this bug to beta. Do you think this is acceptable? Yeah.
Flags: needinfo?(dvander)
Assignee | ||
Comment 43•8 years ago
|
||
Comment on attachment 8749537 [details] [diff] [review] Handle device reset for d3d9. Approval Request Comment [Feature/regressing bug #]: 1254400 [User impact if declined]: Crashes when gpu driver reset on DirectX 9. [Describe test coverage new/current, TreeHerder]: Tested on nightly and it seems no more crash happened on crash stats. [Risks and why]: Should be low risk since the crash stats on nightly and aurora looks good. [String/UUID change made/needed]: n/a
Attachment #8749537 -
Flags: approval-mozilla-beta?
Comment on attachment 8749537 [details] [diff] [review] Handle device reset for d3d9. This is ranked #31 on 47.0b, let's hope this fix helps, Beta47+
Attachment #8749537 -
Flags: approval-mozilla-beta? → approval-mozilla-beta+
Comment 45•8 years ago
|
||
bugherder uplift |
https://hg.mozilla.org/releases/mozilla-beta/rev/25a3ad268da5
Reporter | ||
Comment 46•8 years ago
|
||
No reports from 47.0b4 yet, looks like it worked.
Updated•8 years ago
|
You need to log in
before you can comment on or make changes to this bug.
Description
•