Closed Bug 1143653 Opened 10 years ago Closed 10 years ago

Crash in mozilla::layers::DataTextureSourceD3D9::UpdateFromTexture

Categories

(Core :: Graphics: Layers, defect)

x86
Windows 7
defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla39
Tracking Status
firefox36 --- wontfix
firefox37 --- wontfix
firefox38 --- fixed
firefox39 --- fixed

People

(Reporter: nical, Assigned: nical)

References

Details

(Keywords: crash, topcrash-win)

Crash Data

Attachments

(1 file, 1 obsolete file)

It looks like the DeviceManagerD3D9 failed to initialize after a device reset. In some of the crashes we can see from the gfxCriticalError logs that it is the case, but surprisingly there are many cases where nothing shows in the gfxCriticalError log even though the device initialization has most of its error cases covered by the log. Digging...
Chances are that if we failed to re-create a D3D9 device, we are in a situation that we don't have the possibility to recover from (it would require changing the CompositorBackend at runtime). The logging here will help figuring this out, and in the case where we fail to re-create a D3D9 device, we'll most probably crash somewhere else. There is at least one other open bug where we crash because of failure to re-create a D3D9 device but I don't have the number handy.
Attachment #8578028 - Flags: review?(jmuizelaar)
Comment on attachment 8578028 [details] [diff] [review] Add some gfxCriticalErrorLog and null-check in UpdateFromTexture. Review of attachment 8578028 [details] [diff] [review]: ----------------------------------------------------------------- ::: gfx/layers/d3d9/TextureD3D9.cpp @@ +825,5 @@ > } > > DeviceManagerD3D9* dm = gfxWindowsPlatform::GetPlatform()->GetD3D9DeviceManager(); > + if (!dm || !dm->device()) { > + return false; Are you sure returning false is better than crashing? Can you describe what the user will see when we return false and how we'll recover?
(In reply to Jeff Muizelaar [:jrmuizel] from comment #3) > Are you sure returning false is better than crashing? Can you describe what > the user will see when we return false and how we'll recover? I am not sure which is better. If we are in a situation where we won't be able to create a device again, we might as well just crash, but I am not certain that this is the case. In the compositorD3D9 code, when we don't have a device to draw, we cancel the composition and schedule another one rather than crashing. following this logic I assume that we should not crash as soon as we fail to get a device in TextureD3D9. Perhaps a good compromise would be to count the number of times we tried to reinitialize the device at the beginning of a frame and crash after a certain threshold, maybe 3 attempts or something like this?
(In reply to Nicolas Silva [:nical] from comment #4) > (In reply to Jeff Muizelaar [:jrmuizel] from comment #3) > > Are you sure returning false is better than crashing? Can you describe what > > the user will see when we return false and how we'll recover? > > I am not sure which is better. If we are in a situation where we won't be > able to create a device again, we might as well just crash, but I am not > certain that this is the case. > In the compositorD3D9 code, when we don't have a device to draw, we cancel > the composition and schedule another one rather than crashing. following > this logic I assume that we should not crash as soon as we fail to get a > device in TextureD3D9. Perhaps a good compromise would be to count the > number of times we tried to reinitialize the device at the beginning of a > frame and crash after a certain threshold, maybe 3 attempts or something > like this? I'd rather we crash after we fail to reinitialize.
Attachment #8578588 - Flags: review?(jmuizelaar) → review+
The patch that just landed will most likely move this crash and Bug 1074382 (which is caused by the same issue) to a new stack trace which makes a bit more sense, give us some more useful gfxCriticalError logging in crash-stats, and if we are lucky reduce the crash volume a bit. I'd like to let it bake on central and maybe aurora before thinking about uplifting to beta, to be sure that we don't run into a bad situation in cases like recovering from hibernation or whatnot. I think that if we do run into issues there we already have those issues currently, but since this patch isn't likely to reduce the crash volume a lot, it's probably not worth the risk. It may not be worth uploading to beta at all if this bug is caused by the D3D9 device always failing reinitialize. If so, our only chances to fix this are to either find a way for the device de reinitialize properly for sure (not certain if this is possible), or support falling back to the basic compositor after the D3D9 compositor was initialized (definitely not going to happen in time for this beta).
Maybe FYI ... Comment from a reporter: "This only happens when I try to open PDFs within the browser " https://crash-stats.mozilla.com/report/index/c2cd5082-e6da-43da-8a50-433292150312 Adapter Vendor ID: 0x10de Adapter Device ID: 0x0221 https://crash-stats.mozilla.com/report/list?signature=mozilla%3A%3Alayers%3A%3ADataTextureSourceD3D9%3A%3AUpdateFromTexture%28IDirect3DTexture9*%2C+nsIntRegion+const*%29&range_value=28&range_unit=days&date=2015-03-14#tab-comments Graphics Adapter Report: The number of crashes for a particular signature by graphics vendor and chipset. Vendor Adapter Report Count Percentage 0x10de 0x0221 335 36.935 % Seems the vendor with the most crashes it "0x10de". Nvidia ??? (Btw.: Is there a place [bug] to improve the vendor/adapter list in crash-stats.mozilla.com?)
Crash Signature: [@ mozilla::layers::DataTextureSourceD3D9::UpdateFromTexture(IDirect3DTexture9*, nsIntRegion const*) ]
Keywords: crash, topcrash-win
re-landed without the assertion that blew up in Windows xp mochitests. https://hg.mozilla.org/integration/mozilla-inbound/rev/150ea8b2b839
Shouldn't the variable be called mFailedResetAttempts rather than mFailedResetAttemps?
(In reply to Robert Longson from comment #12) > Shouldn't the variable be called mFailedResetAttempts rather than > mFailedResetAttemps? Yes indeed, I'll land a typo fix.
Comment on attachment 8578588 [details] [diff] [review] Crash in the compositor rather than in TextureD3D9 if we fail to re-create the device, after a few attemps. Approval Request Comment [Feature/regressing bug #]: [User impact if declined]: May lower the crash volume a little bit, but this patch will also help us gather more useful crash stats. [Describe test coverage new/current, TreeHerder]: [Risks and why]: rather small. A bit late for beta but not too risky for aurora. [String/UUID change made/needed]:
Attachment #8578588 - Flags: approval-mozilla-aurora?
Attachment #8578588 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
(In reply to Robert Kaiser (:kairo@mozilla.com) from comment #19) > So the new nightly topcrash in > https://crash-stats.mozilla.com/report/ > list?signature=mozilla%3A%3Alayers%3A%3ACompositorD3D9%3A%3AFailedToResetDevi > ce%28%29 is caused by this patch shifting signatures? Yes, This patch should have moved crashes from this bug and bug 1074382 to this new signature. Do the crash volumes correspond?
Actually, I do not see any signatures containing "texture" in significant numbers on Nightly in the 10 days before 2015-03-19, but I see mozilla::layers::CompositorD3D9::FailedToResetDevice() with >100 crashes in each of the last 3 days.
(In reply to Robert Kaiser (:kairo@mozilla.com) from comment #21) > Actually, I do not see any signatures containing "texture" in significant > numbers on Nightly in the 10 days before 2015-03-19, but I see > mozilla::layers::CompositorD3D9::FailedToResetDevice() with >100 crashes in > each of the last 3 days. Ok, means there's something going on that I don't quite understand yet but that I really need to. Backed out the aurora uplift: https://hg.mozilla.org/releases/mozilla-aurora/rev/6b874e2d4d18 For nightly, I will try a few things out first thing tomorrow (I suppose we can afford the crash spike for another 24 hours on nightly). The fix might end up just raising the threshold of failed reset attempts before crashing, but I want to understand how we can currently get into a situation where we fail to reset the D3D9 device 10 times in a row.
(In reply to Nicolas Silva [:nical] from comment #22) > For nightly, I will try a few things out first thing tomorrow (I suppose we > can afford the crash spike for another 24 hours on nightly). Yes, that sounds OK. It sounds a bit like it would be good to add some telemetry to find out how often we fail in a row and whatever else makes sense to gather data for a more informed patch.
Partial backout: https://hg.mozilla.org/integration/mozilla-inbound/rev/4460210c41b5 This relaxes the check and we'll only crash if we failed to create the device manager 10 times in a row, rather than if we didn't get a swapchain. I am assuming that we can repetitively fail to get a swapchain when recovering from sleep and perhaps other cases like that. A few comments in the crash report seem to confirm that.
Given that conversation on this signature is here and you did that partial backout and stuff, I'm connecting that new signature with this bug so people wondering what it is find the conversation.
Crash Signature: [@ mozilla::layers::DataTextureSourceD3D9::UpdateFromTexture(IDirect3DTexture9*, nsIntRegion const*) ] → [@ mozilla::layers::DataTextureSourceD3D9::UpdateFromTexture(IDirect3DTexture9*, nsIntRegion const*) ] [@ mozilla::layers::CompositorD3D9::FailedToResetDevice()]
No crash with mozilla::layers::CompositorD3D9::FailedToResetDevice in the signature since the last patch landed, closing.
Status: NEW → RESOLVED
Closed: 10 years ago
Resolution: --- → FIXED
Does something still need to land on Fx38 here?
Flags: needinfo?(nical.bugzilla)
Whiteboard: [leave-open]
Target Milestone: --- → mozilla39
(In reply to Ryan VanderMeulen [:RyanVM UTC-4] from comment #31) > Does something still need to land on Fx38 here? Nothing according to crash-stats. we can work something out if the signature ever comes back for whatever reason.
Flags: needinfo?(nical.bugzilla)
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: