Closed Bug 1116812 Opened 9 years ago Closed 9 years ago

CompositorD3D11::HandleError coming from mozilla::layers::CompositorD3D11::UpdateRenderTarget() probably from TDRs

Categories

(Core :: Graphics: Layers, defect)

All
Windows 7
defect
Not set
critical

Tracking


RESOLVED FIXED
mozilla40
Tracking Status
firefox36 + wontfix
firefox37 + wontfix
firefox38 + fixed
firefox38.0.5 --- fixed
firefox39 --- fixed
firefox40 --- fixed
firefox-esr38 --- fixed

People

(Reporter: stephend, Assigned: bas.schouten)

References

Details

(Keywords: crash, topcrash-win, Whiteboard: [tbird crash])

Crash Data

Attachments

(4 files, 1 obsolete file)

This bug was filed from the Socorro interface and is report bp-53c615b9-efef-4886-9769-1a2992141230.
=============================================================

STR:

1. Using yesterday's Nightly build, I updated my Nvidia driver to this: http://www.nvidia.com/download/driverResults.aspx/80913/en-us

Version: 	347.09  WHQL
Release Date: 	2014.12.23
Operating System: 	Windows 7 64-bit, Windows 8.1 64-bit, Windows 8 64-bit, Windows Vista 64-bit
Language: 	English (US)

2. While installing the above update, my screen flashed black for a second (while the driver reset/detected the display, I guess)
3. As soon as the screen flashed, Nightly crashed

(Sorry, but I can't repro -- still filing because it'll likely help.)
Frame 	Module 	Signature 	Source
0 	xul.dll 	mozilla::layers::CompositorD3D11::HandleError(long, mozilla::layers::CompositorD3D11::Severity) 	gfx/layers/d3d11/CompositorD3D11.cpp
1 	xul.dll 	mozilla::layers::CompositorD3D11::Failed(long, mozilla::layers::CompositorD3D11::Severity) 	gfx/layers/d3d11/CompositorD3D11.cpp
2 	xul.dll 	mozilla::layers::CompositorD3D11::UpdateRenderTarget() 	gfx/layers/d3d11/CompositorD3D11.cpp
3 	xul.dll 	mozilla::layers::CompositorD3D11::BeginFrame(nsIntRegion const&, mozilla::gfx::RectTyped<mozilla::gfx::UnknownUnits> const*, mozilla::gfx::RectTyped<mozilla::gfx::UnknownUnits> const&, mozilla::gfx::RectTyped<mozilla::gfx::UnknownUnits>*, mozilla::gfx::RectTyped<mozilla::gfx::UnknownUnits>*) 	gfx/layers/d3d11/CompositorD3D11.cpp
4 	xul.dll 	mozilla::layers::LayerManagerComposite::Render() 	gfx/layers/composite/LayerManagerComposite.cpp
5 	xul.dll 	mozilla::layers::LayerManagerComposite::EndTransaction(void (*)(mozilla::layers::PaintedLayer*, gfxContext*, nsIntRegion const&, mozilla::layers::DrawRegionClip, nsIntRegion const&, void*), void*, mozilla::layers::LayerManager::EndTransactionFlags) 	gfx/layers/composite/LayerManagerComposite.cpp
6 	xul.dll 	mozilla::layers::LayerManagerComposite::EndEmptyTransaction(mozilla::layers::LayerManager::EndTransactionFlags) 	gfx/layers/composite/LayerManagerComposite.cpp
7 	xul.dll 	mozilla::layers::CompositorParent::CompositeToTarget(mozilla::gfx::DrawTarget*, nsIntRect const*) 	gfx/layers/ipc/CompositorParent.cpp
8 	xul.dll 	RunnableMethod<mozilla::layers::CompositorParent, void ( mozilla::layers::CompositorParent::*)(mozilla::TimeStamp), Tuple1<mozilla::TimeStamp> >::Run() 	ipc/chromium/src/base/task.h
9 	xul.dll 	MessageLoop::DoWork() 	ipc/chromium/src/base/message_loop.cc
10 	xul.dll 	base::MessagePumpForUI::DoRunLoop() 	ipc/chromium/src/base/message_pump_win.cc
11 	xul.dll 	base::MessagePumpWin::Run(base::MessagePump::Delegate*) 	ipc/chromium/src/base/message_pump_win.h
12 	xul.dll 	MessageLoop::RunHandler() 	ipc/chromium/src/base/message_loop.cc
13 	xul.dll 	MessageLoop::Run() 	ipc/chromium/src/base/message_loop.cc
14 	xul.dll 	base::Thread::ThreadMain() 	ipc/chromium/src/base/thread.cc
15 	xul.dll 	`anonymous namespace'::ThreadFunc(void*) 	ipc/chromium/src/base/platform_thread_win.cc
16 	kernel32.dll 	BaseThreadInitThunk 	
17 	ntdll.dll 	RtlUserThreadStart 	
18 	kernel32.dll 	BasepReportFault 	
19 	kernel32.dll 	BasepReportFault
[Tracking Requested - why for this release]:
This is the #4 topcrash on 36.0b1 with 2.8% of all crashes as of now.
Summary: crash in mozilla::layers::CompositorD3D11::HandleError(long, mozilla::layers::CompositorD3D11::Severity) | mozilla::layers::CompositorD3D11::Failed(long, mozilla::layers::CompositorD3D11::Severity) | mozilla::layers::CompositorD3D11::UpdateRenderTarget() → CompositorD3D11::HandleError comping from mozilla::layers::CompositorD3D11::UpdateRenderTarget()
Top crash, tracking! 
Milan, can you help?
Flags: needinfo?(milan)
This is "Invalid D3D API Call" (DXGI_ERROR_INVALID_CALL), and is preceded by a bunch of failures to create "normal" sized bitmaps (e.g., 22x21, 59x22, etc.) with failure code 0x8899000c in DrawTargetD2D1::CreateSourceSurfaceFromData.
Assignee: nobody → bas
Flags: needinfo?(milan) → needinfo?(bas)
(In reply to Milan Sreckovic [:milan] from comment #4)
> This is "Invalid D3D API Call" (DXGI_ERROR_INVALID_CALL), and is preceded by
> a bunch of failures to create "normal" sized bitmaps (e.g., 22x21, 59x22,
> etc.) with failure code 0x8899000c in
> DrawTargetD2D1::CreateSourceSurfaceFromData.

This is another crash that happens on a TDR: a situation where we always used to crash, so it's unlikely to be a regression, but one that we can probably fix.
Flags: needinfo?(bas)
On the bright side, this really shows how much of a long tail of crashes we've consolidated.
(In reply to Bas Schouten (:bas.schouten) from comment #6)
> On the bright side, this really shows how much of a long tail of crashes
> we've consolidated.

I know! Fixing this would have a major impact now.

We can't tell what's causing driver resets, can we?  I'm thinking of the bug 1124427, where "stress testing webgl" causes driver resets, wonder if a good portion of them (resets) is coming from there...
(In reply to Milan Sreckovic [:milan] from comment #7)
> (In reply to Bas Schouten (:bas.schouten) from comment #6)
> > On the bright side, this really shows how much of a long tail of crashes
> > we've consolidated.
> 
> I know! Fixing this would have a major impact now.
> 
> We can't tell what's causing driver resets, can we?  I'm thinking of the bug
> 1124427, where "stress testing webgl" causes driver resets, wonder if a good
> portion of them (resets) is coming from there...

Not really, I think technically Windows doesn't even really know.
Fwiw, I -suspect- most of the resets are just a GPU driver crashing because of a random driver bug that happens to get hit.
Bas, do you think you will be able to fix that during the 36 cycle? thanks beta5 gtb is tomorrow
Flags: needinfo?(bas)
(In reply to Sylvestre Ledru [:sylvestre] from comment #10)
> Bas, do you think you will be able to fix that during the 36 cycle? thanks
> beta5 gtb is tomorrow

Perhaps, I want to reiterate this is -not- a regression.
Flags: needinfo?(bas)
(In reply to Bas Schouten (:bas.schouten) from comment #11)
> Perhaps, I want to reiterate this is -not- a regression.
Sure but it does not really matter since it is a crash.
(In reply to Sylvestre Ledru [:sylvestre] from comment #13)
> (In reply to Bas Schouten (:bas.schouten) from comment #11)
> > Perhaps, I want to reiterate this is -not- a regression.
> Sure but it does not really matter since it is a crash.

Right, the claim is that we have just grouped the existing crashes into a single signature.  Still want to see if we can fix it, just clarifying.
(In reply to Sylvestre Ledru [:sylvestre] from comment #13)
> (In reply to Bas Schouten (:bas.schouten) from comment #11)
> > Perhaps, I want to reiterate this is -not- a regression.
> Sure but it does not really matter since it is a crash.

Yes, let me try to be clearer: no one is crashing from this crash who wasn't already crashing in this situation on 36, or Release.
pragmatic hit this crash on her older Win 7 machine while watching youtube.com in full screen. https://crash-stats.mozilla.com/report/index/8270542a-e9a6-4031-8104-be1cd2150130. She might have a good machine to use for reproducing this if needed.
Bas, any more info we can ask for to help us?  Let's assume we can fix this on our side and see what we can do short term.
Flags: needinfo?(bas)
We may just drop the assert in Beta and see what happens; the worst-case scenario is that we crash elsewhere.  Bas will provide more details and the patch.
Milan, any progress on this? Thanks
Flags: needinfo?(milan)
(In reply to Sylvestre Ledru [:sylvestre] from comment #20)
> Milan, any progress on this? Thanks

As per my earlier e-mail, I don't think there's going to be any point in dropping this assert. I believe it would just spread the crashes out and make them harder to diagnose. I think we simply need to try disabling DXVA and see how that affects this signature. We can always remove this assert later; -that-'s never going to have a true negative effect.
Flags: needinfo?(bas)
Flags: needinfo?(milan)
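For readers wondering what "this assert" is: roughly, failed D3D11 calls in the compositor funnel through Failed(), which calls HandleError(), and HandleError() deliberately aborts on serious errors, which is why the whole long tail of TDR fallout now lands under this one signature. A rough sketch of that funnel, with names taken from the crash stack (the real implementation differs, and the Severity value name is assumed):

bool
CompositorD3D11::Failed(HRESULT hr, Severity aSeverity)
{
  if (SUCCEEDED(hr)) {
    return false;
  }
  HandleError(hr, aSeverity);
  return true;   // caller bails out of the current frame
}

void
CompositorD3D11::HandleError(HRESULT hr, Severity aSeverity)
{
  // Severity value name is assumed for this sketch.
  if (aSeverity == Severity::Critical) {
    MOZ_CRASH("Unrecoverable D3D11 error");   // the crash reported in this bug
  }
  // Less severe failures are only logged and compositing carries on.
  gfxCriticalError() << "D3D11 error " << hr;
}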
I've been hearing from folks that there is confusion around this bug.  Let me try to explain this a bit for those not entirely accustomed to graphics.

There is a phenomenon on Windows called TDR (Timeout Detection and Recovery), which is when the graphics driver resets beneath us. This causes Firefox to crash. This bug is essentially the catch-all area where we land when these kinds of things happen to us.

In beta 36 we have been running two simultaneous pieces of unproven tech:
* MSE - video playback improvements over flash
* D2D 1.1 - a newish method for doing some drawing

Note that our normal hardware acceleration is provided by d3d11/9. I'm still not sure how d3d and d2d interact with each other per se (other than the obvious 2d vs 3d difference).

We have some crashes we think are D2D related.
We have some crashes that are MSE related. Now this comes back to the TDR issue because we are hypothesizing that due to the relatively untested codepath that we are using with MSE for hardware accelerated video decoding (what Bas is referring to above when he talks about DXVA) we might be pushing some graphics drivers beyond their limits and hitting TDR's. Thus causing crashes with this signature.

Of course it could also be D2D related too.

Now that we have disabled MSE in beta 8, if the spike in this signature was caused by hardware decoding as a result of MSE, we *should* see the crash rate of this issue diminish.  If we don't, then we likely have some other culprit.

Hope this helps fix some of the confusion.
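To make the mechanics a bit more concrete: after a TDR the D3D11 device is "removed" and every later call on it fails; the way an application tells a reset apart from an ordinary failure is to ask the device why it died. A minimal sketch using the plain D3D11 API (this is an illustration, not our compositor code):

#include <d3d11.h>

// True if the device behind the compositor has been reset (e.g. by a TDR) and
// everything created on it must be thrown away and rebuilt.
bool DeviceWasReset(ID3D11Device* aDevice)
{
  HRESULT reason = aDevice->GetDeviceRemovedReason();
  switch (reason) {
    case S_OK:
      return false;                         // device is still alive
    case DXGI_ERROR_DEVICE_RESET:           // the OS reset a hung GPU (a TDR)
    case DXGI_ERROR_DEVICE_REMOVED:         // driver crashed or was upgraded
    case DXGI_ERROR_DEVICE_HUNG:            // our own command stream hung the GPU
    case DXGI_ERROR_DRIVER_INTERNAL_ERROR:  // driver bug
    default:
      return true;                          // tear down and recreate the device
  }
}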
> Now that we have disabled MSE in beta 8, if the spike in this signature was
> caused by hardware decoding as a result of MSE, we *should* see the crash
> rate of this issue diminish.  If we don't, then we likely have some other
> culprit.

I don't have absolute numbers but in early b8 data this is still at the same relative volume.
(In reply to Clint Talbert ( :ctalbert ) from comment #22)
> I've been hearing from folks that there is confusion around this bug.  Let
> me try to explain this a bit for those not entirely accustomed to graphics.
> 
> There is a phenomenon on windows called TDR which is when the graphics
> driver resets beneath us. This causes Firefox to crash. This bug is
> essentially in the catch-all area of where we land when these kinds of
> things happen to us.
> 
> In beta 36 we have been running two simultaneous pieces of unproven tech:
> * MSE - video playback improvements over flash
> * D2D 1.1 - a newish method for doing some drawing
> 
> Note that our normal hardware acceleration is provided by d3d11/9. I'm still
> not sure how d3d and d2d interact with each other per se (other than the
> obvious 2d vs 3d difference).
> 
> We have some crashes we think are D2D related.
> We have some crashes that are MSE related. Now this comes back to the TDR
> issue because we are hypothesizing that due to the relatively untested
> codepath that we are using with MSE for hardware accelerated video decoding
> (what Bas is referring to above when he talks about DXVA) we might be
> pushing some graphics drivers beyond their limits and hitting TDR's. Thus
> causing crashes with this signature.
> 
> Of course it could also be D2D related too.
> 
> Now that we have disabled MSE in beta 8, if the spike in this signature was
> caused by hardware decoding as a result of MSE, we *should* see the crash
> rate of this issue diminish.  If we don't, then we likely have some other
> culprit.
> 
> Hope this helps fix some of the confusion.

It should be noted that we were -already- using D2D 1.0, which, on systems that have D2D 1.1, uses the -exact- same libraries as D2D 1.1 (D2D 1.1 is a superset of D2D 1.0). Having said that, with D2D 1.1 we've started using a small number of APIs that exist only in the superset; in theory, that could make some difference in TDR occurrence, but it's not the most likely cause.
Reproduced this on Windows 7, Windows 8.1 and Vista while updating the graphics card driver; if you need any information please needinfo me.
(In reply to Bogdan Maris, QA [:bogdan_maris] from comment #25)
> Reproduced this on Windows 7, Windows 8.1 and Vista while graphics card
> driver update, if you need any information please needinfo me.

That triggers a driver reset and is expected to trigger this on beta. On release it would trigger a different crash, but also crash. It would be nice if you could confirm on Aurora this no longer causes a crash.
(In reply to Bas Schouten (:bas.schouten) from comment #26)
> (In reply to Bogdan Maris, QA [:bogdan_maris] from comment #25)
> > Reproduced this on Windows 7, Windows 8.1 and Vista while graphics card
> > driver update, if you need any information please needinfo me.
> 
> That triggers a driver reset and is expected to trigger this on beta. On
> release it would trigger a different crash, but also crash. It would be nice
> if you could confirm on Aurora this no longer causes a crash.

Just tried using latest Aurora on various Windows operating systems and I still receive crashes, only on Vista with this signature.

Windows 8.1 64-bit
bp-3024813c-0327-43ae-9c54-df4ce2150213
bp-40471cf6-2d13-45a3-bf56-4b5ae2150213

Windows Vista 64-bit:
bp-8ccdec4e-edbf-4891-8012-9df002150213

Windows 7 32-bit:
bp-593a164a-8ff8-4e02-90a4-85b342150213
(In reply to Bogdan Maris, QA [:bogdan_maris] from comment #27)
> (In reply to Bas Schouten (:bas.schouten) from comment #26)
> > (In reply to Bogdan Maris, QA [:bogdan_maris] from comment #25)
> > > Reproduced this on Windows 7, Windows 8.1 and Vista while graphics card
> > > driver update, if you need any information please needinfo me.
> > 
> > That triggers a driver reset and is expected to trigger this on beta. On
> > release it would trigger a different crash, but also crash. It would be nice
> > if you could confirm on Aurora this no longer causes a crash.
> 
> Just tried using latest Aurora on various Windows operating systems and I
> still receive crashes, only on Vista with this signature.
> 
> Windows 8.1 64-bit
> bp-3024813c-0327-43ae-9c54-df4ce2150213
> bp-40471cf6-2d13-45a3-bf56-4b5ae2150213
> 
> Windows Vista 64-bit:
> bp-8ccdec4e-edbf-4891-8012-9df002150213
> 
> Windows 7 32-bit:
> bp-593a164a-8ff8-4e02-90a4-85b342150213

I guess we need to uplift bug 1126490 to Aurora, how about nightly?
We still have this bug in 36 beta 9, right?
(In reply to Bas Schouten (:bas.schouten) from comment #28)
> (In reply to Bogdan Maris, QA [:bogdan_maris] from comment #27)
> > (In reply to Bas Schouten (:bas.schouten) from comment #26)
> > > (In reply to Bogdan Maris, QA [:bogdan_maris] from comment #25)
> > > > Reproduced this on Windows 7, Windows 8.1 and Vista while graphics card
> > > > driver update, if you need any information please needinfo me.
> > > 
> > > That triggers a driver reset and is expected to trigger this on beta. On
> > > release it would trigger a different crash, but also crash. It would be nice
> > > if you could confirm on Aurora this no longer causes a crash.
> > 
> > Just tried using latest Aurora on various Windows operating systems and I
> > still receive crashes, only on Vista with this signature.
> > 
> > Windows 8.1 64-bit
> > bp-3024813c-0327-43ae-9c54-df4ce2150213
> > bp-40471cf6-2d13-45a3-bf56-4b5ae2150213
> > 
> > Windows Vista 64-bit:
> > bp-8ccdec4e-edbf-4891-8012-9df002150213
> > 
> > Windows 7 32-bit:
> > bp-593a164a-8ff8-4e02-90a4-85b342150213
> 
> I guess we need to uplift bug 1126490 to Aurora, how about nightly?

I get some interesting results using Nightly:

 Windows Vista 64-bit

Nightly e10s enabled
- no Firefox crash but tabs do crash and Firefox has no buttons.
'See attachment'

Nightly e10s disabled
bp-ea783782-ef12-404e-b739-379c12150216

 Windows 8.1 64-bit

Nightly e10s enabled
bp-4445bcd0-d23f-4d72-8fe5-c28f72150216

Nightly e10s disabled
- no crash.

 Windows 7 32-bit

Nightly e10s enabled
- no crash.

Nightly e10s disabled
- no crash

(In reply to Sylvestre Ledru [:sylvestre] from comment #29)
> We still have this bug in 36 beta 9, right?

Yes, 36 beta 9 is still affected, just reproduced on Windows 7 32-bit:
bp-4a2ba501-e8b0-4a4a-8139-711632150216
(In reply to Sylvestre Ledru [:sylvestre] from comment #29)
> We still have this bug in 36 beta 9, right?

This crash will occur on beta 9 some of the time when a TDR occurs (other times other crashes will happen, if D2D 1.1 is enabled usually the FillRectangle one, otherwise a large variety of other crashes). There also have been no attempts to make beta 9 resilient to driver resets (Release crashes on driver resets, although it has an even larger range of associated signatures than beta). On 37 we intend to make sure our TDR issues are mostly addressed and a driver reset should generally become survivable.
OK. So, if I understand correctly, we will ship 36 with this bug.
(In reply to Sylvestre Ledru [:sylvestre] from comment #32)
> OK. So, if I understand correctly, we will ship 36 with this bug.

Unless we can bring our TDRs down, which I still think -may- be related to DXVA (it's the only change we've made in 36 that I could see significantly affecting TDRs, and there are some reports of people being on YouTube when this occurs), then yes. This in itself is not a bug, or at least not anything new; we've always crashed somehow when the graphics device resets. It just appears that, unless this signature is simply a consolidation of other crashes, we've increased the number of driver resets that we cause.
Attached image TDR_awesome_crash.png
So I hit this crash with a pretty awesome TDR. Here's what happened.
1. Starting my workday, I connected to my external monitor, used skype for an earlier call and did not start vidyo
2. Started my browsers - I'm running nightly and beta each with a lot of tabs in their own different profiles. Nightly is running in E10s mode
3. To get on my next call, I started up vidyo desktop and attempted to join my vidyo room.

At this point, both the laptop screen and the external display went black, flashed their screens back on, went black again, and then flashed back on in the state captured in the screen shot. My theory is that the TDR was triggered by something vidyo did, and this caused Beta to crash. 
However, on nightly, you can see the state of it - the black background and no content. While the chrome process still works in nightly - I can switch tabs and it responds, no content loads. So the content process has likely crashed but I get NO indication of this, which is even more serious from a UX point of view.

I was also running dev edition at the time (yes, I actually run all three versions at once). And dev edition's UI was on the laptop monitor (the nightly and beta builds were displayed on the external monitor) and it escaped mostly unscathed. It did *not* crash, it still renders its content, however, the "minimize, maximize, close" icons from windows have been replaced with a solid line of color. They still work though, and if I click on them, they re-render themselves.

This is all on windows 8.1. The crash reports that were filed were:
* From beta: https://crash-stats.mozilla.com/report/index/bp-4419de6a-c784-450c-8798-8cef02150220
* There seems to be no crash reported from the nightly browser, even though it's clearly been broken.
Summary: CompositorD3D11::HandleError comping from mozilla::layers::CompositorD3D11::UpdateRenderTarget() → CompositorD3D11::HandleError coming from mozilla::layers::CompositorD3D11::UpdateRenderTarget() probably from TDRs
(In reply to Clint Talbert ( :ctalbert ) from comment #34)
> Created attachment 8567162 [details]
> TDR_awesome_crash.png
> 
> So I hit this crash with a pretty awesome TDR. Here's what happened.
> 1. Starting my workday, I connected to my external monitor, used skype for
> an earlier call and did not start vidyo
> 2. Started my browsers - I'm running nightly and beta each with a lot of
> tabs in their own different profiles. Nightly is running in E10s mode
> 3. To get on my next call, I started up vidyo desktop and attempted to join
> my vidyo room.
> 
> At this point, both the laptop screen and the external display went black,
> flashed their screens back on, went black again, and then flashed back on in
> the state captured in the screen shot. My theory is that the TDR was
> triggered by something vidyo did, and this caused Beta to crash. 
> However, on nightly, you can see the state of it - the black background and
> no content. While the chrome process still works in nightly - I can switch
> tabs and it responds, no content loads. So the content process has likely
> crashed but I get NO indication of this, which is even more serious from a
> UX point of view.
> 
> I was also running dev edition at the time (yes, I actually run all three
> versions at once). And dev edition's UI was on the laptop monitor (the
> nightly and beta builds were displayed on the external monitor) and it
> escaped mostly unscathed. It did *not* crash, it still renders its content,
> however, the "minimize, maximize, close" icons from windows have been
> replaced with a solid line of color. They still work though, and if I click
> on them, they re-render themselves.
> 
> This is all on windows 8.1. The crash reports that were filed were:
> * From beta:
> https://crash-stats.mozilla.com/report/index/bp-4419de6a-c784-450c-8798-
> 8cef02150220
> * There seems to be no crash reported from the nightly browser, even though
> it's clearly been broken.

Can you file a separate bug for the issue you had on nightly?
It is the top browser crasher for YouTube in release but it is dwarfed by Flash crashes. It is the top crasher in Firefox 37b2 but is in 2nd place on YouTube. The competitor for first place is OOM. We have a few things that improve the OOM situation.

What needs to happen to resolve this issue?
(In reply to Anthony Jones (:kentuckyfriedtakahe, :k17e) from comment #36)
> It is the top browser crasher for YouTube in release but it is dwarfed by
> Flash crashes. It is the top crasher in Firefox 37b2 but is in 2nd place on
> YouTube. The competitor for first place is OOM. We have a few things that
> improve the OOM situation.
> 
> What needs to happen to resolve this issue?

We need to diagnose if these are TDRs triggered by youtube issues. As I've suggested in several different forums, A/B testing on a channel with a sufficient population, where we use acceleration in one case for youtube and no acceleration in the other, is the best method of diagnosis.
I personally do not think there is any time to do experimentation or A/B testing before we intend to ship MSE as the final beta of this cycle is going to build in two weeks.
(In reply to Bas Schouten (:bas.schouten) from comment #37)
> (In reply to Anthony Jones (:kentuckyfriedtakahe, :k17e) from comment #36)
> > It is the top browser crasher for YouTube in release but it is dwarfed by
> > Flash crashes. It is the top crasher in Firefox 37b2 but is in 2nd place on
> > YouTube. The competitor for first place is OOM. We have a few things that
> > improve the OOM situation.
> > 
> > What needs to happen to resolve this issue?
> 
> We need to diagnose if these are TDRs triggered by youtube issues. As I've
> suggested in several different forums, A/B testing on a channel with a
> sufficient population, where we use acceleration in one case for youtube and
> no acceleration in the other, is the best method of diagnosis.

It contributes to a slightly smaller proportion of the crashes on YouTube than in the web at large in beta 37. This means we don't have anything to support the hypothesis that hardware decoding (or even video more generally) is making the problem worse.

I'm assuming accelerated layers is correlated with TDRs on the basis that we're simply not doing much in the GPU.

What is involved in recovering from a driver reset? Do we need to tear down all of our layers, images, video frames, etc. and regenerate them? I'm guessing that would be an involved process and we'd end up being at a loose end for canvas and the like.
(In reply to Anthony Jones (:kentuckyfriedtakahe, :k17e) from comment #39)
> (In reply to Bas Schouten (:bas.schouten) from comment #37)
> > (In reply to Anthony Jones (:kentuckyfriedtakahe, :k17e) from comment #36)
> > > It is the top browser crasher for YouTube in release but it is dwarfed by
> > > Flash crashes. It is the top crasher in Firefox 37b2 but is in 2nd place on
> > > YouTube. The competitor for first place is OOM. We have a few things that
> > > improve the OOM situation.
> > > 
> > > What needs to happen to resolve this issue?
> > 
> > We need to diagnose if these are TDRs triggered by youtube issues. As I've
> > suggested in several different forums, A/B testing on a channel with a
> > sufficient population, where we use acceleration in one case for youtube and
> > no acceleration in the other, is the best method of diagnosis.
> 
> It contributes to a slightly smaller proportion of the crashes on YouTube
> than in the web at large in beta 37. This means we don't have anything to
> support the hypothesis that hardware decoding (or even video more generally)
> is making the problem worse.
> 
> I'm assuming accelerated layers is correlated with TDRs on the basis that
> we're simply not doing much in the GPU.
> 
> What is involved in recovering from a driver reset? Do we need to tear down
> all of our layers, images, video frames, etc. and regenerate them? I'm
> guessing that would be an involved process and we'd end up being at a loose
> end for canvas and the like.

That code all exists; we -should- be able to recover from a TDR on nightly, but it's very, very tricky (indeed, canvas is just screwed). I've added telemetry data to attempt to detect TDRs; the first bits of data on that should be in soon and should provide us with some information as to what is causing the TDRs.

In general, though, triggering a driver reset is a -very- bad end-user experience: some programs don't deal with it and crash, all screens flicker black, etc. So we really need to remove the cause of the increase in TDRs we're seeing. If it's not DXVA that's causing it, I'm not sure what could be causing an increase; we could, of course, try switching off D2D 1.1 again to see if that brings it down, in case something about D2D 1.1 is causing TDRs (which seems unlikely but is not impossible).
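For anyone wondering what "that code" has to do when we try to survive a reset, the rough shape is below; the helper names are invented for this sketch and the real logic is spread across gfx/layers:

// Illustrative only: the overall shape of TDR recovery.
void OnDeviceReset()
{
  // Every texture, render target, swap chain and shader created on the old
  // device is now invalid and must never be used again.
  DiscardDeviceBoundResources();   // hypothetical helper

  // Acquire a fresh ID3D11Device and swap chain, then rebuild render targets.
  RecreateDeviceAndSwapChain();    // hypothetical helper

  // Content that only ever lived on the GPU (canvas, WebGL) cannot be restored
  // from anywhere, which is why those surfaces come back blank after a reset.
  ScheduleFullRecomposite();       // hypothetical helper
}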
There is definitely something odd going on with the crashes http://tiny.cc/o4l7ux - most builds have very few crashes with the occasional build going ballistic.
I'm not sure I trust that table. It looks like it has all channels. It makes sense for the releases and betas to be higher. But it claims version is "36" which doesn't make sense for nightlies and auroras in 2015-03. Hmm.
(In reply to David Major [:dmajor] (UTC+13) from comment #42)
> I'm not sure I trust that table. It looks like it has all channels. It makes
> sense for the releases and betas to be higher. But it claims version is "36"
> which doesn't make sense for nightlies and auroras in 2015-03. Hmm.

Maybe I should leave the crash analysis to the experts, eh!
(In reply to Anthony Jones (:kentuckyfriedtakahe, :k17e) from comment #41)
> There is definitely something odd going on with the crashes
> http://tiny.cc/o4l7ux - most builds have very few crashes with the
> occasional build going ballistic.

That table is bogus. It adds up everything seen for all builds from any channel. Given that we build two beta builds a week and Nightly/DevEdition have way fewer users, it's pretty clear that those numbers will fluctuate wildly. There is a bug open to fix this but it seems like nobody is willing to work on it.
(In reply to Anthony Jones (:kentuckyfriedtakahe, :k17e) from comment #43)
> Maybe I should leave the crash analysis to the experts, eh!

It shouldn't be that way. That said, there are a few idiosyncrasies and traps, and this is one of them. Bug 898432 is filed on that one.
Although this is a top crash, as Bas said multiple times in this bug, the underlying crashes have been around for a while but are now consolidated and may be triggered more often. Given that we have not made progress on this recently, I'm doubtful that we'll be able to produce a fix before 37 ships. I'm marking this bug as wontfix for 37.

Bas - Do you have enough information to continue to investigate this bug? What does the Telemetry data show?
Flags: needinfo?(bas)
(In reply to Lawrence Mandel [:lmandel] (use needinfo) from comment #46)
> Although this is a top crash, as Bas said multiple times in this bug, the
> underlying crashes have been around for a while but are now consolidated and
> may be triggered more often. Given that we have not made progress on this
> recently, I'm doubtful that we'll be able to produce a fix before 37 ships.
> I'm marking this bug as wontfix for 37.
> 
> Bas - Do you have enough information to continue to investigate this bug?
> What does the Telemetry data show?

Telemetry data shows we're TDR-ing a lot, which is basically not unexpected. But in reality there are just two things we can do here to check -why- we're TDR-ing more and whether we can easily reduce it:

1. Disable DXVA, see what happens to TDR rates.
2. Disable D2D 1.1, see what happens to TDR rates.
Flags: needinfo?(bas)
(In reply to Bas Schouten (:bas.schouten) from comment #47)
> Telemetry data shows we're TDR-ing a lot, which is basically not unexpected.
> But in reality there's just two things we can do here to check whether -why-
> we're TDR-ing more and if we can easily reduce it:
> 
> 1. Disable DXVA, see what happens to TDR rates.
> 2. Disable D2D 1.1, see what happens to TDR rates.

Just to be clear, are you planning to try those experiments?
A user is able to reproduce the same crash in bug 1145143.
(In reply to :dmajor (semi-away, use needinfo) from comment #48)
> (In reply to Bas Schouten (:bas.schouten) from comment #47)
> > Telemetry data shows we're TDR-ing a lot, which is basically not unexpected.
> > But in reality there's just two things we can do here to check whether -why-
> > we're TDR-ing more and if we can easily reduce it:
> > 
> > 1. Disable DXVA, see what happens to TDR rates.
> > 2. Disable D2D 1.1, see what happens to TDR rates.
> 
> Just to be clear, are you planning to try those experiments?

I'm not just going to push stuff to Beta :) Nor would I know when to do it and how to analyze the data. I've suggested this on this bug as well as in at least 2 e-mail threads. But there wasn't much of a reply.
(In reply to Loic from comment #49)
> A user is able to reproduce the same crash in bug 1145143.

More or less unrelated, yes: if something causes a driver crash, Firefox crashes with this signature. The trick here is figuring out what, for most people, causes the driver crash, and that particular user's usage pattern is not going to be it.
This signature has about tripled on the 37 (beta) train over the last few days, in 37.0b7 it's actually #1 in front of the OOM|small signature now.
(In reply to Robert Kaiser (:kairo@mozilla.com) from comment #52)
> This signature has about tripled on the 37 (beta) train over the last few
> days, in 37.0b7 it's actually #1 in front of the OOM|small signature now.

Has something occurred that might have caused people to be watching more videos or something along those lines? That's the only change I could think of that would occur without us doing anything and that might cause this. Another option, I suppose, is if some buggy driver update was pushed, but the adoption of those is really never that fast.
(In reply to Bas Schouten (:bas.schouten) from comment #53)
> (In reply to Robert Kaiser (:kairo@mozilla.com) from comment #52)
> > This signature has about tripled on the 37 (beta) train over the last few
> > days, in 37.0b7 it's actually #1 in front of the OOM|small signature now.
> 
> Has something occurred that might have caused people to be watching more
> videos or something along those lines?

With bug 1138967, we upload textures in the ImageBridge thread and share them with DXGI, rather than doing the upload on the compositor side. This change was uplifted in the last beta and is the only video-related thing I can think of that made it to beta lately.
(In reply to Bas Schouten (:bas.schouten) from comment #53)
> Another option, I suppose, is if some buggy driver update was pushed, but the
> adoption of those is really never that fast.

In that case, we would see it across channels and builds, but it looks like this is isolated to 37.0b7. What Nical points to is much more likely as the issue.
(In reply to Robert Kaiser (:kairo@mozilla.com) from comment #55)
> (In reply to Bas Schouten (:bas.schouten) from comment #53)
> > Another option, I suppose, is if some buggy driver update was pushed, but the
> > adoption of those is really never that fast.
> 
> I that case, we would see it across channels and builds, but it looks like
> this is isolated to 37.0b7. What Nical points to is much more likely as the
> issue.

Right, a change like that in the realm of video could certainly cause this by increasing the amount of driver crashes somehow. I don't keep a close eye on video changes, but the bug Nical points at will certainly affect what driver codepaths we're hitting.
(In reply to Robert Kaiser (:kairo@mozilla.com) from comment #52)
> This signature has about tripled on the 37 (beta) train over the last few
> days, in 37.0b7 it's actually #1 in front of the OOM|small signature now.

Where can I find this information?

Looking at the report list linked from the crash report in comment #0, it appears that the total number of crashes with this signature is about the same as other betas (3121 for b7, 3509 for b6, 3866 for b5 etc).

Is it possible to get a list of reports only for a specific version? I can't find a way to do that search.

I'm particularly interested in looking at the Graphics Adapter Report for these lists, so I can see if the increase in crashes correlates with a specific graphics card.

It appears that 'PCI\VEN_8086&DEV_2E32&SUBSYS_31031565&REV_03 Intel G41 express graphics' has spiked massively (from 2% -> 9%) in the last 3 days vs the last 28.

Does that card alone explain the rise in these crashes, or are we really seeing an increase across the board? It's really hard to tell from the relative percentages. Average number of crashes/day for each given device id for each version would be the easiest metric to compare these with I think.

Bug 1146313 also appears to correlate really strongly (>80%) with that same device id.
Flags: needinfo?(kairo)
(In reply to Matt Woodrow (:mattwoodrow) from comment #57)
> (In reply to Robert Kaiser (:kairo@mozilla.com) from comment #52)
> > This signature has about tripled on the 37 (beta) train over the last few
> > days, in 37.0b7 it's actually #1 in front of the OOM|small signature now.
> 
> Where can I find this information?

https://crash-analysis.mozilla.com/rkaiser/2015-03-23/2015-03-23.firefox.37.explosiveness.html is a report that looks at the crash rates (crashes / 1M ADI) for signatures on the 37 train in total. When you find this signature there, to the right of the explosiveness factor columns (which are the result of some stats to tell how fast they are rising), you will find the rates for this signature on various days, here in simplified fashion:

   03-23   03-22   03-21   03-20   03-19   03-18   03-17   03-16   03-15   03-14   03-13
     960    1316    1043     549     541     629     461     406     528     473     449

I guessed "tripled" without having the 03-23 data yet; on a weekday it looks more like we doubled.

The "#1" thing comes from looking at https://crash-stats.mozilla.com/topcrasher/products/Firefox/versions/37.0b7

If you compare the percentages of this signature vs. OOM|small to https://crash-stats.mozilla.com/topcrasher/products/Firefox/versions/37.0b6 (and to https://crash-stats.mozilla.com/topcrasher/products/Firefox/versions/36.0.4 to determine if an external factor has made it spike for everyone) then you'll see that it's definitely higher in b7 than anywhere else.

> Is it possible to get a list of reports only for a specific version? I can't
> find a way to do that search.

The topcrash list for 37.0b7 I linked above should give you that link: https://crash-stats.mozilla.com/report/list?product=Firefox&range_value=7&range_unit=days&date=2015-03-24&signature=mozilla%3A%3Alayers%3A%3ACompositorD3D11%3A%3AHandleError%28long%2C+mozilla%3A%3Alayers%3A%3ACompositorD3D11%3A%3ASeverity%29+%7C+mozilla%3A%3Alayers%3A%3ACompositorD3D11%3A%3AFailed%28long%2C+mozilla%3A%3Alayers%3A%3ACompositorD3D11%3A%3ASeverity%29+%7C+mozilla%3A%3Alayers%3A%3ACompositorD3D11%3A%3AUpdateRenderTarget%28%29&version=Firefox%3A37.0b7

This should contain the graphics adapter report.

You could also do an equivalent search and get to https://crash-stats.mozilla.com/signature/?build_id=20150319212106&product=Firefox&release_channel=beta&process_type=browser&process_type=content&version=37.0&signature=mozilla%3A%3Alayers%3A%3ACompositorD3D11%3A%3AHandleError%28long%2C+mozilla%3A%3Alayers%3A%3ACompositorD3D11%3A%3ASeverity%29+|+mozilla%3A%3Alayers%3A%3ACompositorD3D11%3A%3AFailed%28long%2C+mozilla%3A%3Alayers%3A%3ACompositorD3D11%3A%3ASeverity%29+|+mozilla%3A%3Alayers%3A%3ACompositorD3D11%3A%3AUpdateRenderTarget%28%29&_columns=date&_columns=product&_columns=version&_columns=build_id&_columns=platform&_columns=reason&_columns=address&page=1 which lets you look at stats for any annotation/field in the "Aggregations" section.

> Does that card alone explain the rise in these crashes, or are we really
> seeing an increase across the board?

That's a bit hard to determine, one would need to do some math on that based on the numbers above.

> Bug 1146313 also appears to correlate really strongly (>80%) with that same
> device id.

If it's that ID in general, that's surely an interesting find. Is there something we could do on that specifically?
Flags: needinfo?(kairo)
(In reply to Robert Kaiser (:kairo@mozilla.com) from comment #58)
> > Is it possible to get a list of reports only for a specific version? I can't
> > find a way to do that search.
> 
> The topcrash list for 37.0b7 I linked above should give you that link:
> https://crash-stats.mozilla.com/report/
> list?product=Firefox&range_value=7&range_unit=days&date=2015-03-
> 24&signature=mozilla%3A%3Alayers%3A%3ACompositorD3D11%3A%3AHandleError%28long
> %2C+mozilla%3A%3Alayers%3A%3ACompositorD3D11%3A%3ASeverity%29+%7C+mozilla%3A%
> 3Alayers%3A%3ACompositorD3D11%3A%3AFailed%28long%2C+mozilla%3A%3Alayers%3A%3A
> CompositorD3D11%3A%3ASeverity%29+%7C+mozilla%3A%3Alayers%3A%3ACompositorD3D11
> %3A%3AUpdateRenderTarget%28%29&version=Firefox%3A37.0b7
> 
> This should contain the graphics adapter report.

I don't know how much I trust the above URL. It says it's only for 37.0b7 but if you look at the product breakdown on that page it has all sorts of different versions.

> You could also do an equivalent search and get to
> https://crash-stats.mozilla.com/signature/
> ?build_id=20150319212106&product=Firefox&release_channel=beta&process_type=br
> owser&process_type=content&version=37.
> 0&signature=mozilla%3A%3Alayers%3A%3ACompositorD3D11%3A%3AHandleError%28long%
> 2C+mozilla%3A%3Alayers%3A%3ACompositorD3D11%3A%3ASeverity%29+|+mozilla%3A%3Al
> ayers%3A%3ACompositorD3D11%3A%3AFailed%28long%2C+mozilla%3A%3Alayers%3A%3ACom
> positorD3D11%3A%3ASeverity%29+|+mozilla%3A%3Alayers%3A%3ACompositorD3D11%3A%3
> AUpdateRenderTarget%28%29&_columns=date&_columns=product&_columns=version&_co
> lumns=build_id&_columns=platform&_columns=reason&_columns=address&page=1
> which lets you look at stats for any annotation/field in the "Aggregations"
> section.

This seems more useful. For b6 it looks like the top-hit adapter device ids are:

Rank   Adapter device id   Count   %
1      0x0166              342     8.92 %
2      0x0116              341     8.90 %
3      0x0046              249     6.50 %
4      0x0102              244     6.37 %
5      0x0a16              181     4.72 %
6      0x0106              155     4.04 %
7      0x2e32              139     3.63 %
...
24     0x2a02               25     0.65 %
...

and for b7:

Rank    Adapter device id  Count   %
1       0x2e32             2783    41.98 %
2       0x2a02             714     10.77 %
3       0x0116             268     4.04 %
4       0x0046             260     3.92 %
5       0x0166             225     3.39 %
6       0x0106             179     2.70 %
...

so it looks like adapter id 0x2e32 accounts for most of the spike, and 0x2a02 also seems to have increased significantly.
(Note: I got the b6 list by using the link kairo provided and changing the buildid in the search parameters to 20150316202753)
I also grepped my way through the raw crash data from the 19th to the 23rd (because I still don't fully trust the crash-stats web interface) and posted the relevant data to http://people.mozilla.org/~kgupta/bug/1116812/. b6-devices.txt and b7-devices.txt show the number of HandleError crashes (at the start of the line) on b6 and b7 grouped by { AdapterVendorID, AdapterDeviceID, AdapterSubsysID, AdapterDriverVersion }. b6-devices-nosubsys.txt and b7-devices-nosubsys.txt are the same but I stripped out the AdapterSubsysID since it seemed pretty noisy.
(In reply to Kartikaya Gupta (email:kats@mozilla.com) from comment #59)
> I don't know how much I trust the above URL. It says it's only for 37.0b7
> but if you look at the product breakdown on that page it has all sorts of
> different versions.

The product breakdown is the one thing on the Signature Summary that is always shown without version filters, as otherwise it wouldn't be too useful in a case like this. Might make sense to flag that in some way in the UI, though.
(In reply to Kartikaya Gupta (email:kats@mozilla.com) from comment #59)
> so it looks like adapter ids 0x2e32 accounts for most of the spike, and
> 0x2a02 also seems to have increased significantly.

Those seem to be "Intel G41 express graphics" and "Intel GM965, Intel X3100".

See http://www.pcidatabase.com/search.php?device_search_str=0x2e32&device_search=Search and http://www.pcidatabase.com/search.php?device_search_str=0x2a02&device_search=Search
Also added b6-idonly.txt and b7-idonly.txt which are grouped by { AdapterVendorID, AdapterDeviceID }. The numbers seem to agree pretty closely with crash-stats web interface, that adapter ids 0x2e32 and 0x2a02 seem to be affected the most.
Awesome, thanks for that!

(In reply to Robert Kaiser (:kairo@mozilla.com) from comment #58)

> If it's that ID in general, that's surely an interesting find. Is there
> something we could do on that specifically?

Yeah, we can blacklist those devices so that they no longer get accelerated layers, or we could add a new blacklist type to specifically avoid this crash.

They're both pretty old cards, blacklisting them entirely doesn't seem like a big deal.
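To make that concrete, the proposed fix goes through the driver blocklist; conceptually it amounts to a device-id check like the sketch below before enabling accelerated layers (this is not the actual GfxInfo blocklist syntax, just the idea, using the two ids from the data above):

#include <cstdint>

// Intel G41 (0x2e32) and GM965/X3100 (0x2a02), the devices that spiked in 37.0b7.
static const uint32_t kBlockedIntelDeviceIds[] = { 0x2e32, 0x2a02 };

bool BlockAcceleratedLayersForDevice(uint32_t aVendorId, uint32_t aDeviceId)
{
  if (aVendorId != 0x8086) {  // 0x8086 is the Intel vendor id
    return false;
  }
  for (uint32_t blocked : kBlockedIntelDeviceIds) {
    if (aDeviceId == blocked) {
      return true;            // fall back to basic (or WARP) compositing
    }
  }
  return false;
}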
(In reply to Matt Woodrow (:mattwoodrow) from comment #65)
> (In reply to Robert Kaiser (:kairo@mozilla.com) from comment #58)
> > If it's that ID in general, that's surely an interesting find. Is there
> > something we could do on that specifically?
> 
> Yeah, we can blacklist those devices so that they no longer get accelerated
> layers, or we could add a new blacklist type to specifically avoid this
> crash.
> 
> They're both pretty old cards, blacklisting them entirely doesn't seem like
> a big deal.

Sounds like something to do for 38, then, I guess.
Is it possible to get data on driver versions for the two affected devices?

It would be nice to blacklist only a certain range of driver versions, rather than the entire device.
http://people.mozilla.org/~kgupta/bug/1116812/b7-devices-nosubsys.txt has driver versions as well. If you have a more specific query you'd like data for let me know and I can probably extract it.
Comment on attachment 8582839 [details] [diff] [review]
Blacklist the two devices that spiked with 37b7

Review of attachment 8582839 [details] [diff] [review]:
-----------------------------------------------------------------

I guess so. But let's see if we can figure out what's going on here.
Attachment #8582839 - Flags: review?(bas) → review+
Do you have any suggestions for how we might do that? Do we have any of the affected devices in the TO office?
Using the STR from bug 1145102 comment 4, we could cause a lot of types of crashes. CompositorD3D11::HandleError also seemed to happen on a Lenovo W530 and an Inspiron 5547.
Comment on attachment 8582839 [details] [diff] [review]
Blacklist the two devices that spiked with 37b7

Review of attachment 8582839 [details] [diff] [review]:
-----------------------------------------------------------------

Hrm, we should -really- only blacklist video for these devices longer term since everything else seems to be okay. Can you make sure we create a way for only blacklisting video?
https://hg.mozilla.org/mozilla-central/rev/2118109cc0e2
Status: NEW → RESOLVED
Closed: 9 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla39
The checkin does not fix the complete bug but probably only fixes the additional spike we saw in 37.0b7.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
(In reply to Robert Kaiser (:kairo@mozilla.com) from comment #77)
> The checkin does not fix the complete bug but probably only fixes the
> additional spike we saw in 37.0b7.

Kairo, when will we have enough data to check this?
Flags: needinfo?(kairo)
(In reply to Milan Sreckovic [:milan] from comment #79)
> (In reply to Robert Kaiser (:kairo@mozilla.com) from comment #77)
> > The checkin does not fix the complete bug but probably only fixes the
> > additional spike we saw in 37.0b7.
> 
> Kairo, when will we have enough data to check this?

Probably once we have this patch on beta. Right now it only just merged to aurora, from what I can tell. That said, the signature will still be the #2 topcrash after this patch is on beta.
Flags: needinfo?(kairo)
Crash Signature: [@ mozilla::layers::CompositorD3D11::HandleError(long, mozilla::layers::CompositorD3D11::Severity) | mozilla::layers::CompositorD3D11::Failed(long, mozilla::layers::CompositorD3D11::Severity) | mozilla::layers::CompositorD3D11::UpdateRenderTarget()] → [@ mozilla::layers::CompositorD3D11::HandleError(long, mozilla::layers::CompositorD3D11::Severity) | mozilla::layers::CompositorD3D11::Failed(long, mozilla::layers::CompositorD3D11::Severity) | mozilla::layers::CompositorD3D11::UpdateRenderTarget()] [@ m…
Crash Signature: , mozilla::layers::CompositorD3D11::Severity) | mozilla::layers::CompositorD3D11::BeginFrame(nsIntRegion const&, mozilla::gfx::RectTyped<mozilla::gfx::UnknownUnits> const*, mozilla::gfx::RectTyped<mozil... ] → , mozilla::layers::CompositorD3D11::Severity) | mozilla::layers::CompositorD3D11::BeginFrame(nsIntRegion const&, mozilla::gfx::RectTyped<mozilla::gfx::UnknownUnits> const*, mozilla::gfx::RectTyped<mozil... ] [@ xul.dll@0xab50e6 | xul.dll@0xab3bc3]
Crash after right-clicking a link to a YT movie
https://crash-stats.mozilla.com/report/index/0d563e1a-13a0-4917-99db-5a7d12150409
Matt, can we have an uplift request to 38? Thanks
Flags: needinfo?(matt.woodrow)
Comment on attachment 8582839 [details] [diff] [review]
Blacklist the two devices that spiked with 37b7

Approval Request Comment
[Feature/regressing bug #]: HTML5 Video
[User impact if declined]: Crashes on some devices.
[Describe test coverage new/current, TreeHerder]: None
[Risks and why]: Low, simple blacklist change.
[String/UUID change made/needed]: None
Flags: needinfo?(matt.woodrow)
Attachment #8582839 - Flags: approval-mozilla-beta?
#8 crash for TB38.0b1, so this will also benefit Thunderbird
Whiteboard: [tbird crash]
Attachment #8582839 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
Should be in 38 beta 4.
Time for my $0.02: I just hope the blacklist check-in is or was intended to be temporary and will be backed out before reaching release, though that would be exceptional. So is it permanent?
This is causing us to use WARP on these devices, which seems to cause us not to invalidate properly, giving us stale content.
"This" meaning the bug or the fix/blacklist?
Keywords: topcrash-win
Bas, this is the most important crash in 38. Could you help on this? Thanks
(In reply to Sylvestre Ledru [:sylvestre] from comment #90)
> Bas, this is the most important crash in 38. Could you help on this? Thanks

Are we talking about TDRs being the most important crash?

If so, is there a rise relative to 37?

As I've said in numerous places before, there appears to be a rise in TDRs lately, and there is a decent amount of evidence that this is related to video in a bunch of cases. I've raised bug 1157764 for this. In parallel, we're also trying to make TDRs survivable, but that is not something we will be able to uplift.
Flags: needinfo?(bas)
We're trying to reduce the number of TDRs in bug 1157764, which we're planning on uplifting up to 38.
Comment on attachment 8599029 [details] [diff] [review]
Consider DXGI_ERROR_INVALID_CALL a recoverable error for GetBuffer

Review of attachment 8599029 [details] [diff] [review]:
-----------------------------------------------------------------

::: gfx/layers/d3d11/CompositorD3D11.cpp
@@ +1224,5 @@
> +  if (hr == DXGI_ERROR_INVALID_CALL) {
> +    // This happens on some GPUs/drivers when there's a TDR.
> +    gfxCriticalError() << "GetBuffer returned invalid call!";
> +    return;
> +  }

Does this cause us to reset the device or stop rendering?
(In reply to Jeff Muizelaar [:jrmuizel] from comment #94)
> Comment on attachment 8599029 [details] [diff] [review]
> Consider DXGI_ERROR_INVALID_CALL a recoverable error for GetBuffer
> 
> Review of attachment 8599029 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> ::: gfx/layers/d3d11/CompositorD3D11.cpp
> @@ +1224,5 @@
> > +  if (hr == DXGI_ERROR_INVALID_CALL) {
> > +    // This happens on some GPUs/drivers when there's a TDR.
> > +    gfxCriticalError() << "GetBuffer returned invalid call!";
> > +    return;
> > +  }
> 
> Does this cause us to reset the device or stop rendering?

If DidRenderingDeviceReset returns an error as well, yes (which presumably, it does).
(In reply to Bas Schouten (:bas.schouten) from comment #95)
> (In reply to Jeff Muizelaar [:jrmuizel] from comment #94)
> > Comment on attachment 8599029 [details] [diff] [review]
> > Consider DXGI_ERROR_INVALID_CALL a recoverable error for GetBuffer
> > 
> > Review of attachment 8599029 [details] [diff] [review]:
> > -----------------------------------------------------------------
> > 
> > ::: gfx/layers/d3d11/CompositorD3D11.cpp
> > @@ +1224,5 @@
> > > +  if (hr == DXGI_ERROR_INVALID_CALL) {
> > > +    // This happens on some GPUs/drivers when there's a TDR.
> > > +    gfxCriticalError() << "GetBuffer returned invalid call!";
> > > +    return;
> > > +  }
> > 
> > Does this cause us to reset the device or stop rendering?
> 
> If DidRenderingDeviceReset returns an error as well, yes (which presumably,
> it does).

We should check DidRenderingDeviceReset here, and if we don't have a reset we should continue to crash.
Attachment #8599029 - Flags: review?(jmuizelaar) → review-
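For the record, the direction that review feedback points in looks roughly like this; the member names (mSwapChain, mDevice) and the Severity value are assumed for the sketch, reference handling is omitted, and this is not the landed patch verbatim:

// Inside CompositorD3D11::UpdateRenderTarget(), sketched per the review above.
ID3D11Texture2D* backBuffer = nullptr;
HRESULT hr = mSwapChain->GetBuffer(0, __uuidof(ID3D11Texture2D),
                                   reinterpret_cast<void**>(&backBuffer));
if (hr == DXGI_ERROR_INVALID_CALL) {
  // Some GPUs/drivers return this instead of DXGI_ERROR_DEVICE_REMOVED after a TDR.
  if (mDevice->GetDeviceRemovedReason() != S_OK) {
    // The device really was reset: treat the error as recoverable, skip this
    // frame and let the device-reset handling rebuild everything.
    gfxCriticalError() << "GetBuffer returned invalid call after a device reset.";
    return;
  }
  // No reset happened, so this is a genuine bug: fall through to the fatal path.
}
if (Failed(hr, Severity::Critical)) {  // enum value name assumed
  return;
}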
We need to stop TDRs; I'm not convinced we can recover from too many of these. This is bug 1157764.
Attachment #8599377 - Flags: review?(jmuizelaar) → review+
Bas, could you fill the uplift request to 38 ? I guess we want this... Thanks
Flags: needinfo?(bas)
Comment on attachment 8599377 [details] [diff] [review]
Consider DXGI_ERROR_INVALID_CALL a recoverable error for GetBuffer and make sure we check the correct device

High volume crash.  Another guard against it.
Attachment #8599377 - Flags: approval-mozilla-beta?
Attachment #8599377 - Flags: approval-mozilla-aurora?
Comment on attachment 8599377 [details] [diff] [review]
Consider DXGI_ERROR_INVALID_CALL a recoverable error for GetBuffer and make sure we check the correct device

[Triage Comment]
Should be in 38 RC1
Attachment #8599377 - Flags: approval-mozilla-release+
Attachment #8599377 - Flags: approval-mozilla-beta?
Attachment #8599377 - Flags: approval-mozilla-aurora?
Attachment #8599377 - Flags: approval-mozilla-aurora+
(In reply to Milan Sreckovic [:milan] from comment #100)
> Also, Bas:
> http://hg.mozilla.org/mozilla-central/annotate/caf25344f73e/gfx/layers/d3d11/
> TextureD3D11.cpp#l1028 may return device as nullptr, based on one of the
> FinalizeFrame crashes -
> https://crash-stats.mozilla.com/report/index/095c5cb7-0806-4c99-937c-
> 3100d2150428.  Something we should deal with?

Yes, we probably do.
Flags: needinfo?(bas)
Target Milestone: mozilla39 → ---
https://hg.mozilla.org/mozilla-central/rev/b7d29990d645
Status: REOPENED → RESOLVED
Closed: 9 years ago9 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla40
¡Hola Bas!

Is this really fixed on 39?

Please see below.

¡Gracias!

bp-f485d360-9ae9-4b6f-9f21-362502150606
	06/06/2015	05:44 p.m.

Crashing Thread
Frame 	Module 	Signature 	Source
0 	xul.dll 	mozilla::layers::CompositorD3D11::HandleError(long, mozilla::layers::CompositorD3D11::Severity) 	gfx/layers/d3d11/CompositorD3D11.cpp
1 	xul.dll 	mozilla::layers::CompositorD3D11::Failed(long, mozilla::layers::CompositorD3D11::Severity) 	gfx/layers/d3d11/CompositorD3D11.cpp
2 	xul.dll 	mozilla::layers::CompositorD3D11::UpdateRenderTarget() 	gfx/layers/d3d11/CompositorD3D11.cpp
3 	xul.dll 	mozilla::layers::CompositorD3D11::BeginFrame(nsIntRegion const&, mozilla::gfx::RectTyped<mozilla::gfx::UnknownUnits> const*, mozilla::gfx::RectTyped<mozilla::gfx::UnknownUnits> const&, mozilla::gfx::RectTyped<mozilla::gfx::UnknownUnits>*, mozilla::gfx::RectTyped<mozilla::gfx::UnknownUnits>*) 	gfx/layers/d3d11/CompositorD3D11.cpp
4 	xul.dll 	mozilla::layers::LayerManagerComposite::Render() 	gfx/layers/composite/LayerManagerComposite.cpp
5 	xul.dll 	mozilla::layers::LayerManagerComposite::EndTransaction(void (*)(mozilla::layers::PaintedLayer*, gfxContext*, nsIntRegion const&, mozilla::layers::DrawRegionClip, nsIntRegion const&, void*), void*, mozilla::layers::LayerManager::EndTransactionFlags) 	gfx/layers/composite/LayerManagerComposite.cpp
Flags: needinfo?(bas)
(In reply to alex_mayorga from comment #110)
> ¡Hola Bas!
> 
> Is this really fixed on 39?
> 
> Please see below.
> 
> ¡Gracias!
> 
> bp-f485d360-9ae9-4b6f-9f21-362502150606
> 	06/06/2015	05:44 p.m.
> 
> Crashing Thread
> Frame 	Module 	Signature 	Source
> 0 	xul.dll 	mozilla::layers::CompositorD3D11::HandleError(long,
> mozilla::layers::CompositorD3D11::Severity) 
> gfx/layers/d3d11/CompositorD3D11.cpp
> 1 	xul.dll 	mozilla::layers::CompositorD3D11::Failed(long,
> mozilla::layers::CompositorD3D11::Severity) 
> gfx/layers/d3d11/CompositorD3D11.cpp
> 2 	xul.dll 	mozilla::layers::CompositorD3D11::UpdateRenderTarget() 
> gfx/layers/d3d11/CompositorD3D11.cpp
> 3 	xul.dll 	mozilla::layers::CompositorD3D11::BeginFrame(nsIntRegion const&,
> mozilla::gfx::RectTyped<mozilla::gfx::UnknownUnits> const*,
> mozilla::gfx::RectTyped<mozilla::gfx::UnknownUnits> const&,
> mozilla::gfx::RectTyped<mozilla::gfx::UnknownUnits>*,
> mozilla::gfx::RectTyped<mozilla::gfx::UnknownUnits>*) 
> gfx/layers/d3d11/CompositorD3D11.cpp
> 4 	xul.dll 	mozilla::layers::LayerManagerComposite::Render() 
> gfx/layers/composite/LayerManagerComposite.cpp
> 5 	xul.dll 	mozilla::layers::LayerManagerComposite::EndTransaction(void
> (*)(mozilla::layers::PaintedLayer*, gfxContext*, nsIntRegion const&,
> mozilla::layers::DrawRegionClip, nsIntRegion const&, void*), void*,
> mozilla::layers::LayerManager::EndTransactionFlags) 
> gfx/layers/composite/LayerManagerComposite.cpp

It looks like you were running very low on memory there; low-memory situations can still trigger this crash.
Flags: needinfo?(bas)
Which may very well be tracked in bug 1172351.
¡Hola Bas!

This still happens on 40:

bp-3ad85bf4-1ed6-4872-99a0-04bd12150804
	04/08/2015	10:13 a.m.

Shall I file a new bug, reopen this one or just let this be?
Flags: needinfo?(bas)
(In reply to alex_mayorga from comment #113)
> ¡Hola Bas!
> 
> This still happens on 40:
> 
> bp-3ad85bf4-1ed6-4872-99a0-04bd12150804
> 	04/08/2015	10:13 a.m.
> 
> Shall I file a new bug, reopen this one or just let this be?

There should still be an open bug tracking the remaining crashes somewhere.
Flags: needinfo?(bas)