Closed Bug 1146313 Opened 9 years ago Closed 9 years ago

crash in mozilla::layers::CompositorD3D11::UpdateConstantBuffers()

Categories

(Core :: Graphics, defect)

x86
Windows NT
defect
Not set
critical

Tracking

()

RESOLVED FIXED
mozilla39
Tracking Status
firefox36 --- wontfix
firefox37 --- wontfix
firefox38 --- wontfix
firefox39 --- wontfix
firefox40 --- wontfix

People

(Reporter: kairo, Assigned: mattwoodrow)

Details

(Keywords: crash, regression)

Crash Data

Attachments

(2 files)

This bug was filed from the Socorro interface and is 
report bp-8c45571f-fdd1-43fb-94bf-cc7d72150323.
=============================================================

This is a new topcrash in 37.0b7 (it exists at a very low level in other versions, but only spiked with this one).

There seem to be a few different stacks: besides the one above, there are also those in bp-7ac3a62e-5d81-49fc-98e6-6277f2150323 and bp-c0e3f013-cb56-4b4a-a654-27ce42150323.

The top few frames are either this:
0 	xul.dll 	mozilla::layers::CompositorD3D11::UpdateConstantBuffers() 	gfx/layers/d3d11/CompositorD3D11.cpp
1 	xul.dll 	mozilla::layers::CompositorD3D11::DrawQuad(mozilla::gfx::RectTyped<mozilla::gfx::UnknownUnits> const&, mozilla::gfx::RectTyped<mozilla::gfx::UnknownUnits> const&, mozilla::layers::EffectChain const&, float, mozilla::gfx::Matrix4x4 const&) 	gfx/layers/d3d11/CompositorD3D11.cpp
2 	xul.dll 	mozilla::layers::ContentHostTexture::Composite(mozilla::layers::EffectChain&, float, mozilla::gfx::Matrix4x4 const&, mozilla::gfx::Filter const&, mozilla::gfx::RectTyped<mozilla::gfx::UnknownUnits> const&, nsIntRegion const*) 	gfx/layers/composite/ContentHost.cpp
3 	xul.dll 	mozilla::layers::PaintedLayerComposite::RenderLayer(nsIntRect const&) 	gfx/layers/composite/PaintedLayerComposite.cpp
[...]

Or this:
0 	xul.dll 	mozilla::layers::CompositorD3D11::UpdateConstantBuffers() 	gfx/layers/d3d11/CompositorD3D11.cpp
1 	xul.dll 	mozilla::layers::CompositorD3D11::ClearRect(mozilla::gfx::RectTyped<mozilla::gfx::UnknownUnits> const&) 	gfx/layers/d3d11/CompositorD3D11.cpp
2 	xul.dll 	mozilla::layers::CompositorD3D11::BeginFrame(nsIntRegion const&, mozilla::gfx::RectTyped<mozilla::gfx::UnknownUnits> const*, mozilla::gfx::RectTyped<mozilla::gfx::UnknownUnits> const&, mozilla::gfx::RectTyped<mozilla::gfx::UnknownUnits>*, mozilla::gfx::RectTyped<mozilla::gfx::UnknownUnits>*) 	gfx/layers/d3d11/CompositorD3D11.cpp
[...]


The crashes are mainly on Win7 and are all EXCEPTION_ACCESS_VIOLATION_WRITE with non-null, mostly high addresses.

Most graphics adapters are

Click the link in the Crash Signature field of this bug to get more reports and stats.
[Tracking Requested - why for this release]:
This is #4 with 2.8% of all 37.0b7 crashes (and b7 seems to have a higher crash rate than b6 before).

Bas, nical: Any idea what's up here?
Flags: needinfo?(nical.bugzilla)
Flags: needinfo?(bas)
Also note that this is #1 (16%) among crashes in 37.0b7 with YouTube in the URL.
As Kairo said, this is not a new crash in 37 but is certainly much more explosive.

We're scheduled to build the 37 desktop RC today. ni kats and Jeff to help as well, as we have very little time to figure this out. Is there something that we can back out?
Flags: needinfo?(jmuizelaar)
Flags: needinfo?(bugmail.mozilla)
I have no idea what could have caused this. It seems like it should only happen with a driver bug. It looks like it is happening more on Intel cards. Can we get some correlation information on devices/drivers?
Flags: needinfo?(jmuizelaar) → needinfo?(kairo)
(In reply to Jeff Muizelaar [:jrmuizel] from comment #4)
> I have no idea what could have caused this. It seems like it should only
> happen with a driver bug. It looks like it is happening more on Intel cards.
> Can we get some correlation information on devices/drivers?

You can get that in detail from the Signature Summary in the link on the Crash Signature field, but I see I didn't finish the sentence I started in comment #0 about the adapters, sorry.

What I wanted to say there is: "Most graphics adapters are Intel, but there are AMD and NVIDIA adapters in the mix as well."
Flags: needinfo?(kairo)
From a random sampling of the crash stacks it looks like some are crashing at [1] and some are crashing at [2]. I don't know this code at all, but clearly the mContext->Map call is not populating resource.pData properly and we are expecting it to. Considering the crash address is nonzero, I don't think we can check for null to guard against this; it seems to be putting some unwritable address there. As far as I can tell from my random sampling, there doesn't appear to be a correlation in the actual crash address, or in any of the memory stats (total/available memory/page file/physical memory/etc.).

It also doesn't look like this code was modified directly between b6 and b7, so I'm not sure what we could back out to fix this; it must be fallout from some other change.

[1] http://hg.mozilla.org/releases/mozilla-beta/annotate/790546ceb89f/gfx/layers/d3d11/CompositorD3D11.cpp#l1362
[2] http://hg.mozilla.org/releases/mozilla-beta/annotate/790546ceb89f/gfx/layers/d3d11/CompositorD3D11.cpp#l1355
Flags: needinfo?(bugmail.mozilla)
Here's a list of all the app notes extracted from the raw crash data on 2015-03-23. I ran it through sort | uniq -c | sort -rn to get a list sorted by frequency (the number at the start of the line is the number of occurrences).
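For readers without a Unix shell, the frequency listing described above can be approximated in a few lines of C++. This is only a sketch: `CountByFrequency` is a hypothetical name, the input is assumed to be one app note per string, and ties stay in alphabetical order, which may differ from how `sort -rn` orders equal counts.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: returns (count, line) pairs, most frequent first.
std::vector<std::pair<std::size_t, std::string>>
CountByFrequency(const std::vector<std::string>& lines) {
  // Step 1: tally identical lines (the "sort | uniq -c" part).
  std::map<std::string, std::size_t> counts;
  for (const auto& line : lines) {
    ++counts[line];
  }

  // Step 2: flatten into (count, line) pairs.
  std::vector<std::pair<std::size_t, std::string>> result;
  result.reserve(counts.size());
  for (const auto& entry : counts) {
    result.emplace_back(entry.second, entry.first);
  }

  // Step 3: order by descending count (the "sort -rn" part).
  std::stable_sort(result.begin(), result.end(),
                   [](const auto& a, const auto& b) { return a.first > b.first; });

  // Sanity check: one output row per distinct input line.
  assert(result.size() == counts.size());
  return result;
}
```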
FWIW, I pretty much suspect that this could be fallout from bug 1138967, which was the most risky gfx patch we took specifically for 37.0b7. Would that patch change the cases you talk about in comment #6?
It looks like the most likely candidate, but again I'm not familiar with that code so I can't say for sure.
kats - We're going to need to deal with this in 37. Who is more familiar with the code and can determine whether we need to back out bug 1138967?
Flags: needinfo?(bugmail.mozilla)
Matt would probably be the person. On IRC he said "I'm working on it".
Flags: needinfo?(bugmail.mozilla) → needinfo?(matt.woodrow)
pData isn't initialized by us, so I guess it's possible that Map() is returning S_OK, but not setting pData.

We could initialize it to null and then check for that as well as the HRESULT.

That doesn't explain why this spiked, but I agree that bug 1138967 is the most likely.

I'll probably back out part 3 of that bug soon, given the number of regressions.
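A minimal sketch of the "initialize to null, then check both" idea from this comment. It is not the landed patch: `HResult`, `kOk`, `Failed` and `MappedSubresource` are hypothetical stand-ins for the Windows SDK types so that the example compiles anywhere, and `UpdateConstantBuffer` with its `map` callback is an invented name used only for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical stand-ins for HRESULT, S_OK and FAILED().
using HResult = std::int32_t;
constexpr HResult kOk = 0;
inline bool Failed(HResult hr) { return hr < 0; }

// Hypothetical mirror of the mapped-subresource struct filled in by Map().
struct MappedSubresource {
  void* pData;
  unsigned RowPitch;
  unsigned DepthPitch;
};

// Copies `size` bytes of constants into the mapped buffer, but only if the
// map call reported success AND actually wrote a non-null pointer.
template <typename MapFn>
bool UpdateConstantBuffer(MapFn map, const void* src, std::size_t size) {
  assert(src != nullptr);

  MappedSubresource resource;
  resource.pData = nullptr;  // so "Map() never wrote pData" is detectable
  resource.RowPitch = 0;
  resource.DepthPitch = 0;

  const HResult hr = map(&resource);
  if (Failed(hr) || resource.pData == nullptr) {
    return false;  // caller can log the failure and skip the draw
  }

  std::memcpy(resource.pData, src, size);
  return true;
}
```

As pointed out earlier in this bug, the crash addresses are non-null, so a guard like this only covers the case where the map call leaves pData untouched. It cannot detect a non-null pointer to unwritable memory.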
Flags: needinfo?(matt.woodrow)
Flags: needinfo?(nical.bugzilla)
Parking with matt for now as he's looking into it.
Assignee: nobody → matt.woodrow
Flags: needinfo?(bas)
Blacklisting the main driver here should get this back to the previously low levels, but it shouldn't hurt to avoid crashing anyway.

It looks like the Map call is returning S_OK, but not setting pData.
Attachment #8582836 - Flags: review?(bas)
Comment on attachment 8582836 [details] [diff] [review]
Avoid crashing in UpdateConstantBuffers

Review of attachment 8582836 [details] [diff] [review]:
-----------------------------------------------------------------

This shouldn't make a difference, but you should add a gfxCriticalError for the case where something unexpected happens.
Attachment #8582836 - Flags: review?(bas) → review+
Comment on attachment 8582836 [details] [diff] [review]
Avoid crashing in UpdateConstantBuffers

Review of attachment 8582836 [details] [diff] [review]:
-----------------------------------------------------------------

This is insane. But let's do it.
https://hg.mozilla.org/mozilla-central/rev/05dcd4a98b97
Status: NEW → RESOLVED
Closed: 9 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla39
This is back to larger volume in Dev Edition 39 builds of April 9 - 11. I'm not reopening right now as I don't see reports on the April 12 builds, though.
This no longer looks like a significant enough crash to fix in 37. I have marked 37 as wontfix. 38 is marked as affected. Does this need to be uplifted to Beta?
Flags: needinfo?(matt.woodrow)
(In reply to Lawrence Mandel [:lmandel] (use needinfo) from comment #20)
> This no longer looks like a significant enough crash to fix in 37. I have
> marked 37 as wontfix. 38 is marked as affected. Does this need to be
> uplifted to Beta?

No, it also is of no significant volume in 38. It only is back to larger levels on 39.
This is definitely back on aurora 39, and on nightly 40 as well. It looks like the same symptoms as before: bogus pData pointers -- despite the init and hr check!

It shot up on aurora build 20150409004007. Regression range: https://hg.mozilla.org/releases/mozilla-aurora/pushloghtml?fromchange=85071beda936&tochange=9dd03bf49426

Two-thirds of URLs are on YouTube. All Win7 and Win7SP1. All Intel drivers with version <= 8.15.10.2993, but nearly all of them are <= 8.15.10.2622, which makes me think bug 1151721 may be related.
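A driver-version bound like the one above has to be compared field by field rather than as a plain string. The sketch below shows one way to do that. It is an illustration only: `ParseDriverVersion` and `IsAtOrBelow` are hypothetical names and not the blocklist code in the tree.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>

// Parses "a.b.c.d" into four numbers. Missing trailing fields become 0.
// A non-numeric field makes std::stoul throw std::invalid_argument.
std::array<unsigned long, 4> ParseDriverVersion(const std::string& text) {
  assert(!text.empty() && "expected a dotted version string");

  std::array<unsigned long, 4> parts{};  // value-initialized to zeros
  std::istringstream in(text);
  std::string field;
  for (std::size_t i = 0;
       i < parts.size() && std::getline(in, field, '.'); ++i) {
    parts[i] = std::stoul(field);
  }
  return parts;
}

// True if `version` is lower than or equal to `limit`, comparing the
// four fields numerically from left to right.
bool IsAtOrBelow(const std::string& version, const std::string& limit) {
  return ParseDriverVersion(version) <= ParseDriverVersion(limit);
}
```

The numeric comparison matters for the last field in particular: as strings, a build number such as 999 would sort after 2993.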
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Blocks: MSE
Aha: pData is pointing to nonwritable pages inside igd10umd32.dll.
Bug 1151721 seems believable: disabling hardware decoding and uploading video textures manually will change our driver usage and hit bugs that we weren't hitting before.

Even without hardware decoding blacklisted, we still might not use it in some cases (too many active DXVA decoders for one), so we probably need to handle <= 2993.

Bas, are you ok with us blacklisting d3d11 layers for these intel driver versions?
Flags: needinfo?(matt.woodrow) → needinfo?(bas)
Matt, let's get a patch and we can get it reviewed and landed; this is way too high in the crashers list to not do something about it :)
Flags: needinfo?(matt.woodrow)
[Tracking Requested - why for this release]:
This is now the top crash on 39.0b1 outside of OOM|small.
Flags: needinfo?(milan)
Tracking 39+ because regression, tracking 40+ because it could affect 40; is a top crash.
Matt, are you still looking at this crash? It still sounds like the top crash on 39 beta 3, 5% of overall crash rate for 39.
Yeah, the Intel correlation would be consistent with comment 23.

I see that nearly all the crashes have a gfxCriticalError that says "Failed to map PSConstantBuffer. Result: -2147024882" over and over. In hex that's 0x8007000e which means "Not enough storage is available to complete this operation."

Virtual/physical/pagefile stats generally look fine. Could it be referring to video memory?
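The decimal-to-hex step above can be checked mechanically. The following sketch reinterprets a signed 32-bit result as its raw bits and splits out the documented HRESULT fields. `DecodedHResult`, `DecodeHResult` and `CheckLoggedValue` are hypothetical names. The signed value whose bits are 0x8007000e is -2147024882.

```cpp
#include <cassert>
#include <cstdint>

struct DecodedHResult {
  std::uint32_t raw;   // the same 32 bits, viewed as unsigned
  bool failure;        // bit 31: severity (set means failure)
  unsigned facility;   // bits 16..26: facility code
  unsigned code;       // bits 0..15: error code within the facility
};

DecodedHResult DecodeHResult(std::int32_t value) {
  // Signed-to-unsigned conversion is modular, so the bit pattern is kept.
  const std::uint32_t raw = static_cast<std::uint32_t>(value);
  DecodedHResult out;
  out.raw = raw;
  out.failure = (raw >> 31) != 0;
  out.facility = (raw >> 16) & 0x7FFu;
  out.code = raw & 0xFFFFu;
  return out;
}

// Usage: decode the value corresponding to 0x8007000e.
void CheckLoggedValue() {
  const DecodedHResult d = DecodeHResult(-2147024882);
  assert(d.raw == 0x8007000Eu);
  assert(d.failure);
  assert(d.facility == 7);  // FACILITY_WIN32
  assert(d.code == 14);     // ERROR_OUTOFMEMORY
}
```

Facility 7 with code 14 is the Win32 out-of-memory error wrapped as an HRESULT, which matches the "Not enough storage" text quoted above. The decode alone does not say which kind of memory ran out.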
Are all of these crashes D3D11 + D2D combination (as in, not D3D11 + D2D 1.1 combination), right?
(In reply to Milan Sreckovic [:milan] from comment #32)
> Are all of these crashes D3D11 + D2D combination (as in, not D3D11 + D2D 1.1
> combination), right?

Right.
We tried to reproduce this on a machine in the Toronto office, and got black video and browser hangs but no crash.

It's fixed in nightly though; my best guess is bug 1153123 (though I haven't confirmed it).

I don't see any crash reports for nightly with builds since this landed, might not be enough data. We should see if this drops off in beta 4 when this was uplifted to beta.
Flags: needinfo?(matt.woodrow)
(In reply to Matt Woodrow (:mattwoodrow) from comment #34)
> I don't see any crash reports for nightly with builds since this landed,
> might not be enough data.

Nightly had a consistent trickle of single-digit crashes per day, up until the day that bug 1153123 landed, and zero crashes in the two weeks since. That's pretty good in my book!
While we're waiting for the beta numbers (bug 1153123 got uplifted to beta on Monday), Matt is going to prepare a patch to completely disable client side uploading, and that will be the big hammer ready to be applied to beta in case we don't see this bug go away there.
Flags: needinfo?(milan)
I'm adding this FlushDeletionPool signature because it has the same symptoms: 8.15.10.x drivers; D3D11+D2D; no crashes on nightly after bug 1153123. I thoroughly expect it to disappear in 39b4. In the unlikely event that it doesn't, I'll split off a new bug.
Crash Signature: [@ mozilla::layers::CompositorD3D11::UpdateConstantBuffers()] → [@ mozilla::layers::CompositorD3D11::UpdateConstantBuffers()] [@ NOutermost::CDevice::FlushDeletionPool(bool) ]
[Tracking Requested - why for this release]:

I see 2 crashes for 0b7, but that sounds encouraging!
I'll mark this fixed for 39.
40 betas are affected too.
I see very few reports for 40 going back as far as Nightly. It doesn't look like this qualifies as a topcrash on 40 and, given that the crash rate for 40 is in an acceptable range, this is now wontfix for 40.

Note that given the low rate on 40, this bug is not tracked for 41+.
This still affects some users but it's nowhere near a topcrash anymore.

Firefox 40 has 4 reports.
Firefox 41 has 17 reports.
Firefox 42 has 3 reports.
Firefox 43 has 0 reports.
Firefox 44 has 0 reports.
Keywords: topcrash
not an MSE issue
No longer blocks: MSE
Crash Signature: [@ mozilla::layers::CompositorD3D11::UpdateConstantBuffers()] [@ NOutermost::CDevice::FlushDeletionPool(bool) ] → [@ mozilla::layers::CompositorD3D11::UpdateConstantBuffers()] [@ NOutermost::CDevice::FlushDeletionPool(bool) ] [@ mozilla::layers::CompositorD3D11::UpdateConstantBuffers] [@ NOutermost::CDevice::FlushDeletionPool ]
This still affects users in current branches but at extremely low volume. I think this bug was originally filed because it was a spiking crash so I think we can close this now. I nominate that we close this bug report and file a new one if we want to deal with the outliers.
Let's call it closed.
Status: REOPENED → RESOLVED
Closed: 9 years ago → 9 years ago
Flags: needinfo?(bas)
Resolution: --- → FIXED