crash in igd10umd32.dll@0x18f35 coming from mozilla::layers::DataTextureSourceD3D11::Update

RESOLVED FIXED in Firefox 43

Status


defect
--
critical
RESOLVED FIXED
4 years ago
4 years ago

People

(Reporter: kairo, Assigned: mattwoodrow)

Tracking

({crash})

Trunk
mozilla43
x86
Windows NT
Points:
---

Firefox Tracking Flags

(firefox41 unaffected, firefox42 unaffected, firefox43 fixed)

Details

(crash signature)

Attachments

(1 attachment)

[Tracking Requested - why for this release]:

This bug was filed from the Socorro interface and is report bp-2f88ebca-3c70-42b9-bc25-2d6502150908.
=============================================================

This may be related to bug 1098597, which shares the signature but seems to have somewhat different stacks - but even those go through mozilla::layers::DataTextureSourceD3D11::Update.

Stack Trace:
0 	igd10umd32.dll 	igd10umd32.dll@0x18f35 	
Ø 1 	igd10umd32.dll 	igd10umd32.dll@0x7a07 	
Ø 2 	igd10umd32.dll 	igd10umd32.dll@0x7024 	
Ø 3 	igd10umd32.dll 	igd10umd32.dll@0x335f 	
4 	xul.dll 	mozilla::layers::DataTextureSourceD3D11::Update(mozilla::gfx::DataSourceSurface*, nsIntRegion*, mozilla::gfx::IntPointTyped<mozilla::gfx::UnknownUnits>*) 	gfx/layers/d3d11/TextureD3D11.cpp
5 	xul.dll 	mozilla::layers::BufferTextureHost::Upload(nsIntRegion*) 	gfx/layers/composite/TextureHost.cpp
6 	xul.dll 	mozilla::layers::BufferTextureHost::MaybeUpload(nsIntRegion*) 	gfx/layers/composite/TextureHost.cpp
7 	xul.dll 	mozilla::layers::BufferTextureHost::UpdatedInternal(nsIntRegion const*) 	gfx/layers/composite/TextureHost.cpp
8 	xul.dll 	mozilla::layers::TextureHost::Updated(nsIntRegion const*) 	gfx/layers/composite/TextureHost.cpp
9 	xul.dll 	mozilla::layers::ContentHostSingleBuffered::UpdateThebes(mozilla::layers::ThebesBufferData const&, nsIntRegion const&, nsIntRegion const&, nsIntRegion*) 	gfx/layers/composite/ContentHost.cpp
10 	xul.dll 	mozilla::layers::CompositableParentManager::ReceiveCompositableUpdate(mozilla::layers::CompositableOperation const&, std::vector<mozilla::layers::EditReply, std::allocator<mozilla::layers::EditReply> >&) 	gfx/layers/ipc/CompositableTransactionParent.cpp
11 	xul.dll 	mozilla::layers::LayerTransactionParent::RecvUpdate(nsTArray<mozilla::layers::Edit>&&, unsigned __int64 const&, mozilla::layers::TargetConfig const&, nsTArray<mozilla::layers::PluginWindowData>&&, bool const&, bool const&, unsigned int const&, bool const&, mozilla::TimeStamp const&, nsTArray<mozilla::layers::EditReply>*) 	gfx/layers/ipc/LayerTransactionParent.cpp

These crashes spiked in Firefox 41.0b7 over the last weekend; the signature now accounts for 1.7% of all b7 crashes (rank #5). The signature exists in other versions too, where it mostly looks like bug 1098597. That said, this 40.0.3 crash also looks more like the stack here: bp-3ae588ee-c09d-4669-b5b8-21fcd2150908
On beta 41.0b7 this seems to be confined to adapters from the Intel GMA 4500 series:
Rank 	Adapter device ID 	Count 	Share
1 	0x2a42 	668 	92.39 %
2 	0x2e12 	47 	6.50 %
3 	0x2e22 	8 	1.11 %
Given this signature spans multiple stacks and the bug here is only tracking a spike in 41.0b7 for the specific stack, I'm not sure how to break this down further; at least not without doing a manual report-by-report correlation.

That said, here is the 41.0b7 breakdown for the signature by driver version:
Rank 	Driver version 	Count 	Share
1 	8.15.10.1892 	313 	42.59 %
2 	8.15.10.1883 	181 	24.63 %
3 	8.15.10.1855 	104 	14.15 %
4 	8.15.10.1872 	73 	9.93 %
5 	8.15.10.1994 	62 	8.44 %
6 	8.15.10.1851 	2 	0.27 %

The latest driver is 8.15.10.2869, from November 6, 2013, so maybe we can look at blocklisting as a workaround.
The spike is on Beta 7 only?
(In reply to Milan Sreckovic [:milan] from comment #3)
> The spike is on Beta 7 only?

That's what KaiRo said in the Release Coordination meeting today. I'll let him elaborate.
We blocklist D2D on all driver versions for these cards, and all functionality on driver version 8.15.10.2342.  Perhaps we should blocklist all versions lower than 8.15.10.2342 as well.  It would be good to understand what caused the increase, though I didn't see anything obvious in the beta 6 -> beta 7 patches.

Jeff, what do you think about blocklisting all the drivers below the one above?

Anthony, do we have the telemetry for which other GMAX4500 drivers we're seeing, that are not in the list above?  It'd be good to understand if we'd be blocklisting versions that didn't have problems.
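
The "lower than 8.15.10.2342" check discussed in this comment can be sketched as a plain four-field version comparison. This is only an illustration: `ParseDriverVersion` and `IsDriverOlderThan` are hypothetical names, not the actual blocklist code, and the sketch assumes driver versions always have four numeric, dot-separated fields.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <sstream>
#include <string>

// Hypothetical helper: split "a.b.c.d" into four numeric fields.
std::array<uint32_t, 4> ParseDriverVersion(const std::string& text) {
  std::array<uint32_t, 4> parts{};
  std::istringstream in(text);
  std::string field;
  std::size_t count = 0;
  while (count < parts.size() && std::getline(in, field, '.')) {
    parts[count++] = static_cast<uint32_t>(std::stoul(field));
  }
  // Assumption: every driver version string has exactly four fields.
  assert(count == parts.size() && "expected four dot-separated fields");
  return parts;
}

// True when `version` sorts strictly before `threshold`.
// std::array compares field by field, left to right.
bool IsDriverOlderThan(const std::string& version,
                       const std::string& threshold) {
  return ParseDriverVersion(version) < ParseDriverVersion(threshold);
}
```

With this comparison, every driver version listed in the breakdown (8.15.10.1851 through 8.15.10.1994) falls below 8.15.10.2342, while the latest driver, 8.15.10.2869, does not.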
Flags: needinfo?(jmuizelaar)
Flags: needinfo?(anthony.s.hughes)
(In reply to Milan Sreckovic [:milan] from comment #5)
> Anthony, do we have the telemetry for which other GMAX4500 drivers we're
> seeing, that are not in the list above?  It'd be good to understand if we'd
> be blocklisting versions that didn't have problems.

As far as I know the list of driver versions I provided in comment 2 is a complete list.
Flags: needinfo?(anthony.s.hughes)
(In reply to Anthony Hughes, QA Mentor (:ashughes) from comment #4)
> (In reply to Milan Sreckovic [:milan] from comment #3)
> > The spike is on Beta 7 only?
> 
> That's what KaiRo said in the Release Coordination meeting today. I'll let
> him elaborate.

Yes, this spike is b7 only so far (we don't have usable data for b8 yet as we just shipped that today).
I suspect this may ultimately be an OOM issue. A large number of these reports have dangerously-low virtual and/or physical memory availability.

FWIW three out of nine comments from b7 mention playing video.
Wonder if the patch for bug 1193547 could be involved; it landed between beta 6 and beta 7 from what I can tell.
Flags: needinfo?(matt.woodrow)
I think that is the problem; we should back it out of aurora/beta (for bug 1202296 as well).

I'm pretty sure these devices are ones that fail the DoesD3D11TextureSharingWork() test, which is why we're taking this specific path.

When playing HD video without the patch, the decoder would upload to D3D9 textures internally (D3D9 textures don't seem to ever have issues with sharing, unlike D3D11 ones). With the patch, the decoder outputs system memory; we then see that texture sharing doesn't work, copy the frame into shmem, and do the upload on the compositor.
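
To make the before/after behaviour easier to follow, here is a rough model of the decision described above. All names (`UploadPath`, `VideoConfig`, `ChooseUploadPath`) are hypothetical and do not exist in the tree. The branch for the case where sharing works is an assumption, since the comment only describes the failing case.

```cpp
#include <cassert>

// Hypothetical model of where a decoded video frame gets uploaded.
enum class UploadPath {
  DecoderD3D9Texture,         // without the patch: decoder uploads to D3D9 internally
  ClientD3D11SharedTexture,   // with the patch, sharing works (assumed behaviour)
  ShmemThenCompositorUpload   // with the patch, sharing fails: the stack in this bug
};

struct VideoConfig {
  bool decoderOutputsSystemMemory;  // true once the patch from bug 1193547 is applied
  bool d3d11TextureSharingWorks;    // result of a DoesD3D11TextureSharingWork()-style test
};

UploadPath ChooseUploadPath(const VideoConfig& config) {
  if (!config.decoderOutputsSystemMemory) {
    return UploadPath::DecoderD3D9Texture;
  }
  if (config.d3d11TextureSharingWorks) {
    return UploadPath::ClientD3D11SharedTexture;
  }
  // Frame is copied into shmem and uploaded by the compositor via
  // DataTextureSourceD3D11::Update, which is where the crash shows up.
  return UploadPath::ShmemThenCompositorUpload;
}
```

In this model, the affected devices are the ones for which `d3d11TextureSharingWorks` is false, so only they moved onto the shmem path when the patch landed.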

I guess this could use more memory (since we have the shmem copy, as well as the GPU copy), but it's also possible that this is just a spike in this particular allocation stack rather than a true crash spike. Or a combination of both.
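
As a back-of-the-envelope illustration of the extra memory, the sketch below estimates the size of one frame and doubles it for the shmem copy plus the GPU copy. It assumes 8-bit YCbCr 4:2:0 frames and ignores stride padding and driver overhead; the function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>

// Bytes in one 8-bit YCbCr 4:2:0 frame: a full-resolution Y plane plus
// two chroma planes at half resolution in each dimension.
constexpr std::size_t YCbCr420FrameBytes(std::size_t width, std::size_t height) {
  const std::size_t luma = width * height;
  const std::size_t chroma = ((width + 1) / 2) * ((height + 1) / 2);
  return luma + 2 * chroma;
}

// Hypothetical model of the shmem path: one system-memory (shmem) copy
// plus one GPU copy of the same frame.
constexpr std::size_t ShmemPathBytes(std::size_t width, std::size_t height) {
  return 2 * YCbCr420FrameBytes(width, height);
}
```

For a 1920x1080 frame this gives 3,110,400 bytes per copy, so roughly 6.2 MB per frame in flight on the shmem path under these assumptions.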

Let's back out for now. I think we can upload to D3D9 on the client side to more closely match the previous behaviour.
Flags: needinfo?(matt.woodrow)
Tracked as this crash is in the top 10 for FF41.
FWIW, the spike persists in b8, so that's more confirmation that a change between b6 and b7 triggered the issue of this signature rising significantly (and thanks for digging and finding a possible culprit).
I backed bug 1193547 out of aurora/beta, so this should only affect nightly now.
This more closely matches what the MFTransform would do, and uploads to d3d9 on the client side.

This should stop us needing to keep shmem around and keep memory usage a bit lower.
Assignee: nobody → matt.woodrow
Attachment #8658941 - Flags: review?(bas)
Comment on attachment 8658941
Upload to d3d9 textures

Review of attachment 8658941:
-----------------------------------------------------------------

::: gfx/layers/IMFYCbCrImage.cpp
@@ +231,1 @@
>        return GetD3D9TextureClient(aClient);

Probably worth a comment to note this will return null in case there is no D3D9 device.
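
A minimal sketch of the kind of comment and null handling being asked for. The types and the function `GetD3D9TextureClientSketch` are stand-ins invented for illustration; they are not the code in the patch.

```cpp
#include <cassert>
#include <memory>

// Stand-in types, for illustration only.
struct D3D9Device {};
struct TextureClient {};

// Returns nullptr when no D3D9 device is available, so callers must
// check the result and fall back to another path.
std::unique_ptr<TextureClient> GetD3D9TextureClientSketch(const D3D9Device* device) {
  if (!device) {
    return nullptr;
  }
  return std::make_unique<TextureClient>();
}
```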
Attachment #8658941 - Flags: review?(bas) → review+
https://hg.mozilla.org/mozilla-central/rev/c05f9ff38eaa
Status: NEW → RESOLVED
Closed: 4 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla43
Flags: needinfo?(jmuizelaar)