Closed Bug 1397040 Opened 7 years ago Closed 7 years ago

Increase in Intel graphics driver crashes during 56.0b

Categories

(Core :: Graphics, defect)

56 Branch
All
Windows
defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla57
Tracking Status
firefox-esr52 --- wontfix
firefox55 --- wontfix
firefox56 --- fixed
firefox57 --- fixed

People

(Reporter: philipp, Assigned: bas.schouten)

References

Details

(Keywords: crash, regression)

Crash Data

Attachments

(1 file)

crash stats shows that there are a number of signatures with intel graphics drivers in the browser process that are on the rise in firefox 56.0b again. those same signatures where already spiking back in 55.0b until we implemented a blocklisting mechanism for 6th gen intel gpus from using D3D11 keyed mutex in bug 1359416.

this new spike in 56.0b is affecting the same devices again - has something changed again in this regard since 55?

Adapter device id facet
1 	0x0102 	718 	39.02 %
2 	0x0116 	533 	28.97 %
3 	0x0106 	223 	12.12 %
4 	0x0126 	195 	10.60 %
5 	0x0166 	66 	3.59 %
6 	0x0152 	21 	1.14 %
7 	0x0122 	16 	0.87 %
8 	0x0112 	15 	0.82 %
9 	0x0046 	13 	0.71 %
10 	0x0156 	13 	0.71 %
11 	0x2a42 	10 	0.54 %

Adapter driver version facet
1 	9.17.10.4459 	1026 	54.69 %
2 	8.15.10.2418 	595 	31.72 %
3 	9.17.10.4229 	90 	4.80 %
4 	8.15.10.2353 	29 	1.55 %
5 	9.17.10.2843 	28 	1.49 %
6 	8.15.10.2900 	14 	0.75 %
7 	9.17.10.2770 	14 	0.75 %
8 	9.17.10.2828 	12 	0.64 %

crash stats search link for affected signatures:
http://bit.ly/2f0L7M6

movement of intel signatures between 55.0b99 and 56.0b:
https://mozilla.github.io/stab-crashes/scomp.html?common=product%3DFirefox%26signature%3D^igd%26date%3D%3E2017-06-01&p1=version%3D55.0b99&p2=version%3D56.0b
Flags: needinfo?(milan)
We have the same blocklisting code in 56 as it went out in 55, at least in the source code, so this is probably a different thing.
About a third of the crashes report a device reset, which really means that the stack is not that helpful - the culprit is whatever code caused the device reset, rather than where we eventually crash trying to recover from it.
More than a half seem to come from VideoData::CreateAndCopyData/FFmpegVideoDecoder, and I think there were recent changes, but I'm not sure if they were in 56.  Jean-Yves would know.
Flags: needinfo?(milan) → needinfo?(jyavenard)
Flags: needinfo?(ajones)
This is likely related to the direct upload of the YUV buffer into a D3D11 texture, which was added in bug 1223270.

May have to disable that change :(
Flags: needinfo?(jyavenard)
any ideas? seeing that you fixed bug 1359416
Flags: needinfo?(bas)
I have an idea, the vast majority of these crashes happen when the main thread is waiting for the internal device lock in https://hg.mozilla.org/releases/mozilla-beta/annotate/dcb3e24852c4/gfx/layers/d3d11/TextureD3D11.cpp#l463 and the video thread is inside UpdateSubResource. (There's a few exceptions but not many)

During our video thread uploads we already enter the ID3D10Multithread mutex. I bet we can avoid a lot of the crashes by explicitly locking that mutex while we call CreateTexture2D at the other callsite as well, I will provide a patch, that at the very least will be harmless, but will probably solve most of the issue.
Flags: needinfo?(bas)
Comment on attachment 8905232 [details]
Bug 1397040: During D3D11TextureData::Create lock the device multithread mutex explicitly when accessing the device.

https://reviewboard.mozilla.org/r/177024/#review182026

::: gfx/layers/d3d11/TextureD3D11.cpp:490
(Diff revision 1)
>      uploadDataPtr = &uploadData;
>    }
>  
> +  // See bug 1397040
> +  RefPtr<ID3D10Multithread> mt;
> +  device->QueryInterface((ID3D10Multithread**)getter_AddRefs(mt));

Would be nice if we had an RAII helper for the QI + Enter/Leave!
Attachment #8905232 - Flags: review?(matt.woodrow) → review+
Comment on attachment 8905232 [details]
Bug 1397040: During D3D11TextureData::Create lock the device multithread mutex explicitly when accessing the device.

https://reviewboard.mozilla.org/r/177024/#review182068

::: gfx/layers/d3d11/TextureD3D11.cpp:490
(Diff revision 1)
>      uploadDataPtr = &uploadData;
>    }
>  
> +  // See bug 1397040
> +  RefPtr<ID3D10Multithread> mt;
> +  device->QueryInterface((ID3D10Multithread**)getter_AddRefs(mt));

we do have D3D11MTAutoEnter.
Comment on attachment 8905232 [details]
Bug 1397040: During D3D11TextureData::Create lock the device multithread mutex explicitly when accessing the device.

https://reviewboard.mozilla.org/r/177024/#review182070

::: gfx/layers/d3d11/TextureD3D11.cpp:502
(Diff revision 1)
>    HRESULT hr = device->CreateTexture2D(&newDesc, uploadDataPtr, getter_AddRefs(texture11));
>  
>    if (FAILED(hr) || !texture11) {
>      gfxCriticalNote << "[D3D11] 2 CreateTexture2D failure Size: " << aSize
>        << "texture11: " << texture11 << " Code: " << gfx::hexa(hr);
>      return nullptr;

no mt->Leave() here?

::: gfx/layers/d3d11/TextureD3D11.cpp:526
(Diff revision 1)
>    // on it to be synchronized using it. If we did an initial upload using aSurface
>    // then bizarely this isn't covered, so we insert a manual lock/unlock pair
>    // to force this.
>    if (aSurface && newDesc.MiscFlags == D3D11_RESOURCE_MISC_SHARED_KEYEDMUTEX) {
>      if (!LockD3DTexture(texture11.get())) {
>        return nullptr;

and there too?
Pushed by bschouten@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/52433c05f306
During D3D11TextureData::Create lock the device multithread mutex explicitly when accessing the device. r=mattwoodrow
Assignee: nobody → bas
Status: NEW → ASSIGNED
https://hg.mozilla.org/mozilla-central/rev/52433c05f306
Status: ASSIGNED → RESOLVED
Closed: 7 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla57
Please request Beta approval on this when you get a chance.
Flags: needinfo?(ajones) → needinfo?(bas)
(In reply to Ryan VanderMeulen [:RyanVM] from comment #11)
> Please request Beta approval on this when you get a chance.

I feel this patch is probably too risky to move to beta if we don't have beta builds left to test this with :-(
Flags: needinfo?(bas)
Comment on attachment 8905232 [details]
Bug 1397040: During D3D11TextureData::Create lock the device multithread mutex explicitly when accessing the device.

Approval Request Comment
[Feature/Bug causing the regression]: Client side video upload
[User impact if declined]: Potential crashes with video
[Is this code covered by automated tests?]: Yes
[Has the fix been verified in Nightly?]: Yes
[Needs manual test from QE? If yes, steps to reproduce]: No 
[List of other uplifts needed for the feature/fix]: None
[Is the change risky?]: Slightly, if there's driver bugs in existance
[Why is the change risky/not risky?]: Relies on driver internals
[String changes made/needed]: None
Attachment #8905232 - Flags: approval-mozilla-beta?
Comment on attachment 8905232 [details]
Bug 1397040: During D3D11TextureData::Create lock the device multithread mutex explicitly when accessing the device.

A bit risky for late beta, but we still have beta 11, 12, and the RC build next week to notice and fix any fallout, and I think it is worth the risk to try and get rid of this new batch of driver crashes.
Attachment #8905232 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
You need to log in before you can comment on or make changes to this bug.