Closed
Bug 1403353
Opened 7 years ago
Closed 7 years ago
Increase in crashes with Intel's igdusc64.dll module during Firefox 57
Categories
(Core :: Graphics, defect, P1)
RESOLVED
FIXED
mozilla58
People
(Reporter: philipp, Assigned: dvander)
Details
(Keywords: crash, regression, Whiteboard: [gfx-noted])
Crash Data
Attachments
(1 file, 1 obsolete file)
patch (3.50 KB) | jrmuizel: review+ | ritu: approval-mozilla-beta+ | lizzard: approval-mozilla-release+
This bug was filed from the Socorro interface and is
report bp-af5e7d3d-35b2-4905-99be-920d60170926.
=============================================================
[Tracking Requested - why for this release]:
there is an increase in crashes with signatures containing intel's igdusc64.dll module during the 57.a1 cycle and continuing into 57 beta. so far those reports account for a bit over 3% of browser crashes in early data from 57.0b.
https://crash-stats.mozilla.com/search/?signature=~igdusc64.dll&product=Firefox&version=57.0b&process_type=browser&date=%3E%3D2017-06-01T&date=%3C2017-09-26T21%3A29%3A11.000Z&_sort=-date&_facets=signature&_facets=version&_facets=user_comments&_facets=adapter_vendor_id&_facets=build_id&_facets=install_time&_facets=platform_pretty_version&_facets=useragent_locale&_facets=process_type&_facets=adapter_device_id&_facets=adapter_driver_version&_columns=date&_columns=signature&_columns=product&_columns=version&_columns=build_id&_columns=platform#crash-reports
Adapter device id facet
Rank  Device ID  Count  Percent
1     0x1616     89     61.38 %
2     0x0a16     27     18.62 %
3     0x041e     23     15.86 %
4     0x0416     3      2.07 %
Flags: needinfo?(milan)
Reporter
Updated•7 years ago
Summary: Crash in igdusc64.dll | HeapFree | igdusc64.dll | HeapFree | igdusc64.dll | igd10iumd64.dll | stdext::_Hash<T>::insert → Increase in crashes with Intel's igdusc64.dll module during Firefox 57
David, maybe some devices aren't ready for AL?
Flags: needinfo?(milan) → needinfo?(dvander)
Priority: -- → P1
Whiteboard: [gfx-noted]
These are 64-bit, right? Is the increase matching us moving more people to 64-bit automatically?
Flags: needinfo?(madperson)
Reporter
Comment 3•7 years ago
the module is 64-bit only and i can't spot a 32-bit counterpart of the dll in crash stats. however, i don't think that the spike in 57 is directly tied to the win64 migration: in 56.0b12 we had ~60 of these crashes, while on 57.0b3 there are already more than 1000.
i also have to correct the percentage from my comment #0 - by now the issue is responsible for around 12% of browser crashes in 57.0b3.
Flags: needinfo?(madperson)
Assignee
Comment 4•7 years ago
I'm worried that we're only seeing Windows 7 crashes because it doesn't support the GPU process, and we don't get GPU process crash reports on beta/release. Is there any way to see whether these crashes are also occurring in Telemetry, for Windows 10?
Flags: needinfo?(madperson)
Reporter
Comment 5•7 years ago
hi marco, do you know if telemetry can offer an insight here? (i don't have access to telemetry data...)
Flags: needinfo?(madperson) → needinfo?(mcastelluccio)
Assignee
Comment 6•7 years ago
(In reply to Milan Sreckovic [:milan] from comment #1)
> David, maybe some devices aren't ready for AL?
Good call. The Intel driver is crashing trying to compile the "TexturedVertex" shader in Advanced Layers, which is not very common and therefore initialized lazily. (It gets used with masks and 3d transforms.) I don't know whether that's just the first bad shader to get initialized or what, but the adapters in comment #0 are ~14% of our population and kicking them back to the D3D11 compositor would be a real pain given that we want to delete it.
Milan, looking at the inventory page [1], there might be a machine or two in Toronto with 0x0416 or 0x0412 adapters. Do you know if those exist and if so, what OS they run?
[1] https://wiki.mozilla.org/QA/Platform/Graphics/Inventory
Assignee: nobody → dvander
Status: NEW → ASSIGNED
Flags: needinfo?(dvander) → needinfo?(milan)
Comment 7•7 years ago
I have a desktop machine with 0x0412 but it doesn't seem to have Windows on it. I haven't been able to find other machines that have Haswell GPUs yet.
Comment 8•7 years ago
I found a hard drive in that machine that has win10 on it.
Comment 9•7 years ago
(In reply to David Anderson [:dvander] from comment #4)
> I'm worried that we're only seeing Windows 7 crashes because it doesn't
> support the GPU process, and we don't get GPU process crash reports on
> beta/release. Is there any way to see whether these crashes are also
> occurring in Telemetry, for Windows 10?
Why don't we get GPU process crash reports on beta/release?
We can't look for the specific signatures, but we can see whether there's an increase in crashes with Intel graphics cards, if that's interesting.
Flags: needinfo?(mcastelluccio)
Assignee
Comment 10•7 years ago
(In reply to Marco Castelluccio [:marco] from comment #9)
> (In reply to David Anderson [:dvander] from comment #4)
> > I'm worried that we're only seeing Windows 7 crashes because it doesn't
> > support the GPU process, and we don't get GPU process crash reports on
> > beta/release. Is there any way to see whether these crashes are also
> > occurring in Telemetry, for Windows 10?
>
> Why don't we get GPU process crash reports on beta/release?
>
> We can't look for the specific signatures, but we can see whether there's a
> increase of crashes with Intel graphic cards, if that's interesting.
Because there is no way to ask the user to submit reports; they have to happen to visit about:crashes. There's no UI for when the GPU process crashes (by design), and privacy limitations prevent us from submitting anything automatically.
Assignee
Comment 11•7 years ago
Working theory. If you facet these crashes on "cpu info" we see:
Rank  CPU info                          Count  Percent
1     family 6 model 61 stepping 4 | 4  88     61.54 %
2     family 6 model 60 stepping 3 | 4  26     18.18 %
3     family 6 model 69 stepping 1 | 4  26     18.18 %
4     family 6 model 55 stepping 8 | 2  1      0.70 %
5     family 6 model 60 stepping 3 | 8  1      0.70 %
6     family 6 model 61 stepping 4 | 3  1      0.70 %
Which is a pretty narrow population of CPUs. Furthermore, the crashing instruction is always:
000007FEF6E0FED5 vpunpckldq xmm1,xmm1,xmm0
That's an AVX instruction. The CPUs above support AVX, but according to the Intel docs, the operating system is responsible for enabling AVX by setting bits 2:1 in the XCR0 control register. Windows only does this on Windows 7 SP1 and higher. If the bits are not set, executing an AVX instruction raises a #UD exception, which is an illegal instruction exception.
If we facet on platform version:
Rank  Platform version         Count  Percent
1     6.1.7600                 138    96.50 %
2     6.1.7601 Service Pack 1  4      2.80 %
3     10.0.15063               1      0.70 %
The 5 crash reports associated with Windows 7 SP1 and Windows 10 look totally unrelated.
So, my conclusion is that this is an Intel driver bug specific to x64 on Windows 7 without SP1. The driver does not correctly check for AVX support, probably assuming that the CPUID check is enough.
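For illustration, here is a minimal sketch of the difference between a CPUID-only check and one that also consults the OS-enabled state. The bit positions are the documented ones (CPUID leaf 1, ECX bit 27 for OSXSAVE and bit 28 for AVX; XCR0 bits 1 and 2 for SSE and AVX state). The function names and the idea of passing raw register values in as integers are assumptions made for this example; this is not the driver's or Firefox's actual code.

```python
# Bit positions from the Intel documentation (CPUID leaf 1 ECX, and XCR0).
CPUID_ECX_OSXSAVE = 1 << 27  # OS has set CR4.OSXSAVE, so XCR0 can be read
CPUID_ECX_AVX = 1 << 28      # CPU implements the AVX instructions
XCR0_SSE_STATE = 1 << 1      # OS saves/restores XMM register state
XCR0_AVX_STATE = 1 << 2      # OS saves/restores the upper YMM state


def cpu_has_avx(cpuid_ecx: int) -> bool:
    """CPUID-only check: true if the hardware implements AVX."""
    return bool(cpuid_ecx & CPUID_ECX_AVX)


def os_enabled_avx(cpuid_ecx: int, xcr0: int) -> bool:
    """True only if the OS exposes XCR0 and has set bits 2:1 in it."""
    if not cpuid_ecx & CPUID_ECX_OSXSAVE:
        return False  # XCR0 is not available to user code at all
    needed = XCR0_SSE_STATE | XCR0_AVX_STATE
    return (xcr0 & needed) == needed


def avx_is_safe_to_use(cpuid_ecx: int, xcr0: int) -> bool:
    """Full check: hardware support AND operating system support."""
    return cpu_has_avx(cpuid_ecx) and os_enabled_avx(cpuid_ecx, xcr0)
```

On an affected configuration, `cpu_has_avx` would return True while `avx_is_safe_to_use` returns False, which is the case where executing `vpunpckldq` faults.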
Flags: needinfo?(milan)
Assignee
Comment 12•7 years ago
The next question is why this spiked in 57. I don't really have any theories. It's possible Advanced Layers exacerbated it, but I also suspect this problem has existed forever and we just didn't have as many x64 users.
Crashes on the old D3D11 compositor that are clearly this bug:
https://crash-stats.mozilla.com/report/index/21b60865-347b-4b0a-9bc2-320ec0170518
https://crash-stats.mozilla.com/report/index/c65e23a1-c78a-4006-96a5-851d50170518
https://crash-stats.mozilla.com/report/index/e56b5bce-c650-40e4-b0e6-6f9950170613
etc... some of these go back to Firefox 42. Crash reports on Windows 7 SP1 look very different.
Without having STR and without knowing what causes the driver to go down this path... I'm proposing a blanket D3D11 compositor ban if the following things are all true:
1. The architecture is AMD64.
2. CPUID reports AVX support.
3. XCR0 and CR4 report that the kernel does not support AVX.
4. The adapter is an Intel device (any model/driver).
We could potentially narrow it down further, but this is a good place to start. I'll do a quick Telemetry analysis to see how many users that would affect.
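The four conditions above could be combined into a single predicate along these lines. This is only a sketch: `SystemInfo` and `should_block_d3d11` are hypothetical names invented for the example, and the real blocklist code may be structured quite differently. The only outside fact used is that 0x8086 is Intel's PCI vendor ID.

```python
from dataclasses import dataclass

INTEL_VENDOR_ID = 0x8086  # PCI vendor ID assigned to Intel


@dataclass
class SystemInfo:
    """Hypothetical container for the facts the rule needs."""
    arch: str                # e.g. "amd64" or "x86"
    cpu_has_avx: bool        # CPUID reports AVX support
    os_enabled_avx: bool     # XCR0/CR4 show the kernel enabled AVX
    adapter_vendor_id: int   # PCI vendor ID of the graphics adapter


def should_block_d3d11(info: SystemInfo) -> bool:
    """Block only when all four proposed conditions hold."""
    return (
        info.arch == "amd64"
        and info.cpu_has_avx
        and not info.os_enabled_avx
        and info.adapter_vendor_id == INTEL_VENDOR_ID
    )
```

Requiring all four conditions keeps the ban narrow: a 64-bit Intel system whose kernel has enabled AVX is left alone.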
Comment 13•7 years ago
That seems reasonable. We should also reach out to Intel and ask them what's going on.
Assignee
Comment 14•7 years ago
This would affect, roughly guessing, an upper bound of 0.7% of Windows users:
6% of Windows users are on Windows 7 pre-SP1.
70% of those users have an Intel driver.
53% of those users are currently getting a D3D11 compositor.
44% of those users have 64-bit Windows.
74% of those users have an AVX-capable processor but no AVX OS support.
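Multiplying these fractions together reproduces the estimate. A quick check, assuming each percentage is a fraction of the group on the line before it (which is how the list reads):

```python
import math

# Each entry is the share of the previous group, taken from the list above.
fractions = {
    "Windows 7 pre-SP1": 0.06,
    "Intel driver": 0.70,
    "D3D11 compositor": 0.53,
    "64-bit Windows": 0.44,
    "AVX CPU without OS support": 0.74,
}

upper_bound = math.prod(fractions.values())
print(f"{upper_bound:.2%}")  # prints 0.72%
```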
Assignee
Comment 15•7 years ago
Attachment #8913028 - Flags: review?(jmuizelaar)
Assignee
Comment 16•7 years ago
Bas mentioned he would try to get in touch with Intel, setting ni? just as a reminder.
Flags: needinfo?(bas)
Updated•7 years ago
Attachment #8913028 - Flags: review?(jmuizelaar) → review+
Comment hidden (mozreview-request)
Comment 19•7 years ago
This reminds me of bug 1225094, where the bug was due to the Microsoft boot loader in dual boot scenarios (with different versions of Windows).
See Also: → 1225094
Comment 20•7 years ago
(In reply to Marco Castelluccio [:marco] from comment #19)
> This reminds me of bug 1225094, where the bug was due to the Microsoft boot
> loader in dual boot scenarios (with different versions of Windows).
Note if this is the same situation, the solution from comment 12 will not fix the problem.
I guess we will know when we land this.
Comment 21•7 years ago
Comment on attachment 8913213 [details]
Bug 1403353: Enable OMTP by default on windows only.
I put this on the wrong bug.
Attachment #8913213 - Attachment is obsolete: true
Attachment #8913213 - Flags: review?(dvander)
Comment 22•7 years ago
Pushed by danderson@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/79580c3ab338
Block D3D11 when using Intel drivers on Windows 7 systems with partial AVX support. (bug 1403353, r=jrmuizel)
Updated•7 years ago
Blocks: win64-migration
status-firefox55: --- → affected
status-firefox56: --- → affected
status-firefox-esr52: --- → affected
OS: Windows → Windows 7
Comment 23•7 years ago
bugherder
Status: ASSIGNED → RESOLVED
Closed: 7 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla58
Comment 24•7 years ago
We plan to automatically migrate eligible Windows users running 32-bit Firefox to 64-bit Firefox 56. To avoid causing Haswell users to crash by migrating them to 64-bit, we would like to uplift this fix to a 56.0.1 dot release after the fix has baked on Nightly for a few days.
If we can't uplift this fix to 56.0.1, then we can exclude Windows 7 users without SP1 from 64-bit migration to avoid this crash. We can migrate those users later when this fix rides to the Release channel (in Firefox 57 or 58).
Comment 25•7 years ago
[Tracking Requested - why for this release]: See comment 24.
tracking-firefox56: --- → ?
Reporter
Comment 26•7 years ago
could you please request uplift for beta if you deem fit to do so? the last recorded crash on nightly was on build 20170927100120...
Flags: needinfo?(dvander)
Assignee
Comment 27•7 years ago
Comment on attachment 8913028 [details] [diff] [review]
patch
Approval Request Comment
[Feature/Bug causing the regression]: Multiple factors; D3D11 changes, x64 migration
[User impact if declined]: Sporadic crashes on older versions of Windows 7
[Is this code covered by automated tests?]: N/A
[Has the fix been verified in Nightly?]: Yes
[Needs manual test from QE? If yes, steps to reproduce]: No
[List of other uplifts needed for the feature/fix]: N/A
[Is the change risky?]: No
[Why is the change risky/not risky?]: It's just a blocklist entry for broken drivers, and we've made it surgical to affect as few users as possible.
[String changes made/needed]:
Flags: needinfo?(dvander)
Attachment #8913028 - Flags: approval-mozilla-beta?
Updated•7 years ago
Comment on attachment 8913028 [details] [diff] [review]
patch
Approval Request Comment
We're currently holding back 64-bit auto-upgrades away from Win 7 Pre-SP1 users because of this problem. We got a request to make this available for 56.0.1 so that we can upgrade everybody.
The patch has stopped the crashes on Nightly since it landed and looks stable.
Attachment #8913028 - Flags: approval-mozilla-release?
Comment on attachment 8913028 [details] [diff] [review]
patch
promising data from nightly that this fix works as expected, beta57+
Attachment #8913028 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
Comment 30•7 years ago
bugherder uplift
Comment 31•7 years ago
So far there is only 1 crash in the last week for 56 across all these signatures. That seems unusual. Are you sure we need this on 56 release?
Flags: needinfo?(dvander)
Comment 32•7 years ago
Searching again more widely. I'm seeing 0 crashes on 56 release so far, https://crash-stats.mozilla.com/search/?signature=~igdusc64.dll&product=Firefox&version=56.0&process_type=browser&date=%3E%3D2017-06-01T00%3A00%3A00.000Z&date=%3C2017-09-26T21%3A29%3A11.000Z&page=1&_sort=version&_sort=-date&_facets=signature&_facets=version&_facets=user_comments&_facets=adapter_vendor_id&_facets=build_id&_facets=install_time&_facets=platform_pretty_version&_facets=useragent_locale&_facets=process_type&_facets=adapter_device_id&_facets=adapter_driver_version&_columns=date&_columns=signature&_columns=product&_columns=version&_columns=build_id&_columns=platform#crash-reports
Assignee
Comment 33•7 years ago
That is unusual. There are crashes for, say, 56 beta but not release. Anyway, I'm not sure we need it there. If it's not a problem, then there's no need to uplift.
Flags: needinfo?(dvander)
Comment 34•7 years ago
Chris pointed out I was missing the gpu and content crashes, so, there are 14 on release 56:
https://crash-stats.mozilla.com/search/?signature=~igdusc64.dll&product=Firefox&version=56.0&platform=Windows&date=%3E%3D2017-04-02T20%3A54%3A22.000Z&date=%3C2017-10-02T20%3A54%3A22.000Z&_sort=-date&_facets=signature&_facets=cpu_arch&_facets=release_channel&_columns=date&_columns=signature&_columns=product&_columns=version&_columns=build_id&_columns=platform#crash-reports
Comment 35•7 years ago
To get a sense of the relative scale of how many Release channel users might be affected by this Haswell crash if we migrate 100% of eligible users, I searched for the number of crash reports (any crash signature from the last six months) from eligible Beta and Release users, i.e. currently running 32-bit Firefox on Windows 7 pre-SP1 with more than 2GB RAM.
Over the last six months, there were 369,180 Beta and 388,450 Release crash reports from these particular users. That suggests that we have roughly the same number of Beta and Release users currently running 32-bit Firefox on Windows 7 pre-SP1 with more than 2GB RAM. So we should expect roughly the same number of Haswell crashes from migrated Release users as we saw from migrated Beta users. We saw 7,168 crashes from Beta 56 and 57 over the last seven days, so we will probably see just 72 crashes in the first week after migrating 1% of eligible Release users (if we don't uplift this fix or exclude Win7 pre-SP1 from migration).
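The estimate above can be reproduced with a few lines of arithmetic. This is a rough sketch using only the numbers quoted in this comment; the variable names are invented for the example, and the scaled figure simply applies the Release/Beta report ratio instead of treating the two populations as equal.

```python
# Numbers quoted in the comment above.
beta_reports = 369_180           # six months of Beta crash reports
release_reports = 388_450        # six months of Release crash reports
beta_crashes_per_week = 7_168    # crashes from Beta 56 and 57, last 7 days
migrated_fraction = 0.01         # 1% of eligible Release users

# Proxy for relative population size (roughly 1, as the comment notes).
population_ratio = release_reports / beta_reports

# Estimate assuming equal populations, as the comment does.
same_size_estimate = beta_crashes_per_week * migrated_fraction

# Same estimate, scaled by the slightly larger Release population.
scaled_estimate = same_size_estimate * population_ratio

print(round(same_size_estimate))
print(round(population_ratio, 2), round(scaled_estimate))
```

The first printed value matches the 72 crashes quoted above; scaling by the report ratio changes it only slightly.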
Comment 36•7 years ago
(In reply to Chris Peterson [:cpeterson] from comment #35)
> searched for the number of crash reports (any crash signature from the last
> six months) from eligible Beta and Release users, i.e. currently running
> 32-bit Firefox on Windows 7 pre-SP1 with more than 2GB RAM.
Here is my Win7 pre-SP1 crash query:
https://crash-stats.mozilla.com/search/?platform_version=%3D6.1.7600&cpu_arch=%21amd64&release_channel=beta&total_physical_memory=%3E2147483648&product=Firefox&platform=Windows&date=%3E%3D2017-04-02T21%3A08%3A03.000Z&date=%3C2017-10-02T21%3A08%3A03.000Z&_sort=-date&_facets=signature&_facets=cpu_arch&_facets=platform_version&_columns=date&_columns=signature&_columns=product&_columns=version&_columns=build_id&_columns=platform#facet-cpu_arch
Comment 37•7 years ago
OK. So it's not a ton of users, but on the other hand, this won't affect future dot releases (it would be incorporated into them) and it also won't affect our watershed. 56.0 will remain the watershed for the migration. So there isn't any big reason *not* to uplift. If we decide to increase the % of users who migrate during the 56 cycle, then it will be totally worth it.
Comment 38•7 years ago
Comment on attachment 8913028 [details] [diff] [review]
patch
Taking this for the planned dot release for 64-bit migration rollout.
Attachment #8913028 - Flags: approval-mozilla-release? → approval-mozilla-release+
Comment 39•7 years ago
bugherder uplift