Closed
Bug 1225094
Opened 9 years ago
Closed 8 years ago
Crash in [@ intel_aes_gcmINIT ]
Categories
(NSS :: Libraries, defect)
Tracking
(firefox45 wontfix, firefox47 wontfix, firefox48 wontfix, firefox49 wontfix, firefox-esr45 wontfix, firefox50 wontfix, firefox52 wontfix)
RESOLVED
WONTFIX
3.25
People
(Reporter: btot, Assigned: n.nethercote)
References
Details
(Keywords: crash)
Crash Data
Attachments
(2 files)
2.07 KB, patch | mt: review+, mt: checked-in+
785 bytes, patch
User Agent: Mozilla/5.0 (Windows NT 5.1; rv:45.0) Gecko/20100101 Firefox/45.0
Build ID: 20151115030440
Graphics: NVIDIA GeForce 210
Steps to reproduce:
Open a new tab via Ctrl+T or the "new tab" button on the toolbar.
Here are the crash reports collected over about 30 minutes:
bp-7ddb5190-462f-4952-bfc6-22ea32151116
dd597c47-9a52-4c8d-bada-8c3953cbe831
e5388451-0c38-4497-bad5-0d3d232389aa
bp-440e611a-9b84-44f4-af8c-be8f22151116
Actual results:
The browser crashes intermittently.
Expected results:
Browser should not crash.
Comment 1•9 years ago
Took a look at this and, at mconley's suggestion, I'm adding ttaubert to the bug, who is potentially doubly familiar with this :-) With newtab and with NSS!
This looks like an NSS-related issue that happens fairly infrequently.
However, it might still be worth a look.
Possibilities:
On newtab pageload, we either make a request over HTTPS to:
* the tiles servers
* a page url to capture a thumbnail
Comment 2•9 years ago
Hmm, the crash is in the ASM code for intel_aes_gcmINIT:
http://hg.mozilla.org/mozilla-central/annotate/51fa3e0d4f7b/security/nss/lib/freebl/intel-gcm-x86-masm.asm#l132
Is there any good way to reproduce it? You'd surely need an Intel CPU with AES instructions but other than that? Looks like it's Windows-only as well:
https://crash-stats.mozilla.com/report/list?product=Firefox&signature=intel_aes_gcmINIT
Comment 4•9 years ago
The crash reason is EXCEPTION_ILLEGAL_INSTRUCTION, so is it possible that the CPU doesn't support |vzeroupper|? Looks like the CPU needs Advanced Vector Extensions [1].
"AuthenticAMD family 21 model 2 stepping 0" (as taken from the first crash report) is codename "Zambezi", and from the AMD Bulldozer Family. According to [2] the Bulldozer CPUs should support those extensions.
Maybe I'm on the wrong path here, maybe not. I'm not sure who the best person to investigate this is?
[1] https://en.wikipedia.org/wiki/Advanced_Vector_Extensions
[2] https://en.wikipedia.org/wiki/Bulldozer_%28microarchitecture%29
Updated•9 years ago
Assignee: nobody → nobody
Component: New Tab Page → Libraries
Product: Firefox → NSS
Summary: Firefox crashes when opening a new tab → Firefox 45.0a1 Crash [@ intel_aes_gcmINIT ]
Version: 45 Branch → 3.21
Comment 5•9 years ago
Tim: I think you are on the right track. We need to review the
code that sets has_intel_avx to 1:
http://mxr.mozilla.org/nss/ident?i=has_intel_avx
But according to the Wikipedia articles you cited, the AMD Bulldozer
CPU should have AVX. So I don't know why we get the illegal instruction
exception there.
Comment 6•9 years ago
Brindusa, can you tell us what machine you run Firefox on? The manufacturer and maybe the exact model?
Flags: needinfo?(brindusa.tot)
Reporter
Comment 7•9 years ago
Sorry for the late response.
I am using an AMD FX-8320 Eight-Core processor, 8 GB RAM and Graphics: NVIDIA GeForce 210.
Flags: needinfo?(brindusa.tot)
Comment 8•9 years ago
If you are still here, can you post the output of: https://technet.microsoft.com/en-ca/sysinternals/cc835722.aspx
Assignee
Comment 9•9 years ago
I filed bug 1263495 for this same issue. The reason I didn't see this bug was that the crash signature field was filled out incorrectly, which meant this bug didn't show up on crash-stats.mozilla.org. The field should read "[@ intel_aes_gcmINIT ]". We should dup one of these bugs to the other one.
Crash Signature: [intel_aes_gcmINIT] → [@ intel_aes_gcmINIT ]
Assignee
Comment 11•9 years ago
I haven't tested this code, because I'm not sure how to test NSS changes across
multiple platforms. I also don't know if modifying freebl is any different to
modifying other parts of NSS.
mt, are you able to help test this?
Attachment #8753148 -
Flags: review?(martin.thomson)
Assignee
Updated•9 years ago
Assignee: nobody → n.nethercote
Status: NEW → ASSIGNED
Comment 12•9 years ago
While at it, I would suggest in freebl_cpuid to get rid of the pushad/popad and change wcpuid to cpuid (the instruction was added to MSVC long long ago I think).
Comment 13•9 years ago
BTW, newer versions of MSVC also automatically zero ecx in the __cpuid intrinsic.
Assignee
Comment 14•9 years ago
I think I will stick with the smallest possible changes to implement this fix. This code is way outside of my zone of expertise :)
Comment 15•9 years ago
Comment on attachment 8753148 [details] [diff] [review]
Zero %ecx before calling CPUID
Review of attachment 8753148 [details] [diff] [review]:
-----------------------------------------------------------------
I see no reason not to land this. Did you check that we don't need the code in lib/freebl/mpi/mpcpucache_x86.s? xorl %ecx, %ecx seems to be necessary there as well, but it looks like unused code.
https://hg.mozilla.org/projects/nss/rev/a9af87b6c9a7
Attachment #8753148 -
Flags: review?(martin.thomson)
Attachment #8753148 -
Flags: review+
Attachment #8753148 -
Flags: checked-in+
Comment 16•9 years ago
As for testing, I don't have the resources :) I can only verify that it works on our build cluster, for which I have to check it in. I'll keep an eye on it.
Assignee
Comment 17•9 years ago
Might as well do this one too, just in case.
Attachment #8753604 -
Flags: review?(martin.thomson)
Assignee
Comment 18•9 years ago
mt, is there precedent for cherry-picking NSS changes into Firefox? It'd be nice to see if the patches in bug 1225094 fix crashes without having to wait for an NSS uplift. And if they do fix the crashes, backporting them to Aurora/Beta would be worthwhile.
Comment 19•9 years ago
It's not recommended. The preferred way is to just pick up the latest version of NSS in those branches, as cherry picking puts downstream users who package NSS separately in a bind.
It can happen if the fix is important enough and the changes to NSS branch are risky.
Comment 20•9 years ago
Comment on attachment 8753604 [details] [diff] [review]
Zero %ecx before calling CPUID in mpcpucache_x86.s
Review of attachment 8753604 [details] [diff] [review]:
-----------------------------------------------------------------
I don't know this stuff, but should this be xorl rather than xor?
Assignee
Comment 21•9 years ago
> I don't know this stuff, but should this be xorl rather than xor?
Yes, it should be. Are you able to make the change before landing? Thank you.
Comment 22•9 years ago
Status: ASSIGNED → RESOLVED
Closed: 9 years ago
Resolution: --- → FIXED
Target Milestone: --- → 3.25
Comment 23•9 years ago
I am not sure this will actually fix the problem.
Comment 24•9 years ago
Nick, this will be in NSS 3.25, I can backport these patches to NSS 3.24 but we would have to discuss whether we wanted a point release. And then we have to convince the release managers that they want to take the release. I will leave that part to you.
Which release would you like this to be in? (I too would like to see if this makes a difference.)
Flags: needinfo?(n.nethercote)
Updated•9 years ago
Attachment #8753604 -
Flags: review?(martin.thomson)
Comment 25•9 years ago
Comment on attachment 8753148 [details] [diff] [review]
Zero %ecx before calling CPUID
Review of attachment 8753148 [details] [diff] [review]:
-----------------------------------------------------------------
::: security/nss/lib/freebl/mpi/mpcpucache.c
@@ +73,5 @@
> void freebl_cpuid(unsigned long op, unsigned long *eax,
> unsigned long *ebx, unsigned long *ecx,
> unsigned long *edx)
> {
> +/* Some older processors don't fill the ecx register with cpuid, so clobber it
I'm very curious about this bug. How did you know about
this problem? I can't find a webpage that describes this
problem by web searches.
I guess we don't need to zero edx because the feature
flags returned in edx are older features?
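To make the hazard described in the patch comment concrete, here is a toy model in C. It is only a sketch under stated assumptions, not the NSS code: `cpu_regs`, `quirky_cpuid` and `avx_bit_seen` are hypothetical names, and the register values written by `quirky_cpuid` are made up. The model assumes a CPU whose CPUID leaves ecx untouched, so whatever was in ecx beforehand is misread as feature flags unless the caller clears it first.

```c
#include <stdint.h>

/* Hypothetical register file used only for this illustration. */
typedef struct {
    uint32_t eax, ebx, ecx, edx;
} cpu_regs;

/* CPUID leaf 1 reports AVX support in ecx bit 28. */
#define ECX_AVX_BIT (1u << 28)

/* Toy model of the assumed quirk: the CPU fills eax, ebx and edx
 * but never writes ecx. The values are invented for the example. */
static void quirky_cpuid(cpu_regs *r)
{
    r->eax = 0x00000520u; /* made-up signature */
    r->ebx = 0u;
    r->edx = 0x00000001u; /* made-up feature word */
    /* r->ecx is deliberately left as the caller set it */
}

/* Returns 1 if the caller would conclude that AVX is present.
 * zero_ecx_first models the "xorl %ecx, %ecx" added by the patch. */
int avx_bit_seen(uint32_t stale_ecx, int zero_ecx_first)
{
    cpu_regs r = { 1u, 0u, stale_ecx, 0u };
    if (zero_ecx_first) {
        r.ecx = 0u;
    }
    quirky_cpuid(&r);
    return (r.ecx & ECX_AVX_BIT) != 0;
}
```

With stale garbage in ecx, the unpatched path can report AVX on a CPU that has none; the patched path cannot. Whether any CPU that hits this crash actually behaves this way is exactly what the thread is still trying to establish.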
Assignee
Comment 26•9 years ago
> I'm very curious about this bug. How did you know about
> this problem? I can't find a webpage that describes this
> problem by web searches.
The first I heard about it was from bug 1096651. I don't know how/where glandium learned about it.
Flags: needinfo?(n.nethercote)
Assignee
Comment 27•9 years ago
I'm going to leave this open until we have a clear picture of whether clearing %ecx fixes the problem.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Assignee
Comment 28•9 years ago
(In reply to Martin Thomson [:mt:] from comment #24)
> Nick, this will be in NSS 3.25, I can backport these patches to NSS 3.24 but
> we would have to discuss whether we wanted a point release. And then we
> have to convince the release managers that they want to take the release. I
> will leave that part to you.
So...
- This crash doesn't make the top 300 crashes for Firefox 46.0 in the past 7 days.
- However, it is #19 for Firefox 49.0 in the past 7 days. I'm not sure why it's ranked so much higher in Nightly than Release.
- It also shows up for Thunderbird users quite a bit.
- I hope the two patches will fix the crash, but it's not certain.
I don't have a good sense of whether these numbers would typically warrant an NSS point release and update. Ritu, perhaps you do? Thank you.
Flags: needinfo?(rkothari)
Comment 29•9 years ago
(In reply to Nicholas Nethercote [:njn] from comment #26)
>
> The first I heard about it was from bug 1096651. I don't know how/where
> glandium learned about it.
Hi Nicholas: thanks a lot for the info. According to bug 1096651 comment 0,
the old CPU in question was manufactured by Cyrix and has a clock speed of
225 MHz. This patch has essentially no risk, so it is worth a try, but I
doubt many Firefox 49.0 users use such an old and slow CPU. So, either other
more modern CPUs also have this bug, or this bug is caused by something else.
Comment 30•9 years ago
(In reply to Nicholas Nethercote [:njn] from comment #28)
> (In reply to Martin Thomson [:mt:] from comment #24)
>
> I don't have a good sense of whether these numbers would typically warrant
> an NSS point release and update. Ritu, perhaps you do? Thank you.
Hi Nicholas, the current NSS - release mappings from https://kuix.de/mozilla/versions/ indicate that Fx47 is at NSS 3.23 RTM. Looking at the # of occurrences of this on Fx47 (< 25 total in a week), it does not warrant another NSS update, especially since we are 2 weeks away from going live.
For Fx46, the total # is also less than 125 per week, so definitely not a dot release driver by itself. Lizzard fyi.
Flags: needinfo?(rkothari) → needinfo?(lhenry)
Comment 31•9 years ago
We can keep an eye on the crash in Nightly. Currently, there were 19 crashes in the last week with this signature, on 9 installs, all on XP. That doesn't worry me too much.
Flags: needinfo?(lhenry)
Assignee
Comment 32•9 years ago
> For Fx46, the total # is also less than 125 per week, so definitely not a
> dot release driver by itself.
Thank you for confirming.
Comment 33•9 years ago
Anyway, I just asked Brindusa Tot to run Sysinternals CoreInfo and that shows OSXSAVE in CPUID is enabled.
Reporter
Comment 34•9 years ago
Below is the output after running CoreInfo:
Coreinfo v3.31 - Dump information on system CPU and memory topology Copyright (C) 2008-2014 Mark Russinovich Sysinternals - www.sysinternals.com
AMD FX(tm)-8320 Eight-Core Processor
x86 Family 21 Model 2 Stepping 0, AuthenticAMD
HTT * Multicore
HYPERVISOR - Hypervisor is present
VMX - Supports Intel hardware-assisted virtualization
SVM * Supports AMD hardware-assisted virtualization
X64 * Supports 64-bit mode
SMX - Supports Intel trusted execution
SKINIT * Supports AMD SKINIT
NX * Supports no-execute page protection
SMEP - Supports Supervisor Mode Execution Prevention
SMAP - Supports Supervisor Mode Access Prevention
PAGE1GB * Supports 1 GB large pages
PAE * Supports > 32-bit physical addresses
PAT * Supports Page Attribute Table
PSE * Supports 4 MB pages
PSE36 * Supports > 32-bit address 4 MB pages
PGE * Supports global bit in page tables
SS - Supports bus snooping for cache operations
VME * Supports Virtual-8086 mode
RDWRFSGSBASE - Supports direct GS/FS base access
FPU * Implements i387 floating point instructions
MMX * Supports MMX instruction set
MMXEXT * Implements AMD MMX extensions
3DNOW - Supports 3DNow! instructions
3DNOWEXT - Supports 3DNow! extension instructions
SSE * Supports Streaming SIMD Extensions
SSE2 * Supports Streaming SIMD Extensions 2
SSE3 * Supports Streaming SIMD Extensions 3
SSSE3 * Supports Supplemental SIMD Extensions 3
SSE4a * Supports Streaming SIMDR Extensions 4a
SSE4.1 * Supports Streaming SIMD Extensions 4.1
SSE4.2 * Supports Streaming SIMD Extensions 4.2
AES * Supports AES extensions
AVX * Supports AVX intruction extensions
FMA * Supports FMA extensions using YMM state
MSR * Implements RDMSR/WRMSR instructions
MTRR * Supports Memory Type Range Registers
XSAVE * Supports XSAVE/XRSTOR instructions
OSXSAVE * Supports XSETBV/XGETBV instructions
RDRAND - Supports RDRAND instruction
RDSEED - Supports RDSEED instruction
CMOV * Supports CMOVcc instruction
CLFSH * Supports CLFLUSH instruction
CX8 * Supports compare and exchange 8-byte instructions
CX16 * Supports CMPXCHG16B instruction
BMI1 * Supports bit manipulation extensions 1
BMI2 - Supports bit manipulation extensions 2
ADX - Supports ADCX/ADOX instructions
DCA - Supports prefetch from memory-mapped device
F16C * Supports half-precision instruction
FXSR * Supports FXSAVE/FXSTOR instructions
FFXSR * Supports optimized FXSAVE/FSRSTOR instruction
MONITOR * Supports MONITOR and MWAIT instructions
MOVBE - Supports MOVBE instruction
ERMSB - Supports Enhanced REP MOVSB/STOSB
PCLMULDQ * Supports PCLMULDQ instruction
POPCNT * Supports POPCNT instruction
LZCNT * Supports LZCNT instruction
SEP * Supports fast system call instructions
LAHF-SAHF * Supports LAHF/SAHF instructions in 64-bit mode
HLE - Supports Hardware Lock Elision instructions
RTM - Supports Restricted Transactional Memory instruction
DE * Supports I/O breakpoints including CR4.DE
DTES64 - Can write history of 64-bit branch addresses
DS - Implements memory-resident debug buffer
DS-CPL - Supports Debug Store feature with CPL
PCID - Supports PCIDs and settable CR4.PCIDE
INVPCID - Supports INVPCID instruction
PDCM - Supports Performance Capabilities MSR
RDTSCP * Supports RDTSCP instruction
TSC * Supports RDTSC instruction
TSC-DEADLINE - Local APIC supports one-shot deadline timer
TSC-INVARIANT * TSC runs at constant rate
xTPR - Supports disabling task priority messages
EIST - Supports Enhanced Intel Speedstep
ACPI - Implements MSR for power management
TM - Implements thermal monitor circuitry
TM2 - Implements Thermal Monitor 2 control
APIC * Implements software-accessible local APIC
x2APIC - Supports x2APIC
CNXT-ID - L1 data cache mode adaptive or BIOS
MCE * Supports Machine Check, INT18 and CR4.MCE
MCA * Implements Machine Check Architecture
PBE - Supports use of FERR#/PBE# pin
PSN - Implements 96-bit processor serial number
PREFETCHW * Supports PREFETCHW instruction
Maximum implemented CPUID leaves: 0000000D (Basic), 8000001E (Extended).
Logical to Physical Processor Map:
*------- Physical Processor 0
-*------ Physical Processor 1
--*----- Physical Processor 2
---*---- Physical Processor 3
----*--- Physical Processor 4
-----*-- Physical Processor 5
------*- Physical Processor 6
-------* Physical Processor 7
Logical Processor to Socket Map:
******** Socket 0
Logical Processor to NUMA Node Map:
******** NUMA Node 0
Logical Processor to Cache Map:
*------- Data Cache 0, Level 1, 16 KB, Assoc 4, LineSize 64
*------- Instruction Cache 0, Level 1, 64 KB, Assoc 2, LineSize 64
*------- Unified Cache 0, Level 2, 2 MB, Assoc 16, LineSize 64
-*------ Data Cache 1, Level 1, 16 KB, Assoc 4, LineSize 64
-*------ Instruction Cache 1, Level 1, 64 KB, Assoc 2, LineSize 64
-*------ Unified Cache 1, Level 2, 2 MB, Assoc 16, LineSize 64
--*----- Data Cache 2, Level 1, 16 KB, Assoc 4, LineSize 64
--*----- Instruction Cache 2, Level 1, 64 KB, Assoc 2, LineSize 64
--*----- Unified Cache 2, Level 2, 2 MB, Assoc 16, LineSize 64
---*---- Data Cache 3, Level 1, 16 KB, Assoc 4, LineSize 64
---*---- Instruction Cache 3, Level 1, 64 KB, Assoc 2, LineSize 64
---*---- Unified Cache 3, Level 2, 2 MB, Assoc 16, LineSize 64
----*--- Data Cache 4, Level 1, 16 KB, Assoc 4, LineSize 64
----*--- Instruction Cache 4, Level 1, 64 KB, Assoc 2, LineSize 64
----*--- Unified Cache 4, Level 2, 2 MB, Assoc 16, LineSize 64
-----*-- Data Cache 5, Level 1, 16 KB, Assoc 4, LineSize 64
-----*-- Instruction Cache 5, Level 1, 64 KB, Assoc 2, LineSize 64
-----*-- Unified Cache 5, Level 2, 2 MB, Assoc 16, LineSize 64
------*- Data Cache 6, Level 1, 16 KB, Assoc 4, LineSize 64
------*- Instruction Cache 6, Level 1, 64 KB, Assoc 2, LineSize 64
------*- Unified Cache 6, Level 2, 2 MB, Assoc 16, LineSize 64
-------* Data Cache 7, Level 1, 16 KB, Assoc 4, LineSize 64
-------* Instruction Cache 7, Level 1, 64 KB, Assoc 2, LineSize 64
-------* Unified Cache 7, Level 2, 2 MB, Assoc 16, LineSize 64
Comment 35•9 years ago
Just filed bug 1283585 with the new AES-GCM code.
Comment 36•9 years ago
I also ran into this crash today several times on Windows XP 32-bit while testing Firefox 48beta7. Don't have any steps to reproduce unfortunately, it just happened all of a sudden. I also modified the title since this is not specific to 45 branch.
Setup:
Operating System: Windows XP Professional (5.1, Build 2600) Service Pack 3 (2600.xpsp_sp3_qfe.130704-0421)
Language: English (Regional Setting: English)
System Manufacturer: Gigabyte Technology Co., Ltd.
System Model: GA-78LMT-USB3
BIOS: Award Modular BIOS v6.00PG
Processor: AMD FX(tm)-8320 Eight-Core Processor, MMX (8 CPUs), ~3.5GHz
Memory: 3326MB RAM
Page File: 3523MB used, 1681MB available
Windows Dir: C:\WINDOWS
DirectX Version: DirectX 9.0c (4.09.0000.0904)
DX Setup Parameters: Not found
DxDiag Version: 5.03.2600.5512 32bit Unicode
Card name: AMD Radeon HD 6450
Crash reports:
bp-2452f2fc-47ac-4979-9137-8c12e2160712
bp-0274bd69-36c5-419d-8595-b11962160712
bp-0ba074fb-8258-4c12-b309-b440c2160712
bp-132956e3-57fa-444e-88b7-edf802160712
status-firefox48:
--- → affected
Summary: Firefox 45.0a1 Crash [@ intel_aes_gcmINIT ] → Crash in [@ intel_aes_gcmINIT ]
Comment 37•9 years ago
Crash volume for signature 'intel_aes_gcmINIT':
- nightly (version 50): 29 crashes from 2016-06-06.
- aurora (version 49): 0 crash from 2016-06-07.
- beta (version 48): 127 crashes from 2016-06-06.
- release (version 47): 691 crashes from 2016-05-31.
- esr (version 45): 25 crashes from 2016-04-07.
Crash volume on the last weeks:
Week N-1 Week N-2 Week N-3 Week N-4 Week N-5 Week N-6 Week N-7
- nightly 2 3 0 0 8 3 5
- aurora 0 0 0 0 0 0 0
- beta 10 19 25 29 23 13 7
- release 119 100 105 114 85 81 46
- esr 0 1 0 15 4 4 0
Affected platform: Windows
status-firefox47:
--- → affected
status-firefox50:
--- → affected
status-firefox-esr45:
--- → affected
Comment 38•8 years ago
Crash volume for signature 'intel_aes_gcmINIT':
- nightly (version 51): 0 crashes from 2016-08-01.
- aurora (version 50): 0 crashes from 2016-08-01.
- beta (version 49): 59 crashes from 2016-08-02.
- release (version 48): 138 crashes from 2016-07-25.
- esr (version 45): 31 crashes from 2016-05-02.
Crash volume on the last weeks (Week N is from 08-22 to 08-28):
W. N-1 W. N-2 W. N-3
- nightly 0 0 0
- aurora 0 0 0
- beta 19 3 4
- release 52 41 18
- esr 3 0 1
Affected platform: Windows
Crash rank on the last 7 days:
Browser Content Plugin
- nightly
- aurora
- beta #406
- release #532
- esr #2273
status-firefox49:
--- → affected
Assignee
Comment 39•8 years ago
An update: these patches went in with NSS 3.25, which went into Firefox 49 (https://hg.mozilla.org/integration/mozilla-inbound/rev/a2f23b6058a2, bug 1277255). Looking at crashes from the past 7 days, I see 7 in 49.0 and another 18 or so in 49 betas. This suggests that the patches didn't fix the problem :(
Comment 40•8 years ago
This used to be more frequent with AMD CPUs, but it's the other way around now.
Intel 125/180
AMD 56/180
Comment 41•8 years ago
Has anybody checked in a raw dump if the instruction that we're trying to execute
is actually vzeroupper?
Comment 42•8 years ago
(In reply to Marco Castelluccio [:marco] from comment #41)
> Has anybody checked in a raw dump if the instruction that we're trying to
> execute
> is actually vzeroupper?
I've checked this crash (https://crash-stats.mozilla.com/report/index/54a2ecaa-3bbf-488b-970a-d69de2160930#tab-rawdump) and the instruction is actually vzeroupper.
d2cd905: 5b pop %ebx
d2cd906: c3 ret
d2cd907: 8d a4 24 00 00 00 00 lea 0x0(%esp),%esp
d2cd90e: 8b ff mov %edi,%edi
d2cd910: 8b 44 24 04 mov 0x4(%esp),%eax
d2cd914: 8b 4c 24 08 mov 0x8(%esp),%ecx
d2cd918: 8b 54 24 0c mov 0xc(%esp),%edx
d2cd91c: c5 f8 77 vzeroupper
^^^^^^^^^ ^^^^^^^^^^^^^^^^^^^^^ ^^^^^^^^^^^^
d2cd91f: c5 fa 6f 01 vmovdqu (%ecx),%xmm0
d2cd923: 8d 49 10 lea 0x10(%ecx),%ecx
d2cd926: 4a dec %edx
d2cd927: c4 e2 79 dc 01 vaesenc (%ecx),%xmm0,%xmm0
d2cd92c: 8d 49 10 lea 0x10(%ecx),%ecx
d2cd92f: 4a dec %edx
d2cd930: 75 f5 jne 0xd2cd927
d2cd932: c4 e2 79 dd 01 vaesenclast (%ecx),%xmm0,%xmm0
d2cd937: c4 e2 79 00 05 90 23 vpshufb 0xd2e2390,%xmm0,%xmm0
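For anyone repeating this check on another raw dump, the comparison can be sketched in a few lines of C. The byte sequence C5 F8 77 is the VEX encoding of vzeroupper, matching the marked line in the listing above; the helper name `starts_with_vzeroupper` is hypothetical and is not part of any Mozilla tooling.

```c
#include <stddef.h>
#include <string.h>

/* VEX-encoded VZEROUPPER: the three bytes C5 F8 77. */
static const unsigned char VZEROUPPER_BYTES[3] = { 0xC5, 0xF8, 0x77 };

/* Returns 1 if the bytes at the faulting address begin with the
 * VZEROUPPER encoding, 0 otherwise (including when fewer than three
 * bytes are available or the pointer is NULL). */
int starts_with_vzeroupper(const unsigned char *code, size_t len)
{
    if (code == NULL || len < sizeof VZEROUPPER_BYTES) {
        return 0;
    }
    return memcmp(code, VZEROUPPER_BYTES, sizeof VZEROUPPER_BYTES) == 0;
}
```

Feeding it the bytes that start at d2cd91c in the listing gives a match, while the preceding `mov` at d2cd918 does not.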
Comment 43•8 years ago
According to https://msdn.microsoft.com/en-us/library/ff545910.aspx#avx_registers, "In Windows 7 with Service Pack 1 (SP1), Windows Server 2008 R2, and newer versions of Windows, both x86 and x64 versions of the operating system preserve the AVX registers across thread (and process) switches.".
I don't see how not preserving the state of the AVX registers would cause an EXCEPTION_ILLEGAL_INSTRUCTION, but it could definitely cause hard-to-diagnose problems.
The crash also occurs, though less frequently (out of 184 crashes: 19 on Windows 8.1, 1 on Windows 8, 1 on Windows 7 SP1), on Windows versions that the page lists as preserving the AVX registers.
Comment 44•8 years ago
Looks like there's a way to disable AVX on Windows, it was implemented by Microsoft as a workaround for some issue (the command is bcdedit /set xsavedisable 1, see https://support.microsoft.com/en-us/kb/2568088).
There's also a way to detect if AVX is enabled on Windows: https://msdn.microsoft.com/en-us/library/windows/desktop/hh134240(v=vs.85).aspx.
Comment 45•8 years ago
(In reply to Marco Castelluccio [:marco] from comment #43)
> According to
> https://msdn.microsoft.com/en-us/library/ff545910.aspx#avx_registers, "In
> Windows 7 with Service Pack 1 (SP1), Windows Server 2008 R2, and newer
> versions of Windows, both x86 and x64 versions of the operating system
> preserve the AVX registers across thread (and process) switches.".
>
> I don't see how not preserving the state of the AVX registers would cause a
> EXCEPTION_ILLEGAL_INSTRUCTION, but it could definitely cause hard to
> diagnose problems.
> The crash occurs, even if with smaller frequency (out of 184, 19 with
> Windows 8.1, 1 with Windows 8, 1 with Windows 7 SP1), with versions of
> Windows newer than what is stated on that page.
The way it works is that the CPU blocks execution of the AVX instructions until the OS sets OSXSAVE in CR4 and XSETBV is used to set the correct bits.
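As a rough illustration of that rule, the usual user-mode check can be written as a pure function of two values: the ecx word returned by CPUID leaf 1 and the contents of XCR0 as read by XGETBV. This is a sketch, not the check_xcr0_ymm code in NSS; `avx_usable` and the macro names are hypothetical, while the bit positions (CPUID.1:ECX bit 27 for OSXSAVE, bit 28 for AVX, XCR0 bits 1 and 2 for SSE and YMM state) are the architectural ones.

```c
#include <stdint.h>

#define CPUID1_ECX_OSXSAVE (1u << 27) /* OS has set CR4.OSXSAVE */
#define CPUID1_ECX_AVX     (1u << 28) /* CPU implements AVX */
#define XCR0_SSE_STATE     (1u << 1)  /* OS saves XMM registers */
#define XCR0_YMM_STATE     (1u << 2)  /* OS saves upper YMM halves */

/* Returns 1 only if the CPU reports AVX, the OS has enabled XSAVE,
 * and XCR0 shows that both XMM and YMM state are being managed.
 * Real code must not execute XGETBV unless the OSXSAVE bit is set,
 * because XGETBV itself faults when CR4.OSXSAVE is clear; here the
 * caller simply passes 0 for xcr0 in that case. */
int avx_usable(uint32_t cpuid1_ecx, uint64_t xcr0)
{
    const uint32_t need_cpuid = CPUID1_ECX_OSXSAVE | CPUID1_ECX_AVX;
    const uint64_t need_xcr0 = XCR0_SSE_STATE | XCR0_YMM_STATE;

    if ((cpuid1_ecx & need_cpuid) != need_cpuid) {
        return 0;
    }
    return (xcr0 & need_xcr0) == need_xcr0;
}
```

Note that this check only reflects what the hardware registers say on the CPU where it runs; it cannot tell whether the OS that left those bits set actually saves and restores the state.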
Comment 46•8 years ago
(In reply to Yuhong Bao from comment #45)
> (In reply to Marco Castelluccio [:marco] from comment #43)
> > According to
> > https://msdn.microsoft.com/en-us/library/ff545910.aspx#avx_registers, "In
> > Windows 7 with Service Pack 1 (SP1), Windows Server 2008 R2, and newer
> > versions of Windows, both x86 and x64 versions of the operating system
> > preserve the AVX registers across thread (and process) switches.".
> >
> > I don't see how not preserving the state of the AVX registers would cause a
> > EXCEPTION_ILLEGAL_INSTRUCTION, but it could definitely cause hard to
> > diagnose problems.
> > The crash occurs, even if with smaller frequency (out of 184, 19 with
> > Windows 8.1, 1 with Windows 8, 1 with Windows 7 SP1), with versions of
> > Windows newer than what is stated on that page.
>
> The way it works is that the CPU blocks execution of the AVX instructions
> until the OS sets OSXSAVE in CR4 and XSETBV is used to set the correct bits.
If the CPU blocks the execution, then it would explain why we're getting
an EXCEPTION_ILLEGAL_INSTRUCTION.
Most OSes where we're crashing do not support AVX, so there must be
a bug in our code to detect it.
We have three buckets of OSes, out of 205 crashes:
- Versions older than Windows 7 SP1 (no support for AVX): 176
- Versions newer than Windows 7 SP1, 32-bit (where AVX is disabled, "the VEX instructions can only be used when running in 64-bit mode" from https://software.intel.com/en-us/articles/introduction-to-intel-advanced-vector-extensions): 27
- Versions newer than Windows 7 SP1, 64-bit (where AVX should be enabled, unless forcibly disabled by the user): 2
There are some snippets of code to detect AVX on https://software.intel.com/en-us/blogs/2011/04/14/is-avx-enabled/ and http://masm32.com/board/index.php?topic=3191.0.
Comment 47•8 years ago
(In reply to Marco Castelluccio [:marco] from comment #46)
> We have three buckets of OSes, out of 205 crashes:
> - Versions newer than Windows 7 SP1, 32-bit (where AVX is disabled, "the VEX
> instructions can only be used when running in 64-bit mode" from
> https://software.intel.com/en-us/articles/introduction-to-intel-advanced-
> vector-extensions): 27
> - Versions newer than Windows 7 SP1, 64-bit (where AVX should be enabled,
> unless forcibly disabled by the user): 2
N.B.: By 32-bit and 64-bit here I meant the OS. Firefox is always 32-bit with
this crash.
Comment 48•8 years ago
The function which is checking the OS support is check_xcr0_ymm: https://dxr.mozilla.org/mozilla-central/source/security/nss/lib/freebl/rijndael.c#1005.
It's using xgetbv in two different ways (for 32-bit builds manually with an
assembly snippet, for 64-bit builds with the _xgetbv function).
Perhaps we can revert the patch that introduced the assembly snippet (it was
needed to support VS 2010 RTM: https://bugzilla.mozilla.org/show_bug.cgi?id=979703#c23).
I don't see why the assembly snippet would be wrong, but perhaps directly using
_xgetbv from intrin.h would be safer (we're already doing it with libyuv:
https://dxr.mozilla.org/mozilla-central/source/media/libyuv/source/cpu_id.cc#102).
Comment 49•8 years ago
Crash volume for signature 'intel_aes_gcmINIT':
- nightly (version 52): 18 crashes from 2016-09-19.
- aurora (version 51): 0 crashes from 2016-09-19.
- beta (version 50): 86 crashes from 2016-09-20.
- release (version 49): 359 crashes from 2016-09-05.
- esr (version 45): 14 crashes from 2016-07-25.
Crash volume on the last weeks (Week N is from 10-17 to 10-23):
W. N-1 W. N-2 W. N-3 W. N-4
- nightly 17 0 1 0
- aurora 0 0 0 0
- beta 14 41 20 6
- release 98 100 99 29
- esr 0 0 5 0
Affected platform: Windows
Crash rank on the last 7 days:
Browser Content Plugin
- nightly #55
- aurora
- beta #900 #3115
- release #522
- esr
status-firefox52:
--- → affected
Comment 50•8 years ago
Firefox crash during startup after motherboard replacement
https://support.mozilla.org/en-US/questions/1144850#answer-931402
This problem was solved by doing a full XP repair; the partial repair I did first was not sufficient.
Comment 51•8 years ago
owner
I spoke too soon, it is re-occurring, here's the latest:
https://crash-stats.mozilla.com/report/index/a7e4d91d-b28e-4c54-a8cd-a77492161031
Comment 52•8 years ago
Anything further on this crash?
I'm running an AMD FX 4300 CPU on Windows XP. XP is loaded through the Windows 10
boot loader menu (dual boot scenario).
I know that XP does not support AVX instructions. Is this relevant?
Comment 53•8 years ago
Which mobo are you using?
Comment 54•8 years ago
It is the ASRock 960GC-GS FX.
Let me mention something. In various Windows forums I've described this crash scenario, but
the response is that it couldn't possibly be, it must be a coincidence.
I've only ever seen this crash when I've booted XP from the Windows 10 boot loader (normal dual boot situation).
On a hunch, I restored XP's original boot loader so I could boot XP natively. In this situation,
the crash has never occurred.
I'd be pleased to help or run any tests to get to the bottom of this problem. Thanks.
Comment 55•8 years ago
Thanks. This might mean that it is a bug in BOOTMGR.
Comment 56•8 years ago
There are various documents on the web that talk about Windows 8/10 usage of hibernate/
Fast Start/Hybrid sleep causing problems in dual boot scenarios. But disabling that lot
did not affect the situation. I thought I was onto something for a minute.
Firefox is not the only program that is crashing. The Avast antivirus (now uninstalled and replaced)
created many dumps in its appdata folder; it was unusable. Also, FFmpeg.dll (video codec/tool), as used
in the application Tmpgenc 2.5, crashed with an "illegal operation" exception. All of these only crashed when
XP was booted via the Windows 10 boot menu.
FF does not crash constantly. It's usually within a minute of starting it,
but not every time. I'm running it today after restoring the Windows 10 loader and nothing has
occurred (yet).
Comment 57•8 years ago
Perhaps similar occurrences in XP from other users could be checked to see if the Windows 10
(possibly Windows 7 & 8 also?) boot loader scenario is evident.
I know that when problems are reported, this detail may not have been included.
Comment 58•8 years ago
It does appear that CR4.OSFXSR is set early in newer BOOTMGR.
Comment 59•8 years ago
And CR4.OSXSAVE too.
Comment 60•8 years ago
Thanks for the update. It's a little above my head, I'm not a Windows programmer.
But I'm assuming these values in the control register are significant to the problem.
Let me know if there is anything further I can do to help.
Comment 61•8 years ago
I saw the brief email discussion ... interesting.
Has it been determined if the AMD CPU is to blame, or if there is a bug in the Windows 10
Bootmgr/Winload where some bits are set incorrectly when booting to NTLDR?
Comment 62•8 years ago
BTW, Geoff Chappell is a consultant you can pay, if Mozilla is interested.
Comment 63•8 years ago
I have done the debugging and confirmed this is a bug in BOOTMGR failing to clear XCR0 and CR4.OSXSAVE when chaining to NTLDR.
Comment 64•8 years ago
Doing this debugging in a VM took quite a bit of work including getting a checked build of NTLDR to confirm.
Assignee
Comment 65•8 years ago
Thank you for the excellent investigation, Yuhong Bao. Since this isn't a Firefox bug, the crash rate is low, and it seems hard to do anything about it, I suggest that we close this bug report as WONTFIX.
Status: REOPENED → RESOLVED
Closed: 9 years ago → 8 years ago
Resolution: --- → WONTFIX
Comment 66•8 years ago
The bug should still be reported to Microsoft though.
Comment 67•8 years ago
Yuhong Bao, I want to thank you myself for this work you did.
As was mentioned by Nicholas, the incidence of those experiencing this bug must be quite low.
(I wonder what % of Windows 10 users are also booting to XP?)
For myself, right now, I can find a workaround: boot XP from a floppy drive, for example.
How does one approach Microsoft with this kind of info? Secondly, would they even be interested
in fixing it?
One thing I didn't really understand. When the PC goes through POST, what is the content of those registers? Are those bits off by default? (I assume they must be).
Comment 68•8 years ago
(In reply to davexnet01 from comment #67)
> One thing I didn't really understand. When the PC goes through POST, what
> is the content of those registers? Are those bits off by default ? (I assume
> they must be).
Yes, they should be.
Assignee
Comment 69•8 years ago
(In reply to davexnet01 from comment #67)
> Yuhong Bao I want to thank you myself for this work you did.
> As was mentioned by Nicholas, the incidence of those experiencing this bug
> must be quite low.
I just checked crash-stats.mozilla.com. This crash signature accounts for 255 out of
3,242,628 crash reports submitted in the past 7 days. I.e. not that many.
Comment 70•8 years ago
If we do anything, it could be to blacklist the processor in the AVX test. I think that I'd be happy to take code that did that.
Comment 71•8 years ago
You would be better off using IsProcessorFeaturePresent(PF_XSAVE_ENABLED).
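A minimal sketch of what that suggestion could look like in C follows. It is an assumption about how one might wire it up, not code from NSS: the wrapper name `os_xsave_enabled` and its three-way return convention are hypothetical, and the non-Windows branch exists only so the snippet builds elsewhere. IsProcessorFeaturePresent and PF_XSAVE_ENABLED (value 17 in the Windows SDK headers) are the real Win32 names.

```c
#ifdef _WIN32
#include <windows.h>
#ifndef PF_XSAVE_ENABLED
#define PF_XSAVE_ENABLED 17 /* fallback for older SDK headers */
#endif
#endif

/* Asks the OS, rather than the CPU, whether XSAVE is enabled.
 * Returns 1 if Windows reports XSAVE as enabled, 0 if it does not,
 * and -1 on platforms where this API is not available. */
int os_xsave_enabled(void)
{
#ifdef _WIN32
    return IsProcessorFeaturePresent(PF_XSAVE_ENABLED) ? 1 : 0;
#else
    return -1;
#endif
}
```

The appeal of asking the OS is that a kernel which knows nothing about XSAVE should answer "no" even if a boot loader left the hardware bits set; an AVX code path would then be gated on this returning 1 in addition to the CPUID checks.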
Comment 72•8 years ago
Regarding Martin's comment above, is he implying that some erratum in the AMD FX
is involved in this problem? Would the problem not arise if an Intel CPU
was used in an otherwise similar scenario?
Comment 73•8 years ago
For posterity only, here is the configuration of the machine where I encounter this crash:
One HDD 500GB with the following OSs, each on its own partition:
- Windows XP 32bit
- Windows Vista 64bit
- Windows 7 32bit
- Windows 7 64bit
- Windows 8.1 32bit
- Windows 8.1 64bit
One SSD 256GB with the following OSs, each on its own partition:
- Windows 7 64bit
- Windows 10 64bit
- Ubuntu 16.04 32bit
One HDD 500GB with the following OSs, each on its own partition:
- Ubuntu 12.04 32bit
- Ubuntu 16.04 64bit
- Ubuntu 14.04 32bit
Software used to make this configuration was EasyBCD.
The CPU, Memory, BIOS etc. can be seen in comment 36.
Comment 74•8 years ago
FYI, I just finished a version for intel-gcm.s in bug 1283585.
Comment 75•8 years ago
Good news! In Win10 1703, bootmgr!ImgPcatStartLegacyLoader now calls bootmgr!ArchRestoreProcessorFeatures (to get rid of XSAVE bits)! This does not deal with CR4.OSFXSR, but it is a step forward.