Closed
Bug 1263495
Opened 9 years ago
Closed 9 years ago
crash in intel_aes_gcmINIT due to bad AVX instruction detection in freebl?
Categories
(NSS :: Libraries, defect)
Tracking
(firefox48 affected)
RESOLVED
DUPLICATE
of bug 1225094
Future
Tracking | Status | |
---|---|---|
firefox48 | --- | affected |
People
(Reporter: n.nethercote, Unassigned)
References
Details
(Keywords: crash)
Crash Data
Attachments
(2 files, 4 obsolete files)
51.51 KB,
patch
|
mt
:
feedback-
|
Details | Diff | Splinter Review |
31.55 KB,
text/plain
|
Details |
This bug was filed from the Socorro interface and is
report bp-802fa6a0-2ba0-4e32-86da-c22cd2160408.
=============================================================
This crash has occurred 210 times in the past 7 days, across many versions of Firefox and Thunderbird. Looks like it's been around for a long time.
The crash is due to an illegal instruction being executed. The instruction is always 'vzeroupper', which is an AVX instruction, and thus one that won't be supported by many processors.
There is code in security/nss/lib/freebl/rijndael.c to detect whether AVX instructions can be used. Perhaps the detection code is buggy, or perhaps some processors claim to implement this instruction when really they don't.
AMD processors are over-represented among the crashes, constituting 63 out of 210.
93% of the reported uptimes are less than 1 minute, so it's almost a start-up crash. It'll presumably happen consistently for anyone affected, so it's the kind of bug that'll cause us to lose users.
![]() |
Reporter | |
Comment 1•9 years ago
|
||
mt, do you know who might be able to look into this?
Flags: needinfo?(martin.thomson)
![]() |
Reporter | |
Comment 2•9 years ago
|
||
I compared freebl's AVX detection code (https://dxr.mozilla.org/mozilla-central/source/security/nss/lib/freebl/rijndael.c#1054) to SpiderMonkey's equivalent code (https://dxr.mozilla.org/mozilla-central/rev/d62963756d9a9d19cbbb5d8f3dd3c7cfa8fdef88/js/src/jit/x86-shared/Assembler-x86-shared.cpp#245). They seem to be equivalent, doing the following steps:
- execute CPUID
- check bits 27 and 28 of %ecx are 1
- set %ecx to 0, execute XGETBV
- check bits 1 and 2 of %eax are 1
So there's nothing obviously wrong based on that.
Comment 3•9 years ago
|
||
Interesting. Most crashes are Windows XP.
Windows XP 180 85.7%
Windows Unknown 20 9.5%
Windows 8.1 8 3.8%
Windows 7 2 1.0%
![]() |
Reporter | |
Comment 4•9 years ago
|
||
(In reply to Makoto Kato [:m_kato] from comment #3)
> Interesting. Most crashes are Windows XP.
Makes sense if it's more common on older machines that lack full or partial AVX support.
Assignee: nobody → nobody
Component: General → Libraries
Product: Core → NSS
Target Milestone: --- → Future
Version: Trunk → trunk
Comment 5•9 years ago
|
||
I'm not really qualified here, but looking at the libyuv code there is an additional check:
https://dxr.mozilla.org/mozilla-central/source/media/libyuv/source/cpu_id.cc#200
Note that this seems to be AVX2, not just AVX. Only NSS and libyuv use that instruction as far as I can see.
Bob, I'm now out of my depth. Any ideas?
Flags: needinfo?(martin.thomson) → needinfo?(rrelyea)
![]() |
Reporter | |
Comment 6•9 years ago
|
||
> Note that this seems to be AVX2, not just AVX.
Are you sure? https://en.wikipedia.org/wiki/Advanced_Vector_Extensions lists it under the original AVX instruction set, not under AVX 2.
Comment 7•9 years ago
|
||
No, you are right. To which I can only offer my befuddlement. The detection code looks fine.
Comment 8•9 years ago
|
||
At least, AVX support requires on Windows 7 SP1+. Maybe, ymm registers doesn't store correctly into context?
https://msdn.microsoft.com/ja-jp/library/windows/desktop/hh134240(v=vs.85).aspx
https://msdn.microsoft.com/ja-jp/library/windows/desktop/hh134233(v=vs.85).aspx
Comment 9•9 years ago
|
||
I wonder if they are using virtualization software.
Comment 10•9 years ago
|
||
As a side note, PCLMULQDQ should not depend on AVX/XSAVE support anyway.
Comment 11•9 years ago
|
||
The first comment in this post seems to indicate that a CPU check is insufficient for AVX detection, and that a runtime check with the OS is needed:
https://connect.microsoft.com/VisualStudio/Feedback/Details/981479
Comment 12•9 years ago
|
||
Fixing it would probably be as simple as removing the "v" at the beginning of each instruction, and any references to YMM registers (at first glance there appears to be none).
Comment 13•9 years ago
|
||
(In reply to Makoto Kato [:m_kato] from comment #8)
> At least, AVX support requires on Windows 7 SP1+. Maybe, ymm registers
> doesn't store correctly into context?
Yes, Windows XP does not preserve AVX-related state, so that would explain the XP crashes but it doesn't explain the ones with Windows 7+.
Comment 14•9 years ago
|
||
I bet this is the same as bug 1096651. The freebl_cpuid function is not ensuring that ecx has meaningful values.
Comment 15•9 years ago
|
||
Confirmed that the original file generate instructions that include VEX prefix. Fixing it right now.
Comment 16•9 years ago
|
||
To clarify, the file involved is intel-gcm-x86-masm.asm from the NSS sources.
Comment 17•9 years ago
|
||
I also had to reduce the three operands instructions to two operands, often by adding extra movdqa instructions.
Comment 18•9 years ago
|
||
We've ran into a similiar issue before. You definitely need to check if the OS is saving the registers, otherwise you crash on older OS versions or you will crash. We've had old code that failed to do this. I don't think that's the problem now since our new code has the functions.
Yuhong, can you supply a patch to the changes you made. My guess is we should probably just check the appropriate (VEX) bit. vpclmulqdq is pretty fundamental to this code (though I think we check specifically for this instruction).
> Yes, Windows XP does not preserve AVX-related state, so that would explain the XP crashes but it doesn't
> explain the ones with Windows 7+.
Windows XP must be running old code, since we check the ymm bit in CR0 and it wouldn't even get into this code in a modern version.
bob
Flags: needinfo?(rrelyea)
Comment 19•9 years ago
|
||
I probably will just submit the new file, since the patch would be fairly large. Yes, I replaced vpclmulqdq with pclmulqdq (this would also make it work with pre-AVX CPUs that support it).
Comment 20•9 years ago
|
||
Not tested. Note the alignment requirements in the source.
Comment 21•9 years ago
|
||
I'm certainly not qualified to check this. All I did was add the file in attachment 8743621 [details], fix some whitespace up (trailing spaces, tabs to spaces, removed extra blank lines), and checked that our tests pass. I don't know if this causes a performance regression.
Attachment #8743630 -
Flags: review?(rrelyea)
Comment 22•9 years ago
|
||
It might, but I think that is still better than crashing.
Comment 23•9 years ago
|
||
As a side note, it is possible to set a BCD option in Windows to disable XSAVE support.
Comment 24•9 years ago
|
||
Results from running performance tests on my Linux box (though I don't know how to interpret these):
MODE INPUT SIZE (bytes) SYMMETRIC KEY SIZE (bits) REPETITIONS (cx/op) CONTEXT CREATION TIME (ms) OPERATION TIME (ms)
Before:
aes_gcm_c 0b 128 100T/0 265.000 0.000 0.000 0b
aes_gcm_e 78Mb 128 0/10T 0.000 4226.000 4.226 18Mb
aes_gcm_d 78Mb 128 0/10T 0.000 3539.000 3.540 22Mb
aes_gcm_c 0b 256 100T/0 338.000 0.000 0.000 0b
aes_gcm_e 78Mb 256 0/10T 0.000 4471.000 4.472 17Mb
aes_gcm_d 78Mb 256 0/10T 0.000 3450.000 3.450 22Mb
After:
aes_gcm_c 0b 128 100T/0 265.000 0.000 0.000 0b
aes_gcm_e 78Mb 128 0/10T 0.000 4221.000 4.221 18Mb
aes_gcm_d 78Mb 128 0/10T 0.000 3433.000 3.433 22Mb
aes_gcm_c 0b 256 100T/0 339.000 0.000 0.000 0b
aes_gcm_e 78Mb 256 0/10T 0.000 4472.000 4.473 17Mb
aes_gcm_d 78Mb 256 0/10T 0.000 3455.000 3.455 22Mb
I'm not seeing cause for concern. I would like to get independent verification that the code remains constant-time though. A performance regression is one thing, but a timing side channel would be a deal-breaker. I see no reason that the changes would have added anything, but I'd like independent confirmation of that.
Comment 25•9 years ago
|
||
I think most of it would be AVX-SSE transition penalties, which I don't think NSS cares about though. The only additional instructions I have added are movdqa instructions.
Comment 26•9 years ago
|
||
Actually, the 128-bit decrypt looks to be significantly faster in comment 24. I ran the test several more times and it seems like the one above is an anomaly. Though it makes me more determined to see someone verify that these changes are constant time.
Comment 27•9 years ago
|
||
Note that I only changed the masm version, not gas.
Comment 28•9 years ago
|
||
Comment on attachment 8743630 [details] [diff] [review]
bug1263495.patch
Bah, after updating my entire Windows toolchain, which took hours, I ran these tests. I get a crash on line 141 of the modified file. That is:
- vzeroupper
; AES-ENC(0)
- vmovdqu T, XMMWORD PTR[KS]
+ movdqu T, XMMWORD PTR[KS]
lea KS, [16 + KS]
dec NR
Lenc_loop:
- vaesenc T, T, [KS]
+ aesenc T, [KS] <<<< here
Attachment #8743630 -
Flags: review?(rrelyea) → feedback-
Comment 29•9 years ago
|
||
In light of this, and the size of the patch, could we simply port https://bugzilla.mozilla.org/page.cgi?id=splinter.html&bug=1096651&attachment=8520997 over as suggest in comment 14.
Comment 30•9 years ago
|
||
Yea, probably because the pointer is unaligned. AVX changed these kinds of instructions to allow unaligned pointers.
Comment 31•9 years ago
|
||
Note for testing: I think newer AMD processors do allow unaligned pointers even for normal SSE (non-AVX) instructions, so this kind of bug may go undetected.
Comment 32•9 years ago
|
||
Actually, I found out this new behaviour requires a bit in MXCSR to be set, otherwise it would continue to crash just like before.
Comment 33•9 years ago
|
||
Anyway, fortunately there seems to be a free register in this code, so if the KS pointer can't be aligned another movdqu can be used.
Comment 34•9 years ago
|
||
(In reply to Martin Thomson [:mt:] from comment #29)
> In light of this, and the size of the patch, could we simply port
> https://bugzilla.mozilla.org/page.cgi?id=splinter.
> html&bug=1096651&attachment=8520997 over as suggest in comment 14.
CPUID leaf 1 should return a value in ecx on any CPU that supports SSE3.
Comment 35•9 years ago
|
||
Anyway, I found https://bugzilla.mozilla.org/show_bug.cgi?id=1225094
Comment 36•9 years ago
|
||
That being said, it looks like the pushad and popad instructions are unneeded and make the disassembled code harder to read too.
Comment 37•9 years ago
|
||
Yes, I am talking about freebl_cpuid, specifically https://dxr.mozilla.org/mozilla-central/source/security/nss/lib/freebl/mpi/mpcpucache.c#115
![]() |
Reporter | |
Comment 38•9 years ago
|
||
Bug 1225094 is older, so let's dup this one to it.
Status: NEW → RESOLVED
Closed: 9 years ago
Resolution: --- → DUPLICATE
Comment 39•9 years ago
|
||
Anyway, pclmulqdq seems to be commutative, which means that I can fix the alignment problem in the asm pretty easily.
Comment 40•9 years ago
|
||
Though it would be better if intel-gcm-warp.c used an aligned allocator.
Comment 41•9 years ago
|
||
Attachment #8743621 -
Attachment is obsolete: true
Comment 42•9 years ago
|
||
Translate this version to other assemblers and the AVX-checking code can be removed.
Comment 43•9 years ago
|
||
Attachment #8766663 -
Attachment is obsolete: true
Comment 44•9 years ago
|
||
Attachment #8766665 -
Attachment is obsolete: true
Comment 45•9 years ago
|
||
Attachment #8766666 -
Attachment is obsolete: true
Comment 46•9 years ago
|
||
See bug 1283585.
You need to log in
before you can comment on or make changes to this bug.
Description
•