Closed Bug 1263495 Opened 8 years ago Closed 8 years ago

crash in intel_aes_gcmINIT due to bad AVX instruction detection in freebl?

Categories

(NSS :: Libraries, defect)

x86
Windows NT
defect
Not set
critical

Tracking

(firefox48 affected)

RESOLVED DUPLICATE of bug 1225094
Future
Tracking Status
firefox48 --- affected

People

(Reporter: n.nethercote, Unassigned)

References

Details

(Keywords: crash)

Crash Data

Attachments

(2 files, 4 obsolete files)

This bug was filed from the Socorro interface and is 
report bp-802fa6a0-2ba0-4e32-86da-c22cd2160408.
=============================================================

This crash has occurred 210 times in the past 7 days, across many versions of Firefox and Thunderbird. Looks like it's been around for a long time.

The crash is due to an illegal instruction being executed. The instruction is always 'vzeroupper', which is an AVX instruction, and thus one that won't be supported by many processors.

There is code in security/nss/lib/freebl/rijndael.c to detect whether AVX instructions can be used. Perhaps the detection code is buggy, or perhaps some processors claim to implement this instruction when really they don't.

AMD processors are over-represented among the crashes, constituting 63 out of 210.

93% of the reported uptimes are less than 1 minute, so it's almost a start-up crash. It'll presumably happen consistently for anyone affected, so it's the kind of bug that'll cause us to lose users.
mt, do you know who might be able to look into this?
Flags: needinfo?(martin.thomson)
I compared freebl's AVX detection code (https://dxr.mozilla.org/mozilla-central/source/security/nss/lib/freebl/rijndael.c#1054) to SpiderMonkey's equivalent code (https://dxr.mozilla.org/mozilla-central/rev/d62963756d9a9d19cbbb5d8f3dd3c7cfa8fdef88/js/src/jit/x86-shared/Assembler-x86-shared.cpp#245). They seem to be equivalent, doing the following steps:

- execute CPUID
- check bits 27 and 28 of %ecx are 1
- set %ecx to 0, execute XGETBV
- check bits 1 and 2 of %eax are 1

So there's nothing obviously wrong based on that.
Interesting.  Most crashes are Windows XP.

Windows XP 	180 	85.7%
Windows Unknown 	20 	9.5%
Windows 8.1 	8 	3.8%
Windows 7 	2 	1.0%
(In reply to Makoto Kato [:m_kato] from comment #3)
> Interesting.  Most crashes are Windows XP.

Makes sense if it's more common on older machines that lack full or partial AVX support.
Assignee: nobody → nobody
Component: General → Libraries
Product: Core → NSS
Target Milestone: --- → Future
Version: Trunk → trunk
I'm not really qualified here, but looking at the libyuv code there is an additional check:

https://dxr.mozilla.org/mozilla-central/source/media/libyuv/source/cpu_id.cc#200

Note that this seems to be AVX2, not just AVX.  Only NSS and libyuv use that instruction as far as I can see.

Bob, I'm now out of my depth.  Any ideas?
Flags: needinfo?(martin.thomson) → needinfo?(rrelyea)
> Note that this seems to be AVX2, not just AVX.

Are you sure?  https://en.wikipedia.org/wiki/Advanced_Vector_Extensions lists it under the original AVX instruction set, not under AVX 2.
No, you are right.  To which I can only offer my befuddlement.  The detection code looks fine.
At least, AVX support requires on Windows 7 SP1+.  Maybe, ymm registers doesn't store correctly into context?

https://msdn.microsoft.com/ja-jp/library/windows/desktop/hh134240(v=vs.85).aspx
https://msdn.microsoft.com/ja-jp/library/windows/desktop/hh134233(v=vs.85).aspx
I wonder if they are using virtualization software.
As a side note, PCLMULQDQ should not depend on AVX/XSAVE support anyway.
The first comment in this post seems to indicate that a CPU check is insufficient for AVX detection, and that a runtime check with the OS is needed:

https://connect.microsoft.com/VisualStudio/Feedback/Details/981479
Fixing it would probably be as simple as removing the "v" at the beginning of each instruction, and any references to YMM registers (at first glance there appears to be none).
(In reply to Makoto Kato [:m_kato] from comment #8)
> At least, AVX support requires on Windows 7 SP1+.  Maybe, ymm registers
> doesn't store correctly into context?

Yes, Windows XP does not preserve AVX-related state, so that would explain the XP crashes but it doesn't explain the ones with Windows 7+.
I bet this is the same as bug 1096651. The freebl_cpuid function is not ensuring that ecx has meaningful values.
Confirmed that the original file generate instructions that include VEX prefix. Fixing it right now.
To clarify, the file involved is intel-gcm-x86-masm.asm from the NSS sources.
I also had to reduce the three operands instructions to two operands, often by adding extra movdqa instructions.
We've ran into a similiar issue before. You definitely need to check if the OS is saving the registers, otherwise you crash on older OS versions or you will crash. We've had old code that failed to do this. I don't think that's the problem now since our new code has the functions.

Yuhong, can you supply a patch to the changes you made. My guess is we should probably just check the appropriate (VEX) bit. vpclmulqdq is pretty fundamental to this code (though I think we check specifically for this instruction).

> Yes, Windows XP does not preserve AVX-related state, so that would explain the XP crashes but it doesn't 
> explain the ones with Windows 7+.

Windows XP must be running old code, since we check the ymm bit in CR0 and it wouldn't even get into this code in a modern version.

bob
Flags: needinfo?(rrelyea)
I probably will just submit the new file, since the patch would be fairly large. Yes, I replaced vpclmulqdq with pclmulqdq (this would also make it work with pre-AVX CPUs that support it).
Attached file intel-gcm-x86-masm.asm (obsolete) —
Not tested. Note the alignment requirements in the source.
Attached patch bug1263495.patchSplinter Review
I'm certainly not qualified to check this.  All I did was add the file in attachment 8743621 [details], fix some whitespace up (trailing spaces, tabs to spaces, removed extra blank lines), and checked that our tests pass.  I don't know if this causes a performance regression.
Attachment #8743630 - Flags: review?(rrelyea)
It might, but I think that is still better than crashing.
As a side note, it is possible to set a BCD option in Windows to disable XSAVE support.
Results from running performance tests on my Linux box (though I don't know how to interpret these):

MODE	INPUT SIZE (bytes)	SYMMETRIC KEY SIZE (bits)	REPETITIONS (cx/op)	CONTEXT CREATION TIME (ms)	OPERATION TIME (ms)
Before:
aes_gcm_c	0b	128	100T/0	265.000	0.000 0.000 0b
aes_gcm_e	78Mb	128	0/10T	0.000	4226.000 4.226 18Mb
aes_gcm_d	78Mb	128	0/10T	0.000	3539.000 3.540 22Mb
aes_gcm_c	0b	256	100T/0	338.000	0.000 0.000 0b
aes_gcm_e	78Mb	256	0/10T	0.000	4471.000 4.472 17Mb
aes_gcm_d	78Mb	256	0/10T	0.000	3450.000 3.450 22Mb
After:
aes_gcm_c	0b	128	100T/0	265.000	0.000 0.000 0b
aes_gcm_e	78Mb	128	0/10T	0.000	4221.000 4.221 18Mb
aes_gcm_d	78Mb	128	0/10T	0.000	3433.000 3.433 22Mb
aes_gcm_c	0b	256	100T/0	339.000	0.000 0.000 0b
aes_gcm_e	78Mb	256	0/10T	0.000	4472.000 4.473 17Mb
aes_gcm_d	78Mb	256	0/10T	0.000	3455.000 3.455 22Mb


I'm not seeing cause for concern.  I would like to get independent verification that the code remains constant-time though.  A performance regression is one thing, but a timing side channel would be a deal-breaker.  I see no reason that the changes would have added anything, but I'd like independent confirmation of that.
I think most of it would be AVX-SSE transition penalties, which I don't think NSS cares about though. The only additional instructions I have added are movdqa instructions.
Actually, the 128-bit decrypt looks to be significantly faster in comment 24.  I ran the test several more times and it seems like the one above is an anomaly.  Though it makes me more determined to see someone verify that these changes are constant time.
Note that I only changed the masm version, not gas.
Comment on attachment 8743630 [details] [diff] [review]
bug1263495.patch

Bah, after updating my entire Windows toolchain, which took hours, I ran these tests.  I get a crash on line 141 of the modified file.  That is:

-    vzeroupper
     ; AES-ENC(0)
-    vmovdqu T, XMMWORD PTR[KS]
+    movdqu T, XMMWORD PTR[KS]
     lea KS, [16 + KS]
     dec NR
 Lenc_loop:
-        vaesenc T, T, [KS]
+        aesenc T, [KS] <<<< here
Attachment #8743630 - Flags: review?(rrelyea) → feedback-
In light of this, and the size of the patch, could we simply port https://bugzilla.mozilla.org/page.cgi?id=splinter.html&bug=1096651&attachment=8520997 over as suggest in comment 14.
Yea, probably because the pointer is unaligned. AVX changed these kinds of instructions to allow unaligned pointers.
Note for testing: I think newer AMD processors do allow unaligned pointers even for normal SSE (non-AVX) instructions, so this kind of bug may go undetected.
Actually, I found out this new behaviour requires a bit in MXCSR to be set, otherwise it would continue to crash just like before.
Anyway, fortunately there seems to be a free register in this code, so if the KS pointer can't be aligned another movdqu can be used.
(In reply to Martin Thomson [:mt:] from comment #29)
> In light of this, and the size of the patch, could we simply port
> https://bugzilla.mozilla.org/page.cgi?id=splinter.
> html&bug=1096651&attachment=8520997 over as suggest in comment 14.

CPUID leaf 1 should return a value in ecx on any CPU that supports SSE3.
That being said, it looks like the pushad and popad instructions are unneeded and make the disassembled code harder to read too.
See Also: → 1225094
Bug 1225094 is older, so let's dup this one to it.
Status: NEW → RESOLVED
Closed: 8 years ago
Resolution: --- → DUPLICATE
Anyway, pclmulqdq seems to be commutative, which means that I can fix the alignment problem in the asm pretty easily.
Though it would be better if intel-gcm-warp.c used an aligned allocator.
Attachment #8743621 - Attachment is obsolete: true
Translate this version to other assemblers and the AVX-checking code can be removed.
Attached file Another new version (obsolete) —
Attachment #8766663 - Attachment is obsolete: true
Attached file Fix movdqa to movdqu in LEndEnc7 (obsolete) —
Attachment #8766665 - Attachment is obsolete: true
Attached file Fix pxor in LDecData7
Attachment #8766666 - Attachment is obsolete: true
See bug 1283585.
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: