crash in intel_aes_gcmINIT due to bad AVX instruction detection in freebl?

RESOLVED DUPLICATE of bug 1225094

Status

--
critical
RESOLVED DUPLICATE of bug 1225094
3 years ago
3 years ago

People

(Reporter: njn, Unassigned)

Tracking

({crash})

trunk
Future
x86
Windows NT
crash

Firefox Tracking Flags

(firefox48 affected)

Details

(crash signature)

Attachments

(2 attachments, 4 obsolete attachments)

(Reporter)

Description

3 years ago
This bug was filed from the Socorro interface and is 
report bp-802fa6a0-2ba0-4e32-86da-c22cd2160408.
=============================================================

This crash has occurred 210 times in the past 7 days, across many versions of Firefox and Thunderbird. Looks like it's been around for a long time.

The crash is due to an illegal instruction being executed. The instruction is always 'vzeroupper', which is an AVX instruction, and thus one that won't be supported by many processors.

There is code in security/nss/lib/freebl/rijndael.c to detect whether AVX instructions can be used. Perhaps the detection code is buggy, or perhaps some processors claim to implement this instruction when really they don't.

AMD processors are over-represented among the crashes, constituting 63 out of 210.

93% of the reported uptimes are less than 1 minute, so it's almost a start-up crash. It'll presumably happen consistently for anyone affected, so it's the kind of bug that'll cause us to lose users.
(Reporter)

Comment 1

3 years ago
mt, do you know who might be able to look into this?
Flags: needinfo?(martin.thomson)
(Reporter)

Comment 2

3 years ago
I compared freebl's AVX detection code (https://dxr.mozilla.org/mozilla-central/source/security/nss/lib/freebl/rijndael.c#1054) to SpiderMonkey's equivalent code (https://dxr.mozilla.org/mozilla-central/rev/d62963756d9a9d19cbbb5d8f3dd3c7cfa8fdef88/js/src/jit/x86-shared/Assembler-x86-shared.cpp#245). They seem to be equivalent, doing the following steps:

- execute CPUID
- check bits 27 and 28 of %ecx are 1
- set %ecx to 0, execute XGETBV
- check bits 1 and 2 of %eax are 1

So there's nothing obviously wrong based on that.
Interesting.  Most crashes are Windows XP.

Windows XP 	180 	85.7%
Windows Unknown 	20 	9.5%
Windows 8.1 	8 	3.8%
Windows 7 	2 	1.0%
(Reporter)

Comment 4

3 years ago
(In reply to Makoto Kato [:m_kato] from comment #3)
> Interesting.  Most crashes are Windows XP.

Makes sense if it's more common on older machines that lack full or partial AVX support.
Assignee: nobody → nobody
Component: General → Libraries
Product: Core → NSS
Target Milestone: --- → Future
Version: Trunk → trunk
I'm not really qualified here, but looking at the libyuv code there is an additional check:

https://dxr.mozilla.org/mozilla-central/source/media/libyuv/source/cpu_id.cc#200

Note that this seems to be AVX2, not just AVX.  Only NSS and libyuv use that instruction as far as I can see.

Bob, I'm now out of my depth.  Any ideas?
Flags: needinfo?(martin.thomson) → needinfo?(rrelyea)
(Reporter)

Comment 6

3 years ago
> Note that this seems to be AVX2, not just AVX.

Are you sure?  https://en.wikipedia.org/wiki/Advanced_Vector_Extensions lists it under the original AVX instruction set, not under AVX 2.
No, you are right.  To which I can only offer my befuddlement.  The detection code looks fine.
At least, AVX support requires on Windows 7 SP1+.  Maybe, ymm registers doesn't store correctly into context?

https://msdn.microsoft.com/ja-jp/library/windows/desktop/hh134240(v=vs.85).aspx
https://msdn.microsoft.com/ja-jp/library/windows/desktop/hh134233(v=vs.85).aspx

Comment 9

3 years ago
I wonder if they are using virtualization software.

Comment 10

3 years ago
As a side note, PCLMULQDQ should not depend on AVX/XSAVE support anyway.
The first comment in this post seems to indicate that a CPU check is insufficient for AVX detection, and that a runtime check with the OS is needed:

https://connect.microsoft.com/VisualStudio/Feedback/Details/981479

Comment 12

3 years ago
Fixing it would probably be as simple as removing the "v" at the beginning of each instruction, and any references to YMM registers (at first glance there appears to be none).
(In reply to Makoto Kato [:m_kato] from comment #8)
> At least, AVX support requires on Windows 7 SP1+.  Maybe, ymm registers
> doesn't store correctly into context?

Yes, Windows XP does not preserve AVX-related state, so that would explain the XP crashes but it doesn't explain the ones with Windows 7+.
I bet this is the same as bug 1096651. The freebl_cpuid function is not ensuring that ecx has meaningful values.

Comment 15

3 years ago
Confirmed that the original file generate instructions that include VEX prefix. Fixing it right now.

Comment 16

3 years ago
To clarify, the file involved is intel-gcm-x86-masm.asm from the NSS sources.

Comment 17

3 years ago
I also had to reduce the three operands instructions to two operands, often by adding extra movdqa instructions.

Comment 18

3 years ago
We've ran into a similiar issue before. You definitely need to check if the OS is saving the registers, otherwise you crash on older OS versions or you will crash. We've had old code that failed to do this. I don't think that's the problem now since our new code has the functions.

Yuhong, can you supply a patch to the changes you made. My guess is we should probably just check the appropriate (VEX) bit. vpclmulqdq is pretty fundamental to this code (though I think we check specifically for this instruction).

> Yes, Windows XP does not preserve AVX-related state, so that would explain the XP crashes but it doesn't 
> explain the ones with Windows 7+.

Windows XP must be running old code, since we check the ymm bit in CR0 and it wouldn't even get into this code in a modern version.

bob
Flags: needinfo?(rrelyea)

Comment 19

3 years ago
I probably will just submit the new file, since the patch would be fairly large. Yes, I replaced vpclmulqdq with pclmulqdq (this would also make it work with pre-AVX CPUs that support it).

Comment 20

3 years ago
Created attachment 8743621 [details]
intel-gcm-x86-masm.asm

Not tested. Note the alignment requirements in the source.
Created attachment 8743630 [details] [diff] [review]
bug1263495.patch

I'm certainly not qualified to check this.  All I did was add the file in attachment 8743621 [details], fix some whitespace up (trailing spaces, tabs to spaces, removed extra blank lines), and checked that our tests pass.  I don't know if this causes a performance regression.
Attachment #8743630 - Flags: review?(rrelyea)

Comment 22

3 years ago
It might, but I think that is still better than crashing.

Comment 23

3 years ago
As a side note, it is possible to set a BCD option in Windows to disable XSAVE support.
Results from running performance tests on my Linux box (though I don't know how to interpret these):

MODE	INPUT SIZE (bytes)	SYMMETRIC KEY SIZE (bits)	REPETITIONS (cx/op)	CONTEXT CREATION TIME (ms)	OPERATION TIME (ms)
Before:
aes_gcm_c	0b	128	100T/0	265.000	0.000 0.000 0b
aes_gcm_e	78Mb	128	0/10T	0.000	4226.000 4.226 18Mb
aes_gcm_d	78Mb	128	0/10T	0.000	3539.000 3.540 22Mb
aes_gcm_c	0b	256	100T/0	338.000	0.000 0.000 0b
aes_gcm_e	78Mb	256	0/10T	0.000	4471.000 4.472 17Mb
aes_gcm_d	78Mb	256	0/10T	0.000	3450.000 3.450 22Mb
After:
aes_gcm_c	0b	128	100T/0	265.000	0.000 0.000 0b
aes_gcm_e	78Mb	128	0/10T	0.000	4221.000 4.221 18Mb
aes_gcm_d	78Mb	128	0/10T	0.000	3433.000 3.433 22Mb
aes_gcm_c	0b	256	100T/0	339.000	0.000 0.000 0b
aes_gcm_e	78Mb	256	0/10T	0.000	4472.000 4.473 17Mb
aes_gcm_d	78Mb	256	0/10T	0.000	3455.000 3.455 22Mb


I'm not seeing cause for concern.  I would like to get independent verification that the code remains constant-time though.  A performance regression is one thing, but a timing side channel would be a deal-breaker.  I see no reason that the changes would have added anything, but I'd like independent confirmation of that.

Comment 25

3 years ago
I think most of it would be AVX-SSE transition penalties, which I don't think NSS cares about though. The only additional instructions I have added are movdqa instructions.
Actually, the 128-bit decrypt looks to be significantly faster in comment 24.  I ran the test several more times and it seems like the one above is an anomaly.  Though it makes me more determined to see someone verify that these changes are constant time.

Comment 27

3 years ago
Note that I only changed the masm version, not gas.
Comment on attachment 8743630 [details] [diff] [review]
bug1263495.patch

Bah, after updating my entire Windows toolchain, which took hours, I ran these tests.  I get a crash on line 141 of the modified file.  That is:

-    vzeroupper
     ; AES-ENC(0)
-    vmovdqu T, XMMWORD PTR[KS]
+    movdqu T, XMMWORD PTR[KS]
     lea KS, [16 + KS]
     dec NR
 Lenc_loop:
-        vaesenc T, T, [KS]
+        aesenc T, [KS] <<<< here
Attachment #8743630 - Flags: review?(rrelyea) → feedback-
In light of this, and the size of the patch, could we simply port https://bugzilla.mozilla.org/page.cgi?id=splinter.html&bug=1096651&attachment=8520997 over as suggest in comment 14.

Comment 30

3 years ago
Yea, probably because the pointer is unaligned. AVX changed these kinds of instructions to allow unaligned pointers.

Comment 31

3 years ago
Note for testing: I think newer AMD processors do allow unaligned pointers even for normal SSE (non-AVX) instructions, so this kind of bug may go undetected.

Comment 32

3 years ago
Actually, I found out this new behaviour requires a bit in MXCSR to be set, otherwise it would continue to crash just like before.

Comment 33

3 years ago
Anyway, fortunately there seems to be a free register in this code, so if the KS pointer can't be aligned another movdqu can be used.

Comment 34

3 years ago
(In reply to Martin Thomson [:mt:] from comment #29)
> In light of this, and the size of the patch, could we simply port
> https://bugzilla.mozilla.org/page.cgi?id=splinter.
> html&bug=1096651&attachment=8520997 over as suggest in comment 14.

CPUID leaf 1 should return a value in ecx on any CPU that supports SSE3.

Comment 36

3 years ago
That being said, it looks like the pushad and popad instructions are unneeded and make the disassembled code harder to read too.
See Also: → bug 1225094
(Reporter)

Comment 38

3 years ago
Bug 1225094 is older, so let's dup this one to it.
Status: NEW → RESOLVED
Last Resolved: 3 years ago
Resolution: --- → DUPLICATE
Duplicate of bug: 1225094

Comment 39

3 years ago
Anyway, pclmulqdq seems to be commutative, which means that I can fix the alignment problem in the asm pretty easily.

Comment 40

3 years ago
Though it would be better if intel-gcm-warp.c used an aligned allocator.

Comment 41

3 years ago
Created attachment 8766663 [details]
New version that fixes alignment problems and other bugs
Attachment #8743621 - Attachment is obsolete: true

Comment 42

3 years ago
Translate this version to other assemblers and the AVX-checking code can be removed.

Comment 43

3 years ago
Created attachment 8766665 [details]
Another new version
Attachment #8766663 - Attachment is obsolete: true

Comment 44

3 years ago
Created attachment 8766666 [details]
Fix movdqa to movdqu in LEndEnc7
Attachment #8766665 - Attachment is obsolete: true

Comment 45

3 years ago
Created attachment 8766675 [details]
Fix pxor in LDecData7
Attachment #8766666 - Attachment is obsolete: true

Comment 46

3 years ago
See bug 1283585.
You need to log in before you can comment on or make changes to this bug.