This bug was filed from the Socorro interface and is report bp-802fa6a0-2ba0-4e32-86da-c22cd2160408. ============================================================= This crash has occurred 210 times in the past 7 days, across many versions of Firefox and Thunderbird. Looks like it's been around for a long time. The crash is due to an illegal instruction being executed. The instruction is always 'vzeroupper', which is an AVX instruction, and thus one that won't be supported by many processors. There is code in security/nss/lib/freebl/rijndael.c to detect whether AVX instructions can be used. Perhaps the detection code is buggy, or perhaps some processors claim to implement this instruction when really they don't. AMD processors are over-represented among the crashes, constituting 63 out of 210. 93% of the reported uptimes are less than 1 minute, so it's almost a start-up crash. It'll presumably happen consistently for anyone affected, so it's the kind of bug that'll cause us to lose users.
mt, do you know who might be able to look into this?
I compared freebl's AVX detection code (https://dxr.mozilla.org/mozilla-central/source/security/nss/lib/freebl/rijndael.c#1054) to SpiderMonkey's equivalent code (https://dxr.mozilla.org/mozilla-central/rev/d62963756d9a9d19cbbb5d8f3dd3c7cfa8fdef88/js/src/jit/x86-shared/Assembler-x86-shared.cpp#245). They seem to be equivalent, doing the following steps: - execute CPUID - check bits 27 and 28 of %ecx are 1 - set %ecx to 0, execute XGETBV - check bits 1 and 2 of %eax are 1 So there's nothing obviously wrong based on that.
Interesting. Most crashes are Windows XP. Windows XP 180 85.7% Windows Unknown 20 9.5% Windows 8.1 8 3.8% Windows 7 2 1.0%
(In reply to Makoto Kato [:m_kato] from comment #3) > Interesting. Most crashes are Windows XP. Makes sense if it's more common on older machines that lack full or partial AVX support.
Assignee: nobody → nobody
Component: General → Libraries
Product: Core → NSS
Target Milestone: --- → Future
Version: Trunk → trunk
I'm not really qualified here, but looking at the libyuv code there is an additional check: https://dxr.mozilla.org/mozilla-central/source/media/libyuv/source/cpu_id.cc#200 Note that this seems to be AVX2, not just AVX. Only NSS and libyuv use that instruction as far as I can see. Bob, I'm now out of my depth. Any ideas?
Flags: needinfo?(martin.thomson) → needinfo?(rrelyea)
> Note that this seems to be AVX2, not just AVX. Are you sure? https://en.wikipedia.org/wiki/Advanced_Vector_Extensions lists it under the original AVX instruction set, not under AVX 2.
No, you are right. To which I can only offer my befuddlement. The detection code looks fine.
At least, AVX support requires on Windows 7 SP1+. Maybe, ymm registers doesn't store correctly into context? https://msdn.microsoft.com/ja-jp/library/windows/desktop/hh134240(v=vs.85).aspx https://msdn.microsoft.com/ja-jp/library/windows/desktop/hh134233(v=vs.85).aspx
I wonder if they are using virtualization software.
As a side note, PCLMULQDQ should not depend on AVX/XSAVE support anyway.
The first comment in this post seems to indicate that a CPU check is insufficient for AVX detection, and that a runtime check with the OS is needed: https://connect.microsoft.com/VisualStudio/Feedback/Details/981479
Fixing it would probably be as simple as removing the "v" at the beginning of each instruction, and any references to YMM registers (at first glance there appears to be none).
(In reply to Makoto Kato [:m_kato] from comment #8) > At least, AVX support requires on Windows 7 SP1+. Maybe, ymm registers > doesn't store correctly into context? Yes, Windows XP does not preserve AVX-related state, so that would explain the XP crashes but it doesn't explain the ones with Windows 7+.
I bet this is the same as bug 1096651. The freebl_cpuid function is not ensuring that ecx has meaningful values.
Confirmed that the original file generate instructions that include VEX prefix. Fixing it right now.
To clarify, the file involved is intel-gcm-x86-masm.asm from the NSS sources.
I also had to reduce the three operands instructions to two operands, often by adding extra movdqa instructions.
We've ran into a similiar issue before. You definitely need to check if the OS is saving the registers, otherwise you crash on older OS versions or you will crash. We've had old code that failed to do this. I don't think that's the problem now since our new code has the functions. Yuhong, can you supply a patch to the changes you made. My guess is we should probably just check the appropriate (VEX) bit. vpclmulqdq is pretty fundamental to this code (though I think we check specifically for this instruction). > Yes, Windows XP does not preserve AVX-related state, so that would explain the XP crashes but it doesn't > explain the ones with Windows 7+. Windows XP must be running old code, since we check the ymm bit in CR0 and it wouldn't even get into this code in a modern version. bob
I probably will just submit the new file, since the patch would be fairly large. Yes, I replaced vpclmulqdq with pclmulqdq (this would also make it work with pre-AVX CPUs that support it).
Created attachment 8743621 [details] intel-gcm-x86-masm.asm Not tested. Note the alignment requirements in the source.
Created attachment 8743630 [details] [diff] [review] bug1263495.patch I'm certainly not qualified to check this. All I did was add the file in attachment 8743621 [details], fix some whitespace up (trailing spaces, tabs to spaces, removed extra blank lines), and checked that our tests pass. I don't know if this causes a performance regression.
Attachment #8743630 - Flags: review?(rrelyea)
It might, but I think that is still better than crashing.
As a side note, it is possible to set a BCD option in Windows to disable XSAVE support.
Results from running performance tests on my Linux box (though I don't know how to interpret these): MODE INPUT SIZE (bytes) SYMMETRIC KEY SIZE (bits) REPETITIONS (cx/op) CONTEXT CREATION TIME (ms) OPERATION TIME (ms) Before: aes_gcm_c 0b 128 100T/0 265.000 0.000 0.000 0b aes_gcm_e 78Mb 128 0/10T 0.000 4226.000 4.226 18Mb aes_gcm_d 78Mb 128 0/10T 0.000 3539.000 3.540 22Mb aes_gcm_c 0b 256 100T/0 338.000 0.000 0.000 0b aes_gcm_e 78Mb 256 0/10T 0.000 4471.000 4.472 17Mb aes_gcm_d 78Mb 256 0/10T 0.000 3450.000 3.450 22Mb After: aes_gcm_c 0b 128 100T/0 265.000 0.000 0.000 0b aes_gcm_e 78Mb 128 0/10T 0.000 4221.000 4.221 18Mb aes_gcm_d 78Mb 128 0/10T 0.000 3433.000 3.433 22Mb aes_gcm_c 0b 256 100T/0 339.000 0.000 0.000 0b aes_gcm_e 78Mb 256 0/10T 0.000 4472.000 4.473 17Mb aes_gcm_d 78Mb 256 0/10T 0.000 3455.000 3.455 22Mb I'm not seeing cause for concern. I would like to get independent verification that the code remains constant-time though. A performance regression is one thing, but a timing side channel would be a deal-breaker. I see no reason that the changes would have added anything, but I'd like independent confirmation of that.
I think most of it would be AVX-SSE transition penalties, which I don't think NSS cares about though. The only additional instructions I have added are movdqa instructions.
Actually, the 128-bit decrypt looks to be significantly faster in comment 24. I ran the test several more times and it seems like the one above is an anomaly. Though it makes me more determined to see someone verify that these changes are constant time.
Note that I only changed the masm version, not gas.
Comment on attachment 8743630 [details] [diff] [review] bug1263495.patch Bah, after updating my entire Windows toolchain, which took hours, I ran these tests. I get a crash on line 141 of the modified file. That is: - vzeroupper ; AES-ENC(0) - vmovdqu T, XMMWORD PTR[KS] + movdqu T, XMMWORD PTR[KS] lea KS, [16 + KS] dec NR Lenc_loop: - vaesenc T, T, [KS] + aesenc T, [KS] <<<< here
Attachment #8743630 - Flags: review?(rrelyea) → feedback-
In light of this, and the size of the patch, could we simply port https://bugzilla.mozilla.org/page.cgi?id=splinter.html&bug=1096651&attachment=8520997 over as suggest in comment 14.
Yea, probably because the pointer is unaligned. AVX changed these kinds of instructions to allow unaligned pointers.
Note for testing: I think newer AMD processors do allow unaligned pointers even for normal SSE (non-AVX) instructions, so this kind of bug may go undetected.
Actually, I found out this new behaviour requires a bit in MXCSR to be set, otherwise it would continue to crash just like before.
Anyway, fortunately there seems to be a free register in this code, so if the KS pointer can't be aligned another movdqu can be used.
(In reply to Martin Thomson [:mt:] from comment #29) > In light of this, and the size of the patch, could we simply port > https://bugzilla.mozilla.org/page.cgi?id=splinter. > html&bug=1096651&attachment=8520997 over as suggest in comment 14. CPUID leaf 1 should return a value in ecx on any CPU that supports SSE3.
Anyway, I found https://bugzilla.mozilla.org/show_bug.cgi?id=1225094
That being said, it looks like the pushad and popad instructions are unneeded and make the disassembled code harder to read too.
Yes, I am talking about freebl_cpuid, specifically https://dxr.mozilla.org/mozilla-central/source/security/nss/lib/freebl/mpi/mpcpucache.c#115
Bug 1225094 is older, so let's dup this one to it.
Status: NEW → RESOLVED
Last Resolved: 3 years ago
Resolution: --- → DUPLICATE
Duplicate of bug: 1225094
Anyway, pclmulqdq seems to be commutative, which means that I can fix the alignment problem in the asm pretty easily.
Though it would be better if intel-gcm-warp.c used an aligned allocator.
Created attachment 8766663 [details] New version that fixes alignment problems and other bugs
Attachment #8743621 - Attachment is obsolete: true
Translate this version to other assemblers and the AVX-checking code can be removed.
Created attachment 8766665 [details] Another new version
Attachment #8766663 - Attachment is obsolete: true
Created attachment 8766666 [details] Fix movdqa to movdqu in LEndEnc7
Attachment #8766665 - Attachment is obsolete: true
Created attachment 8766675 [details] Fix pxor in LDecData7
Attachment #8766666 - Attachment is obsolete: true
See bug 1283585.
You need to log in before you can comment on or make changes to this bug.