Crash in intel_aes_gcmINIT

RESOLVED FIXED in 3.15.4

Status

P1
major
RESOLVED FIXED
5 years ago
3 years ago

People

(Reporter: wtc, Assigned: wtc)

Tracking

(Depends on: 1 bug, {crash})

trunk
3.15.4
x86_64
Linux
crash
Dependency tree / graph

Firefox Tracking Flags

(Not tracked)

Details

Attachments

(1 attachment)

(Assignee)

Description

5 years ago
Google Chrome 31 was recently released. It supports the AES-GCM
cipher suites for TLS.

We have received two bug reports of crashes due to SIGILL
(illegal instruction) on Linux.

http://code.google.com/p/chromium/issues/detail?id=320524
http://code.google.com/p/chromium/issues/detail?id=321282

The stack trace from the first bug report has the symbols:

Core was generated by `/opt/google/chrome/chrome       '.
Program terminated with signal 4, Illegal instruction.
#0  0x00007fd1d2c77a60 in intel_aes_gcmINIT () from /lib64/libfreebl3.so
Missing separate debuginfos, use: debuginfo-install google-chrome-stable-31.0.1650.57-1.x86_64
(gdb) bt
#0  0x00007fd1d2c77a60 in intel_aes_gcmINIT () from /lib64/libfreebl3.so
#1  0x00007fd1d2c743f8 in intel_AES_GCM_CreateContext ()
   from /lib64/libfreebl3.so
#2  0x00007fd1d2c3a672 in AES_InitContext () from /lib64/libfreebl3.so
#3  0x00007fd1d2c3aad3 in AES_CreateContext () from /lib64/libfreebl3.so
#4  0x00007fd1c554054f in sftk_CryptInit () from /lib64/libsoftokn3.so
#5  0x00007fd1c554146a in NSC_EncryptInit () from /lib64/libsoftokn3.so
#6  0x00007fd1dda1b3a0 in PK11_Encrypt (symKey=0xca1a8a03380, 
    mechanism=<optimized out>, param=<optimized out>, out=
    0xca1a930000d "\345\066\372\066\n\201O\305?_\261\227\263Vw\240\310\237A\342\256\261\301\340", outLen=0x7fd1b8275b0c, maxLen=<optimized out>, data=
    0xca1a93fa000 "\024", dataLen=16) at pk11obj.c:895

Shay, do you know what might be wrong? Thanks.

My Linux computer doesn't have AVX, so I can't debug this easily.

Comment 1

5 years ago
There is not enough information to determine exactly, and I could not reproduce this problem on a Linux machine. But how about this: it might be related to the OS. 

From the CPUINFO, the only difference I could see  - between a working case and the reported problem - is that the reporting system did not have the “xsave” flag. 

CPUID.OSXSAVE is the reflection of the CPU having CR4.osxsave set. If the OS does not set CR4.OSXSAVE, then CPUID will report that OSXSAVE is not there, and then a user/app could not execute the AVX path.

The OS sets CR4.OSXSAVE - the application cannot do that. 
(and Ivy bridge CPU should be running on an XSAVE OS. But it is not clear from the report – what the OS is actually doing) 

If the OS does not enable OSXSAVE, you cannot issue XGETBV and cannot use AVX. 

Finally, either enable the “xsave” or add (to the code path selector) a condition on the “xsave”.
(In reply to Shay Gueron from comment #1)
> CPUID.OSXSAVE is the reflection of the CPU having CR4.osxsave set. If the OS
> does not set CR4.OSXSAVE, then CPUID will report that OSXSAVE is not there,
> and then a user/app could not execute the AVX path.

See http://lists.xen.org/archives/html/xen-devel/2012-09/msg00491.html. Is it possible that the users reporting this are running under Xen or in some other kind of VM?

Comment 3

5 years ago
Oh, we were seeing this on RHEL-5. There needs to be some OS support in order to use AVX. For us, we simply built NSS without AVX support on RHEL-5. If you supply your own NSS, then you may need to determine what OS you are running and turn it off at runtime.
(Assignee)

Comment 4

5 years ago
Shay: thank you for looking at this. I also found two or three Intel articles
for developers that mention the need to check the CPUID.OSXSAVE bit and execute
the XGETBV instruction to detect OS support of AVX.

For your code in intel-gcm.s, is it enough to just check the CPUID.OSXSAVE bit
(a simple change), or do we also need to execute the XGETBV instruction to verify
the OS is saving the YMM registers?

Updated

5 years ago
Attachment #8335494 - Flags: superreview?(agl) → superreview+

Comment 6

5 years ago
Comment on attachment 8335494 [details] [diff] [review]
Patch

r+ rrelyea
Attachment #8335494 - Flags: review?(shay.gueron) → review+

Comment 7

5 years ago
Elio, it seems like we are missing the redhat code that allows you to just turn off GCM acceleration and leave just regular AES acceleration. Please make sure there's a mozilla bug to get our code upstream. If there already is please include the bug number here.

Thanks.

bob

Comment 8

5 years ago
As I mentioned on the Chromium review for detecting when to disable this, would it be fair to bump the softoken revision so that we can detect (on the Chromium side) whether or not we're using a bugged softoken?
(Assignee)

Comment 9

5 years ago
Comment on attachment 8335494 [details] [diff] [review]
Patch

Review of attachment 8335494 [details] [diff] [review]:
-----------------------------------------------------------------

Shay: could you review and test this patch? Thanks.

Patch checked in: https://hg.mozilla.org/projects/nss/rev/edda2ba82d22
Attachment #8335494 - Flags: review?(shay.gueron)
Attachment #8335494 - Flags: checked-in+

Comment 10

5 years ago
Comment on attachment 8335494 [details] [diff] [review]
Patch

Review of attachment 8335494 [details] [diff] [review]:
-----------------------------------------------------------------

It is OK (at least on Linux).
 
Comment: please note that the use of AES & PCLMULQDQ instructions does not require OS support for AVX - only using their AVX version does.

Comment 11

5 years ago
(In reply to Robert Relyea from comment #7)
> Elio, it seems like we are missing the redhat code that allows you to just
> turn off GCM acceleration and leave just regular AES acceleration. Please
> make sure there's a mozilla bug to get our code upstream. If there already
> is please include the bug number here.

I have entered Bug 941690.
(Assignee)

Comment 12

5 years ago
(In reply to Shay Gueron from comment #10)
>
> Comment: please note that the use of AES & PCLMULQDQ instructions does not
> require OS support for AVX - only using their AVX version does.

Thank you. You're not suggesting that I add this comment to
rijndale.c, are you?

Our check for CPUID.OSXSAVE and the result of XGETBV only
applies to your assembly code in intel-gcm.s.
Severity: normal → major
Status: ASSIGNED → RESOLVED
Last Resolved: 5 years ago
Keywords: crash
Resolution: --- → FIXED
Target Milestone: --- → 3.15.4
(Assignee)

Updated

5 years ago
Attachment #8335494 - Flags: review?(shay.gueron)
This code has been reported to fail with
  rijndael.c:981: Error: no such instruction: `xgetbv'
on a Linux VM.

Please see bug 976631
Depends on: 976631
You need to log in before you can comment on or make changes to this bug.