Closed Bug 1651411 (CVE-2023-4421) Opened 4 years ago Closed 7 months ago

New tlsfuzzer code can still detect timing issues in RSA operations.

Categories

(NSS :: Libraries, defect, P3)

3.53

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: rrelyea, Assigned: rrelyea)

Details

(Keywords: sec-other, Whiteboard: [embargo - coordinate with RedHat / rrelyea@redhat.com for disclosure] [nss-fx])

Attachments

(9 files, 1 obsolete file)

We have a new iteration of tlsfuzzer that does timing calculations against various malformed RSA operations which are used in Bleichenbacher style oracle attacks against RSA ssl cipher suites.

I have a patch that flattens this timing considerably. It hits almost all layers of the NSS stack: ssl, nss, softokn, and freebl.

The attack is only valid on the server side of an SSL RSA connection.

The new tlsfuzzer script has not been released yet.

Because of the extensiveness of the fix, Red Hat will not pick this up in its latest RHEL release without some upstream testing, so we don't need this bug to be security sensitive, but I marked it as such in case Mozilla wanted to handle the patch as an embargoed fix until the new tlsfuzzer script is available.

This code adds a new C_Decrypt mechanism for Constant time RSA and a new
key type for Constant time. If these are set, errors are no longer returned,
but the underlying key is transformed into a standard TLS PMS key with invalid
version and random data. When errors are handled they are handled in a constant time way.

NOTE: This will not properly handle HSMs in a constant time way unless the HSMs also never return errors on invalid padding, etc.
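A minimal sketch of the "never return an error, hand back a fake PMS instead" idea described above. This is not the NSS code: the names `ct_select_bytes` and `pms_or_fake` are hypothetical, the choice of (0, 0) as the invalid version is an assumption, and Python integers give no real constant-time guarantees, so this only illustrates the mask-based selection pattern.

```python
import secrets


def ct_select_bytes(use_a: int, a: bytes, b: bytes) -> bytes:
    """Return a when use_a == 1, b when use_a == 0, using masks, not a branch."""
    mask = -use_a & 0xFF  # 0xFF when use_a == 1, 0x00 when use_a == 0
    return bytes((x & mask) | (y & ~mask & 0xFF) for x, y in zip(a, b))


def pms_or_fake(decrypted: bytes, padding_ok: int) -> bytes:
    """Always return a 48-byte PMS; never signal a padding error to the caller."""
    # Assumption: the fake PMS carries an invalid version (0, 0) plus random data.
    fake = b"\x00\x00" + secrets.token_bytes(46)
    return ct_select_bytes(padding_ok, decrypted, fake)
```

The fake value is generated on every call, whether or not it ends up being used, so that the work done does not depend on whether the padding was valid.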

"sec-other" for now and I'll let JC handle the question about embargo in comment 0, or unhide the bug depending on answer.

Flags: needinfo?(jjones)
Keywords: sec-other

It sounds reasonable to me to keep this confidential while the new tlsfuzzer matures, just in case there are other classes of issue to handle.

Flags: needinfo?(jjones)
Whiteboard: [embargo - coordinate with RedHat / rrelyea@redhat.com for disclosure]
Assignee: nobody → rrelyea
Severity: -- → S3
Status: NEW → ASSIGNED
Priority: -- → P3
Attached file report.csv

I've run the tlsfuzzer tests against the 2nd revision in D82742 and I can see that processing of the probe with multiple zero most significant bytes ("very short PKCS padding (40 bytes short)") takes significantly longer. Compared to "very long (124-byte) pre master secret" it takes 130ns longer, with a 95% CI of 10ns (sample size: 1M).

Results from comparing individual probes to each other are in the attachment.

95% CI of +-10ns (exact interval: -1.40290e-07s to -1.20376e-07s)

Attached image conf_interval_plot.png

And a graphical representation of differences between timing of different probes, same run as in comment #4

Legend for the graph:

ID,Name
0,fuzzed pre master secret
1,invalid MAC in Finished on pos 0
2,invalid MAC in Finished on pos -1
3,invalid padding_length in Finished
4,invalid version number in padding
5,no encrypted value
6,no null separator in encrypted value
7,no null separator in padding
8,one byte encrypted value
9,set PKCS#1 padding type to 1
10,set PKCS#1 padding type to 3
11,too long (49-byte) pre master secret
12,too long PKCS padding
13,too short (47-byte) pre master secret
14,too short PKCS padding
15,two byte long PMS (TLS version only)
16,very long (96-byte) pre master secret
17,very long (124-byte) pre master secret
18,very short (4-byte) pre master secret
19,very short PKCS padding (40 bytes short)
20,"wrong TLS version (0, 0) in pre master secret"
21,"wrong TLS version (2, 2) in pre master secret"
22,zero byte in first byte of random padding
23,zero byte in last byte of random padding
24,zero byte in random padding

This patch defeats Bleichenbacher not by trying to hide the size of the
decrypted text, but by hiding whether the decryption succeeded or failed. This is done
by generating a fake returned text that's based on the key and the cipher text,
so the fake data is always the same for the same key and cipher text. Both the
length and the plain text are generated with a PRF.

Here's the proposed spec the patch codes to:

  1. Use SHA-256 to hash the private exponent encoded as a big-endian integer
    to a string the same length as the public modulus. Keep this value secret.
    (This is just an optimisation so that the implementation doesn't have to
    serialise the key over and over again.)
  2. Check the length of input according to step one of
    https://tools.ietf.org/html/rfc8017#section-7.2.2
  3. When provided with a ciphertext, use SHA-256 HMAC(key=hash_from_step1,
    text=ciphertext) to generate the key derivation key.
  4. Use SHA-256 HMAC with the key derivation key as the key and a two-byte big-endian
    iterator concatenated with the byte string "length" and with the big-endian
    representation of 2048 (0x0800) as the bit length of the generated string.
    • Iterate this PRF 8 times to generate a 256 byte string.
  5. Initialise the length of the synthetic message to 0.
  6. Split the PRF output into 2 byte strings, convert them into big-endian integers,
    zero-out high-order bits so that they have the same bit length as the octet
    length of the maximum acceptable message size (k-11), select the last integer
    that is no larger than (k-11) or remain at 0 if no integer is smaller than
    (k-11); this selection needs to be performed using side-channel free
    operators.
  7. Use SHA-256 HMAC with the key derivation key as the key and a two-byte big-endian
    iterator concatenated with the byte string "message" and with the big-endian
    representation of k*8.
    • Use this PRF to generate k bytes of output (right-truncate the last HMAC
      call if the number of generated bytes is not a multiple of the SHA-256 output
      size).
  8. Perform the RSA decryption as described in step 2 of section 7.2.2 of rfc8017.
  9. Verify the EM message padding as described in step 3 of section 7.2.2 of
    rfc8017, but instead of outputting "decryption error", return the last l
    bytes of the "message" PRF, where l is the synthetic message length selected
    using the "length" PRF; make this decision and the copy using side-channel free
    operations.
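The synthetic-message part of the spec above can be sketched as follows. This is an illustration under stated assumptions, not the patch itself: the function names are hypothetical, the iterator is assumed to start at 0 (the spec text does not say), the ciphertext is assumed to be passed in exactly as received, and the RSA decryption and padding check are left out. The selection loop uses an ordinary `if`, which is not side-channel free; a real implementation would need a branch-free select.

```python
import hashlib
import hmac


def prf(kdk: bytes, label: bytes, out_len: int) -> bytes:
    """HMAC-SHA-256 over iterator || label || bit length, concatenated."""
    bit_len = (out_len * 8).to_bytes(2, "big")
    out = b""
    i = 0  # assumption: iterator starts at 0
    while len(out) < out_len:
        data = i.to_bytes(2, "big") + label + bit_len
        out += hmac.new(kdk, data, hashlib.sha256).digest()
        i += 1
    return out[:out_len]  # right-truncate the last HMAC call


def synthetic_message(d: int, k: int, ciphertext: bytes) -> bytes:
    """Return the fake plaintext for (key, ciphertext); k is the modulus size in bytes."""
    key_hash = hashlib.sha256(d.to_bytes(k, "big")).digest()
    kdk = hmac.new(key_hash, ciphertext, hashlib.sha256).digest()

    lengths = prf(kdk, b"length", 256)
    max_len = k - 11
    mask = (1 << max_len.bit_length()) - 1  # zero out the high-order bits
    synth_len = 0
    for i in range(0, 256, 2):
        candidate = int.from_bytes(lengths[i:i + 2], "big") & mask
        if candidate <= max_len:  # NOT constant time, illustration only
            synth_len = candidate  # the last acceptable candidate wins

    message = prf(kdk, b"message", k)
    return message[k - synth_len:]  # last synth_len bytes (empty if 0)
```

Because everything is derived from the private exponent and the ciphertext, the same invalid ciphertext always maps to the same fake plaintext, which is what keeps repeated queries from distinguishing a failure from a success.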

r+ on the patch, will provide test results later

I've executed the test after the changes from comment #7 and unfortunately I can't say the issue is fixed: the case that uses PKCS#1 v1.5 type 0 padding (i.e. no padding, just 48 random bytes encrypted) is an outlier. Given that the "very short PKCS#1 padding (40 bytes short)" probe is not an outlier, it's probably not exploitable, at least not with the simple analysis method I used.

This is from a run with 1M observations per sample and a 2048 bit RSA key; numerically, the 95% confidence interval is about ±126ns. Please note: I do know that the measurements tend to be autocorrelated, which makes the real error bars larger than those bootstrapped values would suggest. That's also the likely reason for the variability between other samples and runs (see the other run in the comment below).
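For readers unfamiliar with bootstrapped confidence intervals, here is a minimal sketch of a percentile bootstrap over per-connection timing differences between two probes. It is not the tlsfuzzer analysis code; `bootstrap_ci` and `trimmed_mean` are hypothetical names, and plain resampling like this treats observations as independent, so it shares the autocorrelation caveat mentioned above.

```python
import random
import statistics


def trimmed_mean(values, trim=0.05):
    """Mean after dropping the lowest and highest `trim` fraction of values."""
    ordered = sorted(values)
    cut = int(len(ordered) * trim)
    return statistics.fmean(ordered[cut:len(ordered) - cut])


def bootstrap_ci(diffs, stat=statistics.fmean, reps=1000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for `stat` of the timing differences."""
    rng = random.Random(seed)
    n = len(diffs)
    # Resample with replacement and recompute the statistic each time.
    estimates = sorted(stat(rng.choices(diffs, k=n)) for _ in range(reps))
    lo = estimates[int(reps * alpha / 2)]
    hi = estimates[int(reps * (1 - alpha / 2)) - 1]
    return lo, hi
```

Passing `stat=trimmed_mean` gives an interval for the 5% trimmed mean instead of the plain mean.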

I'll do two more runs with 1024 bit keys, that should make the error bars smaller.

Legend:
0,invalid PKCS#1 type (0) in padding
1,invalid PKCS#1 type (1) in padding
2,invalid PKCS#1 type (3) in padding
3,invalid version number (1) in padding
4,no null separator
5,random plaintext
6,too long PKCS#1 padding
7,too short PKCS#1 padding
8,use 0 as padding byte
9,use 1 as the padding byte (low Hamming weight plaintext)
10,use PKCS#1 type 0 padding
11,use PKCS#1 type 1 padding
12,very short PKCS#1 padding (40 bytes short)
13,well formed
14,well formed with empty synthethic PMS
15,well formed with very long synthethic PMS
16,zero byte in eight byte of padding
17,zero byte in first byte of padding
18,zero byte in second byte of padding
19,zero byte in third byte of padding

second run, 2048bit RSA, 1M observations per sample, 95% CI ±124ns

Results from a run with 1024 bit RSA key. 1M observations per sample, 95% CI of ±50.5ns.

So it looks like the strength of the signal is proportional to the key size. That's expected: the size of the pre master secret doesn't change while the key size does, so the fraction of zero bytes in the decrypted value increases.

A second run with 1024 bit RSA key, also 1M observations per sample, 95% CI of ±53.0ns.

Whiteboard: [embargo - coordinate with RedHat / rrelyea@redhat.com for disclosure] → [embargo - coordinate with RedHat / rrelyea@redhat.com for disclosure] [nss-fx]

I've got some good news: I've created a tlsfuzzer test case that expects the behaviour with the workaround from D99843 present and executed it against nss-3.81.0-1.fc36.x86_64 with a 2048 bit RSA key on a 5.2GHz CPU. I collected 127 million connections per probe and got a 95% CI for the 5% trimmed mean of ±0.9ns (so about 4.6 cycles).

The results clearly show that there's a problem with the maths library: it takes significantly different amounts of time when the number of zero MSBs is bigger than 8, or when the encrypted value has a very low or very high Hamming weight. But that's handled in bug 1780432.

With these results I think we've fixed the leak that comes from the PKCS#11 interface.

The legend for the graph:
ID,Name
0,invalid PKCS#1 type (0) in padding
1,invalid PKCS#1 type (1) in padding
2,invalid PKCS#1 type (3) in padding
3,invalid version number (1) in padding
4,no null separator
5,random plaintext
6,too long PKCS#1 padding
7,too short PKCS#1 padding
8,use 0 as padding byte
9,use 1 as the padding byte (low Hamming weight plaintext)
10,use PKCS#1 type 0 padding
11,use PKCS#1 type 1 padding
12,very short PKCS#1 padding (40 bytes short)
13,well formed - 1
14,well formed - 2
15,well formed - 3
16,well formed with empty synthethic PMS
17,well formed with very long synthethic PMS
18,zero byte in eight byte of padding
19,zero byte in first byte of padding
20,zero byte in second byte of padding
21,zero byte in third byte of padding

Attached file report.csv

From the detailed results, it looks like the "random plaintext" probe is also an outlier, but given the high p-values for two of the three "canary" probes (the "well formed" 1 to 3), and the fact that its p-values have stayed that way since the runs with 10 million observations, I'm thinking that it's a fluke rather than a significant result.

I've already spent over 6 weeks of machine time on this, so I don't think it's reasonable to spend more given that bug 1780432 is unfixed (and can have an effect on the "random plaintext" probe too).

Since I'll be talking about this publicly next month, and the patch in https://phabricator.services.mozilla.com/D99843 both improves the situation markedly and is merged, it would be good to have a CVE assigned for this issue.

Alias: CVE-2023-4421

@Tom
Thanks! Do you need help with the CVE description?

Also, I should have put a more specific date on the release: I'll be making the issue public in the last week of September (between the 25th and 29th).

Since everything about this bug is now public and the fixes have shipped, I think we can close it.
Though I don't know whether the CVE itself should be announced/made public before that.

Flags: needinfo?(rrelyea)

beurdouche, can we move this bug to public now?

Flags: needinfo?(rrelyea) → needinfo?(bbeurdouche)

This is fine with me. @Dan ?

Flags: needinfo?(bbeurdouche) → needinfo?(dveditz)
Group: crypto-core-security
Flags: needinfo?(dveditz)
Comment 0 is private: false

resolving bug per comment 18

Status: ASSIGNED → RESOLVED
Closed: 7 months ago
Resolution: --- → FIXED

The CVE for this is still marked reserved in the mitre database? Is there something preventing it from being pushed now?

Flags: needinfo?(dveditz)

Where is the advisory for this bug? We need a public link as part of the information we submit with the CVE. Since this wasn't an issue for Firefox we don't have one of our own advisories to link to.

Flags: needinfo?(tom)
Flags: needinfo?(rrelyea)
Flags: needinfo?(dveditz)

Please let me know (with a NI) the NSS version this was fixed in and a blurb I can use, and I will make one of these: https://www.mozilla.org/en-US/security/known-vulnerabilities/nss/

Flags: needinfo?(tom)

nss 3.61

Flags: needinfo?(rrelyea)

Tom, is there anything else you need from us?

Flags: needinfo?(tom)

Is the following blurb accurate:

Timing differences in RSA Decryption
Robert Relyea

Internal PKCS#1 v 1.5 padding operations could have leaked information about an encrypted message via a Bleichenbacher-style attack.

Additionally, this is currently classified as 'sec-other' - it will need a security rating.

Flags: needinfo?(tom)

It should say 'Hubert Kario'. Hubert, what should the security rating be?

Flags: needinfo?(hkario)

For CVE-2023-4421 I'd propose classifying it as Moderate (that's the general severity assigned to this class of issues, see [1] and [2]) and a description something like this:

Timing side-channel in PKCS#1 v1.5 decryption depadding code
Hubert Kario

The NSS code used for checking PKCS#1 v1.5 padding was leaking information useful in mounting Bleichenbacher-like attacks.
Both the overall correctness of the padding and the length of the encrypted message were leaking through a timing side-channel.
By sending a large number of attacker-selected ciphertexts, the attacker would be able to decrypt a previously intercepted PKCS#1 v1.5 ciphertext (for example, to decrypt a TLS session that used RSA key exchange), or forge a signature using the victim's key.
The issue was fixed by implementing the implicit rejection algorithm, in which NSS returns a deterministic random message in case invalid padding is detected, as proposed in the Marvin Attack paper.

A few explanations about the particulars of the proposed phrasing:

  1. This bug, and the CVE, is specific to the leakage in the depadding code. I also described above the leakage from the numerical library; that's handled in CVE-2023-5388.
  2. It's not that the padding operations "could have" leaked information: they definitely did leak information, and the size of the leakage would make any attack rather easy (I'd estimate it at less than 24h for a local network attack against a 2048 bit key).
  3. And yes, I found it.

1 - https://people.redhat.com/~hkario/marvin/
2 - https://robotattack.org/

Flags: needinfo?(hkario)

Okay, I went to make the advisory for this, and then realized that this was fixed in a release made 2021-01-22 - so I'm going to get a new CVE from 2021 and release an advisory with that text, a date of Dec 2023, and add a note indicating that this issue was fixed in 2021 and embargoed until 2023 - if that sounds alright?

Flags: needinfo?(rrelyea)

Please don't assign a new CVE: The CVE-2023-4421 is already public[1,2,3], and
there were already actions (or non-actions) made based on it.
Making it two CVEs instead of one will just make it more confusing, not less.

The issue is just that it's still not public on the MITRE side.

[1] - https://people.redhat.com/~hkario/marvin/#patches
[2] - https://bugzilla.redhat.com/show_bug.cgi?id=CVE-2023-4421
[3] - https://www.suse.com/security/cve/CVE-2023-4421.html

Flags: needinfo?(rrelyea)

This bug, and the CVE, is specific for the leakage in the depadding code, while I also described above the leakage from numerical library, that's handled in CVE-2023-5388

Is there an open bug for CVE-2023-5388? Bugzilla tells me Zarro Boogs found, but that could just be because I don't have access to that bug.

(In reply to Mike Hommey [:glandium] [OOO Dec 30-Jan 8] from comment #33)

This bug, and the CVE, is specific for the leakage in the depadding code, while I also described above the leakage from numerical library, that's handled in CVE-2023-5388

Is there an open bug for CVE-2023-5388? Bugzilla tells me Zarro Boogs found, but that could just be because I don't have access to that bug.

Yes, I added you.

Attachment #9162208 - Attachment is obsolete: true