Closed Bug 1780432 (CVE-2023-5388) Opened 2 years ago Closed 7 months ago

Timing attack against RSA decryption (in TLS)

Tracking

(firefox-esr115124+ fixed, firefox122 wontfix, firefox123 wontfix, firefox124+ fixed)

Status:

RESOLVED FIXED

Milestone:

3.98

Tracking Flags:

Tracking

Status

firefox-esr115

124+

fixed

firefox122

---

wontfix

firefox123

---

wontfix

firefox124

fixed

People

(Reporter: hkario, Assigned: anna.weine)

References

Details

(Keywords: csectype-side-channel, sec-moderate, Whiteboard: [post-critsmash-triage][adv-main124+][adv-esr115.9+])

Attachments

(8 files)

conf_interval_plot_median.png 2 years ago Hubert Kario 63.27 KB, image/png		Details
report.csv 2 years ago Hubert Kario 8.63 KB, text/csv		Details
conf_interval_plot_median.png 2 years ago Hubert Kario 55.26 KB, image/png		Details
report.csv 2 years ago Hubert Kario 8.52 KB, text/csv		Details
ctmpi.tar.gz 2 years ago Hubert Kario 9.46 KB, application/gzip		Details
ctmpi-v2.tar.gz 2 years ago Hubert Kario 12.29 KB, application/gzip		Details
Bug 1780432 (CVE-2023-5388) Timing attack against RSA decryption (in TLS) 9 months ago Robert Relyea 48 bytes, text/x-phabricator-request		Details \| Review
advisory.txt 6 months ago Tyson Smith [:tsmith] 231 bytes, text/plain		Details

Hubert Kario

Reporter

Description

•

2 years ago

I've been working on testing against timing attacks using tlsfuzzer, the newest version of that code is in the upstream pregenerate branch[1].

What's special about it, is that it is much more sensitive to timing side-channels than both my previous attempts at it, from bug 1651411, and what the published research suggests. In particular, I'm able to detect the OpenSSL side-channel caused by the https://github.com/openssl/openssl/issues/6640 bug in just 10 thousand connections with 2048 bit RSA keys, over local gigabit Ethernet: the measured side channel is about 55ns while the test provides a result with a 95% confidence interval of ±36ns.

I haven't executed extensive tests against NSS, so I don't have good results with NSS just yet, but I've talked with Bob about the NSS code and how it handles multi-precision integers. The problem is similar as the one in OpenSSL.

The MPI objects internally represent large integers as a list of word-sized (64bit or 32bit) integers. The problem is, that many operations on objects perform "clamping" (if the most significant word is zero, they drop it and store the number in fewer words), in particular, the modulo multiplication performing the unblinding after the modular exponentiation in RSA private key decryption.

The problem is that when such number then needs to be converted to a byte string, so that it can be fed into a hash function, or so that the padding can be checked (be it PKCS#1v1.5 or OAEP), that operation can't take the same amount of time, since it operates on different number of words used to represent the number. In other words, that conversion then leaks if the high order bytes are zero or not: precisely the signal necessary for Bleichenbacher oracle.

I'll add detailed test results later.

Please keep this issue embargoed as other implementations are vulnerable, so we'd prefer to release information about it in a coordinated fashion.

1 - https://github.com/tlsfuzzer/tlsfuzzer/tree/pregenerate

Benjamin Beurdouche [:beurdouche]

Comment 1

•

2 years ago

Hi Hubert,
Thanks for the report.

MPI not being constant time is a known problem on our side and we know our RSA code is for sure vulnerable to side channels... : (
We will be looking at replacing the RSA code with a formally verified implementation in the next few weeks and we will report back when things are moving.

Here is a related bug: https://bugzilla.mozilla.org/show_bug.cgi?id=1761036

Daniel Veditz [:dveditz]

Updated

•

2 years ago

Comment 2

•

2 years ago

Right, but my point is that we can fix the timing side-channel in RSA by just fixing one multiply and mod operation, which I'd expect is much easier to do that replacing the whole implementation for RSA.

It is also much easier attack to perform than the micro-architectural side-channels or remote sound recording side-channels (where a side-channel free mod-exp or exponent blinding is necessary for protection)...

But sure, if you'll be able to replace it all in the next few weeks, it may not be worth it.

I'll still provide the test results, if nothing else, they'll be useful as the baseline to compare to with the new implementation. Also, if you'd like some timing tests done on the code before it is committed, just send me a line, I'll definitely find time to run them (this whole exercise started with us in Red Hat wanting to have an automated regression test case for timing attacks that we can run as part of CI, so it's not something that will be a one off anyway).

Anna Weine

Assignee

Updated

•

2 years ago

Assignee: nobody → nkulatova

Hubert Kario

Reporter

Comment 3

•

2 years ago

Attached image conf_interval_plot_median.png — Details

Plot of bootstrapped confidence intervals for median pairwise differences between the first (0th) probe and all the others. Results from 12.2M observations per sample (probe). 95% confidence interval is ±22.5ns.

Probe descriptions:

ID,Name
0,random plaintext - 1
1,random plaintext - 1 zero MSB
2,random plaintext - 2
3,random plaintext - 2 zero MSB
4,random plaintext - 4 zero MSB
5,random plaintext - 8 zero MSB
6,random plaintext - 16 zero MSB
7,random plaintext - 40 zero MSB

Random plaintext in this case refers to the data before RSA public key operation.
The "random plaintext - 1" and "random plaintext - 2" are just a copy of the same kind of probe, to verify that the execution environment isn't biased (for a "canary" test, expected to fail; have high p-value).
The "1 zero MSB" means that one of the Most Significant Bytes is equal to 0x00, "2 zero MSB" for two, etc.

Hubert Kario

Reporter

Comment 4

•

2 years ago

Attached file report.csv — Details

Comparison statistics between all the different samples. Samples are 12.2M observations large, executed against current NSS tip (as of last week), with 2048 bit RSA key on an 64 bit platform.

Most important is the "Sign test" column.

The results show a clear timing signal when the first word (8 bytes) or two of the first words (16 bytes) are zero. When compared to plaintext with all bytes non-zero, the sign test reports p-values of 4.86e-7 and 1.3e-8 respectively.

At the same time, the results from comparing the fully random plaintexts with plaintexts with 1, 2, or 4 MSBs set to zero don't show statistically significant difference between samples (p-values larger than 0.05).

This is consistent with the theory that either the calculation of the last MPI (unblinding) or conversion of it to a byte string is not constant time with regards to the number of zero bytes, but with the granularity of a single "limb" used internally in MPI.

What is a bit surprising, is that when 40 bytes are zero (i.e. five limbs are zero), there is almost no statistically significant difference (p-values of 0.11, 0.22, 0.02) between either the baseline probes or probes with just few bytes set to zero. I've got no good explanation for that. I'll keep running the tests, hopefully it turns out to be just a case of a weaker timing signal, not a lack of it.

Hubert Kario

Reporter

Comment 5

•

2 years ago

Attached image conf_interval_plot_median.png — Details

As expected, when I collected more data, the 40 zero MSB probe gave a statistically significant result.
While in the graph it looks like the "2 zero MSB" may be significant too, the actual p-values are not significant after applying Bonferroni correction.

This is for 33.5M observations per probe, 95% CI of ±11.5ns.

Hubert Kario

Reporter

Comment 6

•

2 years ago

Attached file report.csv — Details

Detailed results from the 33.5M run.

The p-value for sign test of "random plaintext - 1" (probe 0) and "random plaintext - 40 zero MSB" (probe 7) is 3.11e-7.
Which would suggest that there are two algorithms, one that runs faster while other that runs slower when most significant limbs (words) are equal to zero, but they have different scaling factor.

BugBot [:suhaib / :marco/ :calixte]

Comment 7

•

2 years ago

The severity field is not set for this bug.
:beurdouche, could you have a look please?

For more information, please visit auto_nag documentation.

Flags: needinfo?(bbeurdouche)

Anna Weine

Assignee

Comment 8

•

2 years ago

I will try to estimate the severity of the bug :)

Tyson Smith [:tsmith]

Comment 9

•

2 years ago

Setting ni? as a reminder :)

Flags: needinfo?(nkulatova)

Anna Weine

Assignee

Updated

•

2 years ago

Severity: -- → S2

Flags: needinfo?(nkulatova)

Priority: -- → P2

Hubert Kario

Reporter

Comment 10

•

2 years ago

Attached file ctmpi.tar.gz — Details

I've implemented a simple mul() and mod() operations in (largely) portable pure C, and verified that they are indeed constant time on x86_64, aarch64, ppc64le, and s390x.

Included is both the code and the test harnesses for timing tests.

@nkulatova if you think they would be useful to integrate ahead of replacing the whole RSA code, feel free to use them.

Anna Weine

Assignee

Comment 11

•

2 years ago

(In reply to Hubert Kario from comment #10)

Created attachment 9297296 [details]
ctmpi.tar.gz

I've implemented a simple mul() and mod() operations in (largely) portable pure C, and verified that they are indeed constant time on x86_64, aarch64, ppc64le, and s390x.

Included is both the code and the test harnesses for timing tests.

@nkulatova if you think they would be useful to integrate ahead of replacing the whole RSA code, feel free to use them.

Did you have a chance to test the speed?

Anna Weine

Assignee

Comment 12

•

2 years ago

The simplest way to measure the speed would be to use the code I wrote here: https://phabricator.services.mozilla.com/D153347

Hubert Kario

Reporter

Comment 13

•

2 years ago

Those are basically the slowest possible ways to implement those operations (long multiplication on words and long division on bits), so they are slow.

On x86_64 a multiplication of two 2048bit numbers takes about 4.2µs on a 3.9GHz CPU (i7-8650U), so not bad.
But the mod() operation of a 4096 bit operand (result of 2048 bit number multiplication) and 2048 bit modulus is very slow, taking 580µs on a 5.2GHz CPU (i9-12900KS).

Side note: using rdtsc alone is not sufficient on Intel CPUs for microbenchmarking, as that instruction is not a speculative execution or memory access barrier, see https://www.intel.com/content/dam/www/public/us/en/documents/white-papers/ia-32-ia-64-benchmark-code-execution-paper.pdf or get_time_before() and get_time_after() in harness.c in ctmpi.tar.gz.

Hubert Kario

Reporter

Comment 14

•

2 years ago

Attached file ctmpi-v2.tar.gz — Details

I've got sort-of good news. I went ahead and implemented the multi-word version of Montgomery reduction. It's about as fast as the multiplication, taking around 2.9µs on a 3.9GHz CPU (i7-8650U), so quite usable.

The downside is that it assumes that it has access to the multiplicative inverse of the modulus mod word size (i.e. 2^64 or 2^32). And it obviously requires the mod_exp() output or the unblinder to be in the Montgomery form (they both can be in Montgomery form, then Montgomery reduction needs to be applied twice to the output of the mul(), but that's supported too, the function I've implemented can handle any input up to the double the number of words of the modulus).

Anna Weine

Assignee

Comment 15

•

2 years ago

Could you compare the performance with the current one?

Also, just fyi, the HACL* team implemented the MM/MR for RSA, if you want to take a look for inspiration, here it is: https://github.com/hacl-star/hacl-star/blob/main/dist/gcc-compatible/ (Bignum or RSA).

Finally, do you think if we implement an euclidian division to compute the multiplicative inverse, will it is quicker?

Hubert Kario

Reporter

Comment 16

•

2 years ago

Could you compare the performance with the current one?

I don't have precise measurements, but my experience is that NSS performs the RSA key exchange in about 1 milisecond, so even if we assume that the current unblinding and final Montgomery reduction takes no time, this new code would slow down NSS by less than 1%.

Also, just fyi, the HACL* team implemented the MM/MR for RSA, if you want to take a look for inspiration,

well, I've done it mostly as a proof that it's possible to have side-channel secure pure C code (or that it's possible to have side-channel free multiplication of integers in the first place), I'm not interested in implementing advanced numerical methods... For one, while a neat exercise, I still think those algorithms should be implemented in assembly (e.g.: I wouldn't be surprised if add() couldn't be made like 3 times faster in assembly than what gcc generates from my code).

Finally, do you think if we implement an euclidian division to compute the multiplicative inverse, will it is quicker?

To calculate what? The inverse of modulus mod word, or to calculate the Montgomery reduction?

For inverse, yes, AFAIK, that's the fastest algorithm, and fine to use since we don't have to worry about side-channels: the operands are public.

For the Montgomery reduction: I don't think so, the multi-word reduction operates on much smaller numbers and less data at every step. If you have the full-sized inverse of the modulus you still need to multiply it, and as you can see, my mul() is slower than mod_montgomery() (4.2µs vs 2.9µs). Now, I'm using long multiplication instead of Karatsuba or better, but then for regular Montgomery reduction you have to do two multiplications and with much bigger numbers, not one, so I don't think you'd be able to beat multi-word Montgomery reduction, as it multiplies a modulus-sized multi-precision integer by a single word at a time: there isn't a better algorithm for that than the one I already implemented, multiplying every word in turn, neither Karatsuba nor FFTs can speed it up.

But like I said, I'm not particularly interested in numerical methods, so don't consider it to be an expert opinion, I may very well be wrong. The behaviour may also be more complex, where one algorithm is faster for smaller key sizes and another is faster for larger key sizes.

Anna Weine

Assignee

Comment 17

•

2 years ago

•

Edited

To calculate what? The inverse of modulus mod word, or to calculate the Montgomery reduction?
The inverse it was.

(In reply to Hubert Kario from comment #10)

Created attachment 9297296 [details]
ctmpi.tar.gz

I've implemented a simple mul() and mod() operations in (largely) portable pure C, and verified that they are indeed constant time on x86_64, aarch64, ppc64le, and s390x.

Included is both the code and the test harnesses for timing tests.

@nkulatova if you think they would be useful to integrate ahead of replacing the whole RSA code, feel free to use them.

Would you prefer to submit the patch by yourself, or would you prefer if I do it?

Hubert Kario

Reporter

Comment 18

•

2 years ago

I'd prefer if You'd do it.

Anna Weine

Assignee

Comment 19

•

2 years ago

Ok! Do you mind then to review it once it's ready?

Hubert Kario

Reporter

Comment 20

•

2 years ago

Sure, I can both review and test it for timing side-channels.

Anna Weine

Assignee

Comment 21

•

2 years ago

Lovely, thanks.

Daniel Veditz [:dveditz]

Updated

•

2 years ago

Keywords: csectype-disclosure, sec-moderate

Hubert Kario

Reporter

Comment 22

•

2 years ago

@nulatova: Hi,
How is the progress on getting in the constant time HACL implementation of RSA into NSS?
I'm asking because the other libraries with similar issues are planning a release around the end of January, so that may generate increased interest in this kind of vulnerabilities.

Hubert Kario

Reporter

Updated

•

2 years ago

Flags: needinfo?(nkulatova)

Anna Weine

Assignee

Comment 23

•

2 years ago

Hi,

We are very close to adding the RSA-PSS HACL* support to NSS.
Do you consider that the rest of the RSA primitives require the immediate attention?

Flags: needinfo?(nkulatova)

Hubert Kario

Reporter

Comment 24

•

2 years ago

Do you consider that the rest of the RSA primitives require the immediate attention?

Yes,

The unblinding leaking information affects all RSA private key operations (though exploitation of it may be very non-tryvial
The PKCS#1 v1.5 decryption is exploitable over the network

Anna Weine

Assignee

Comment 25

•

2 years ago

•

Edited

Status: NEW → RESOLVED

Closed: 2 years ago

Resolution: --- → FIXED

Anna Weine

Assignee

Updated

•

2 years ago

Status: RESOLVED → REOPENED

Resolution: FIXED → ---

Anna Weine

Assignee

Comment 26

•

2 years ago

If I understand correctly, RSA#1 v1.5 is not used in TLS1.3. However, we plan to get rid of the non-constant time code as soon as we can.

Daniel Veditz [:dveditz]

Comment 27

•

2 years ago

We still have to support TLS 1.2 servers. I don't know if there's a deprecation timeline and kill-date for TLS 1.2, but judging how TLS 1.0/1.1 went that could be a while still.

Keywords: csectype-disclosure → csectype-side-channel

Hubert Kario

Reporter

Comment 28

•

2 years ago

Yes, this bug affects only TLS 1.2 and earlier. TLS 1.3 is immune.

Hubert Kario

Reporter

Comment 29

•

2 years ago

Ah, but it also affects any applications that would use NSS API directly for RSA decryption (be it PKCS#1 v1.5 or OEAP).

Anna Weine

Assignee

Comment 30

•

2 years ago

Yes, this we take into account

Hubert Kario

Reporter

Comment 31

•

2 years ago

Side note: the same issue in OpenSSL, CVE-2022-4304, was made public today.

Benjamin Beurdouche [:beurdouche]

Updated

•

2 years ago

Flags: needinfo?(bbeurdouche) → needinfo?(nkulatova)

Hubert Kario

Reporter

Comment 32

•

1 year ago

The paper was accepted. I'll be making the issue public in last week of September (between 25-29th).

Hubert Kario

Reporter

Comment 33

•

1 year ago

As I've said, the issue was made public, the details of it are on the https://people.redhat.com/~hkario/marvin/

Can we get CVE assigned specifically for the numerical library side-channel leakage? It affects both RSA key exchange and the RSA-OAEP in WebCrypto, so technically it's a bug in Firefox too.

Hubert Kario

Reporter

Comment 34

•

1 year ago

Anna, could you assign a CVE to it? It would make fix coordination much easier.

Robert Relyea

Comment 35

•

1 year ago

What is the process for getting a CVE for this, and is there anything I can help to move this forward?

Benjamin Beurdouche [:beurdouche]

Comment 36

•

1 year ago

Dan and Tom can assign CVEs for this.

Flags: needinfo?(tom)

Flags: needinfo?(dveditz)

Tom Ritter [:tjr]

Updated

•

1 year ago

Alias: CVE-2023-5388

Flags: needinfo?(tom)

Flags: needinfo?(dveditz)

Robert Relyea

Comment 37

•

9 months ago

Attached file Bug 1780432 (CVE-2023-5388) Timing attack against RSA decryption (in TLS) — Details

Add Constant time mult mod functions.
a. constant time mul
b. use constant time montgomery reduce.
Use montgomery values for blinding.

Robert Relyea

Comment 38

•

9 months ago

Anna, I've pushed our downstream patch for this to phabricator. We've tested this patch against the marvin toolkit and our test suite on x86, ppc, and s390 and we've removed the timing vulnerability. I'll let you decide how to proceed with the patch.
Thanks,
bob

Robert Relyea

Comment 39

•

8 months ago

Ping, Anna how do you want to proceed with the patch for this issue (see comment 37 and comment 38). Thanks.

Anna Weine

Assignee

Comment 40

•

8 months ago

Oh great! We will take a look at the patch! (Could you the next time put nss-reviewers as the reviewer option? As I have just found your question, sorry for the delay!)

Flags: needinfo?(nkulatova)

John Schanck [:jschanck]

Comment 41

•

7 months ago

https://hg.mozilla.org/projects/nss/rev/b090a1e5dcdfc5772671063cfe9ebeadabd29ad3

Status: REOPENED → RESOLVED

Closed: 2 years ago → 7 months ago

Resolution: --- → FIXED

John Schanck [:jschanck]

Updated

•

7 months ago

Target Milestone: --- → 3.98

Ryan VanderMeulen [:RyanVM]

Updated

•

7 months ago

Group: crypto-core-security → core-security-release

status-firefox122: --- → wontfix

status-firefox123: --- → wontfix

status-firefox124: --- → fixed

status-firefox-esr115: --- → affected

Hubert Kario

Reporter

Comment 42

•

7 months ago

I'd like to add the information to the Marvin page that there are Firefox versions with the fixes, but I see that 124 is still at alpha/nightly stage, similarly NSS 3.98 isn't released. Should I put that information there ahead of time, or should I wait till final versions are released?

John Schanck [:jschanck]

Updated

•

7 months ago

Blocks: 1880562

John Schanck [:jschanck]

Comment 43

•

7 months ago

NSS 3.98 and NSS 3.90.2 (ESR) have just been released with this patch. Those versions of NSS will be in Firefox 124 and Firefox 115.9, respectively, which will both be released on March 19.

Ryan VanderMeulen [:RyanVM]

Updated

•

7 months ago

tracking-firefox124: --- → +

tracking-firefox-esr115: --- → 124+

Andrei Vaida [:avaida]

Updated

•

7 months ago

Flags: qe-verify-

Whiteboard: [post-critsmash-triage]

Dianna Smith [:diannaS]

Comment 44

•

7 months ago

marking esr115 as fixed for 115.9esr by 1880562 which was just uplifted

status-firefox-esr115: affected → fixed

Tyson Smith [:tsmith]

Updated

•

6 months ago

Whiteboard: [post-critsmash-triage] → [post-critsmash-triage][adv-main124+][adv-esr124+]

Tyson Smith [:tsmith]

Updated

•

6 months ago

Whiteboard: [post-critsmash-triage][adv-main124+][adv-esr124+] → [post-critsmash-triage][adv-main124+][adv-esr115.9+]

Tyson Smith [:tsmith]

Comment 45

•

6 months ago

Attached file advisory.txt — Details

Daniel Veditz [:dveditz]

Updated

•

5 days ago

Group: core-security-release

You need to log in before you can comment on or make changes to this bug.