certificate transparency signature verifications negatively impact TLS handshake performance

RESOLVED FIXED in Firefox -esr52

Status

()

Core
Security: PSM
P1
normal
RESOLVED FIXED
5 months ago
4 months ago

People

(Reporter: Alice0775 White, Assigned: keeler)

Tracking

(Blocks: 1 bug, {perf, regression})

52 Branch
mozilla55
x86
Windows 10
perf, regression
Points:
---
Dependency tree / graph

Firefox Tracking Flags

(firefox52 wontfix, firefox-esr5253+ fixed, firefox53+ fixed, firefox54+ fixed, firefox55+ fixed)

Details

(Whiteboard: [STR in comment#6])

Attachments

(6 attachments, 2 obsolete attachments)

(Reporter)

Description

5 months ago
Created attachment 8854257 [details]
about:suppot

Disabling e10s does not solve in Nightly54.0a1.

Reproducible: always

Steps To Reproduce:
1. Start with clean profile, and disable Flash Player if any
2. Load https://www.yahoo.co.jp/ in several times

Actual Results:
In Nightly 55.0a1, page loading is x2 slower than Firefox 51.0.1
Elapsed time until tab-throbber stops:
Firefox 51.0.1(without e10s) :  7-10sec
Nightly 55.0a1(with/without e10s) : 20-25sec

Expected Results:
Not so.

My environment:
Windows10 Home 1607 64bit, Core2Quad 4@2.5GHz, Mem 8GB, Net ADSL 12(actual: down 9.5Mbps, up 800Kbps)
Comment hidden (obsolete)
Comment hidden (obsolete)
Comment hidden (obsolete)
Comment hidden (obsolete)
(Reporter)

Updated

5 months ago
Blocks: 1353289
Comment hidden (obsolete)
(Reporter)

Comment 6

5 months ago
In Nightly 55.0a1, page loading is x2 slower than Firefox 51.0.1

Reproducible: almost always

Steps To reproduce:
1. Disable e10s
2. Open https://tv.yahoo.co.jp/listings/fm/
   (maybe Japan region only)
3. Wait to render page and settle CPU
4. Reload page (F5)
   --- Observe the time until tab-throbber stops spinning
5. Reload page (F5)
   --- Observe the time until tab-throbber stops spinning
6. Reload page (F5)
   --- Observe the time until tab-throbber stops spinning

Actual Results:
It takes about 4-6 seconds each

Expected results:
It is within 2 seconds each


Regression window:
https://hg.mozilla.org/integration/autoland/pushloghtml?fromchange=c651f7174f37747de99eaafd27b2d78fb10bead3&tochange=50143dbdcb47bf47c8827c8777b0e11e92e25418

Regressed by:
50143dbdcb47	Sergei Chernov — Bug 1293231 - Certificate Transparency - basic telemetry reports; r=Cykesiopka,keeler
Blocks: 1293231
Has Regression Range: --- → yes
Has STR: --- → yes
status-firefox52: --- → wontfix
status-firefox53: --- → wontfix
status-firefox54: --- → fix-optional
Component: Untriaged → Security: PSM
Flags: needinfo?(sergei.cv)
Keywords: regression
Whiteboard: [STR in comment#6]
(Reporter)

Updated

5 months ago
Attachment #8854330 - Attachment is obsolete: true
(Reporter)

Comment 7

5 months ago
Created attachment 8855235 [details]
about:suppot
Attachment #8854257 - Attachment is obsolete: true
(Reporter)

Comment 8

5 months ago
And I confirmed that setting security.pki.certificate_transparency.mode = 0 fixes the problem
(Reporter)

Updated

5 months ago
Version: Trunk → 52 Branch
(Reporter)

Comment 9

5 months ago
[Tracking Requested - why for this release]: performance regression
tracking-firefox55: --- → ?

Comment 10

5 months ago
I could not reproduce the issue on FF52 with browser.tabs.remote.autostart* = false and security.pki.certificate_transparency.mode = 0. https://tv.yahoo.co.jp/listings/fm/ always takes about 5 seconds to load. I've also tried on FF51 - same results.

Could you please attach two HAR files from FF52 (release) for the page - with certificate_transparency off (loads quickly) and on (loads slowly)? With the difference in the loading times being so significant, I think we should be able to get a clue from there.

Thanks!
Flags: needinfo?(sergei.cv) → needinfo?(alice0775)
(Reporter)

Comment 11

5 months ago
(In reply to Sergei Chernov from comment #10)
> I could not reproduce the issue on FF52 with browser.tabs.remote.autostart*
> = false and security.pki.certificate_transparency.mode = 0.
> https://tv.yahoo.co.jp/listings/fm/ always takes about 5 seconds to load.
> I've also tried on FF51 - same results.
> 
> Could you please attach two HAR files from FF52 (release) for the page -
> with certificate_transparency off (loads quickly) and on (loads slowly)?
> With the difference in the loading times being so significant, I think we
> should be able to get a clue from there.
> 
> Thanks!

What is HAR file? Where is it located? How do I obtain it?
Flags: needinfo?(alice0775) → needinfo?(sergei.cv)
(Reporter)

Comment 12

5 months ago
(In reply to Alice0775 White from comment #11)
> (In reply to Sergei Chernov from comment #10)
> > I could not reproduce the issue on FF52 with browser.tabs.remote.autostart*
> > = false and security.pki.certificate_transparency.mode = 0.
> > https://tv.yahoo.co.jp/listings/fm/ always takes about 5 seconds to load.
> > I've also tried on FF51 - same results.
> > 
> > Could you please attach two HAR files from FF52 (release) for the page -
> > with certificate_transparency off (loads quickly) and on (loads slowly)?
> > With the difference in the loading times being so significant, I think we
> > should be able to get a clue from there.
> > 
> > Thanks!
> 
> What is HAR file? Where is it located? How do I obtain it?

NM.
https://developer.mozilla.org/en-US/docs/Tools/Network_Monitor#CopySave_All_As_HAR

I will do.
Flags: needinfo?(sergei.cv)
(Reporter)

Comment 13

5 months ago
Created attachment 8856291 [details]
Archive 17-04-10 06-43-26.har
(Reporter)

Comment 14

5 months ago
Created attachment 8856292 [details]
Archive 17-04-10 06-47-59.har(with security.pki.certificate_transparency.mode=0)
I believe this is due to inefficient EC signature verification code. I took some traces using the new profiling tool at https://perf-html.io/ (I'll attach them shortly), and it shows that certificate verification is taking ~8ms on my machine without CT and ~192ms with CT (select the socket thread, filter for "verify", and keep opening the arrows until you get to the relevant functions). Drilling into that, about half of the difference is due to actually doing the signature verification (so, math) and the other half is "importing" the public key into the PKCS#11 softoken to operate on it. Since we know in advance what EC keys we're going to be using, we shouldn't waste time reimporting them every time (note that due to an architectural issue with the way keys are represented, though, VFY_VerifyDigestDirect (the NSS function we're using) will always cause the key to be reimported, so we'll have to work around that (we should be able to call PK11_Verify directly).
Created attachment 8856682 [details]
with-certificate-transparency-profile.sps.json.gz
Created attachment 8856683 [details]
without-certificate-transparency-profile.sps.json.gz
>  the difference is due to actually doing the signature verification (so, math)

:( That's bad. P256 shouldn't be that slow. But it seems it is. This sounds like something we probably should fix on the NSS side if it's visibly slowing down things.

Comment 19

4 months ago
Hmmm, does this affect 53? 

We should probably flip the `certificate_transparency.mode` preference back to 0 for release builds until we have this performant.
Created attachment 8857554 [details] [diff] [review]
1353216-disable-ct.diff

It looks like just flipping the pref is sufficient to disable this (no test changes needed: https://treeherder.mozilla.org/#/jobs?repo=try&revision=181a7bb2472593a1ccac40058efdae6da0f25ecd )
Attachment #8857554 - Flags: review?(jjones)
(Assignee)

Updated

4 months ago
Assignee: nobody → dkeeler
status-firefox53: wontfix → ---
status-firefox54: fix-optional → ---
tracking-firefox53: --- → ?
tracking-firefox54: --- → ?
Priority: -- → P1
Summary: In Nightly 55.0a1, page loading is x2 slower than Firefox 51.0.1 → certificate transparency signature verifications negatively impact TLS handshake performance
(Assignee)

Updated

4 months ago
status-firefox53: --- → affected
status-firefox54: --- → affected

Comment 21

4 months ago
Comment on attachment 8857554 [details] [diff] [review]
1353216-disable-ct.diff

Review of attachment 8857554 [details] [diff] [review]:
-----------------------------------------------------------------

OK, this is fine. We'll land this in 55 and -- well, does this need uplift to 54 / 53?

I'll file a bug to turn this back on, but we shouldn't do that until it's not a noticeable performance hit.
Attachment #8857554 - Flags: review?(jjones) → review+

Updated

4 months ago
Blocks: 1355903
[Tracking Requested - why for this release]: sounds like this might need to get fixed in 52esr
status-firefox-esr52: --- → affected
tracking-firefox-esr52: --- → ?

Comment 23

4 months ago
Pushed by dkeeler@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/dd72f3e3466f
disable certificate transparency processing for performance concerns r=jcj
Tracking 54/55+. Probably too late for 53 at this point since are already at RC, but we should circle back regarding Comment 22.
tracking-firefox54: ? → +
tracking-firefox55: ? → +
Flags: needinfo?(lhenry)
I discussed this with keeler earlier today, if we land it quickly it can still make the 53 RC2 build. 

I am not super clear whether this should be a 53 release blocker, but I think not, since we've had the problem around for the entire 52 cycle.
I filed bug 1355993 to investigate the underlying slowness of EC in NSS.
tracking-firefox53: ? → +
Flags: needinfo?(lhenry)

Comment 27

4 months ago
bugherderuplift
https://hg.mozilla.org/releases/mozilla-aurora/rev/cef7b5c139ef
status-firefox54: affected → fixed

Comment 28

4 months ago
bugherderuplift
https://hg.mozilla.org/releases/mozilla-beta/rev/2326a8ab1506
https://hg.mozilla.org/releases/mozilla-release/rev/2326a8ab1506
status-firefox53: affected → fixed

Comment 29

4 months ago
bugherderuplift
https://hg.mozilla.org/releases/mozilla-esr52/rev/aaa0bba23803
status-firefox-esr52: affected → fixed

Comment 30

4 months ago
bugherder
https://hg.mozilla.org/mozilla-central/rev/dd72f3e3466f
Status: NEW → RESOLVED
Last Resolved: 4 months ago
status-firefox55: affected → fixed
Resolution: --- → FIXED
Target Milestone: --- → mozilla55
See Also: → bug 1356073
Comment on attachment 8857554 [details] [diff] [review]
1353216-disable-ct.diff

post hoc a+ all the things.
Attachment #8857554 - Flags: approval-mozilla-release+
Attachment #8857554 - Flags: approval-mozilla-esr52+
Attachment #8857554 - Flags: approval-mozilla-beta+
Attachment #8857554 - Flags: approval-mozilla-aurora+
tracking-firefox-esr52: ? → 53+

Comment 32

4 months ago
I concur with this disablement, but just in terms of scale, it seems like there must be a really unreasonable number of verifications to add up to 2 seconds. Before we re-enable this, someone needs to properly diagnose it. Keeler, I don't think it has to be you, but I think you are in charge of figuring out who it is.
Flags: needinfo?(dkeeler)
Looking at https://perfht.ml/2od2JGj (this is Nightly with CT and HTTP/2 enabled), necko is spending a lot of time calling nsNSSSocketInfo::IsAcceptableForHost (via JoinConnection), which is what is doing these signature verifications. :mcmanus, is this expected? Looking at some debugging output, necko seems to be repeatedly asking if (e.g.) a.example.com can join example.com, and then b.example.com, etc. Would it be sound to cache the results so we don't have to rebuild a certificate chain and re-verify the CT information each time?

(Also, fwiw, I was under the impression that ECDSA signature verification was supposed to be faster than RSA, which in retrospect wouldn't be the case particularly for small public RSA exponents, so blaming ECDSA specifically was probably a red herring here.)
Flags: needinfo?(dkeeler) → needinfo?(mcmanus)
there probably is an interaction here with the new coalescing code that went in with the origin frame code - that's 55 only. will add more thoughts later. necko might be the place to put a simpler filter.
Flags: needinfo?(mcmanus)
@keeler - can you make a different bug for comment 33 (as that one is a nightly 55 only issue, I'm sure and it seems like people are certain this bug has a strong 52 component). in particular the STR would matter a lot as to the state of the coalescing table. And we should also retest tomorrow's build where bug 1355591 has landed which probably improves things a bit.. you are likely right that a cache is needed.
I filed bug 1356611 and re-ran the profiling. Bug 1355591 may have helped, but the underlying issue seems to still be there.
Duplicate of this bug: 1356099
You need to log in before you can comment on or make changes to this bug.