certificate transparency signature verifications negatively impact TLS handshake performance

RESOLVED FIXED in Firefox 53

Status

()

Core
Security: PSM
P1
normal
RESOLVED FIXED
21 days ago
6 days ago

People

(Reporter: Alice0775 White, Assigned: keeler)

Tracking

(Blocks: 1 bug, {perf, regression})

52 Branch
mozilla55
x86
Windows 10
perf, regression
Points:
---
Dependency tree / graph

Firefox Tracking Flags

(firefox52 wontfix, firefox53+ fixed, firefox54+ fixed, firefox55+ fixed, firefox-esr5253+ fixed)

Details

(Whiteboard: [STR in comment#6])

Attachments

(6 attachments, 2 obsolete attachments)

(Reporter)

Description

21 days ago
Created attachment 8854257 [details]
about:suppot

Disabling e10s does not solve in Nightly54.0a1.

Reproducible: always

Steps To Reproduce:
1. Start with clean profile, and disable Flash Player if any
2. Load https://www.yahoo.co.jp/ in several times

Actual Results:
In Nightly 55.0a1, page loading is x2 slower than Firefox 51.0.1
Elapsed time until tab-throbber stops:
Firefox 51.0.1(without e10s) :  7-10sec
Nightly 55.0a1(with/without e10s) : 20-25sec

Expected Results:
Not so.

My environment:
Windows10 Home 1607 64bit, Core2Quad 4@2.5GHz, Mem 8GB, Net ADSL 12(actual: down 9.5Mbps, up 800Kbps)
Comment hidden (obsolete)
Comment hidden (obsolete)
Comment hidden (obsolete)
Comment hidden (obsolete)
(Reporter)

Updated

20 days ago
Blocks: 1353289
Comment hidden (obsolete)
(Reporter)

Comment 6

18 days ago
In Nightly 55.0a1, page loading is x2 slower than Firefox 51.0.1

Reproducible: almost always

Steps To reproduce:
1. Disable e10s
2. Open https://tv.yahoo.co.jp/listings/fm/
   (maybe Japan region only)
3. Wait to render page and settle CPU
4. Reload page (F5)
   --- Observe the time until tab-throbber stops spinning
5. Reload page (F5)
   --- Observe the time until tab-throbber stops spinning
6. Reload page (F5)
   --- Observe the time until tab-throbber stops spinning

Actual Results:
It takes about 4-6 seconds each

Expected results:
It is within 2 seconds each


Regression window:
https://hg.mozilla.org/integration/autoland/pushloghtml?fromchange=c651f7174f37747de99eaafd27b2d78fb10bead3&tochange=50143dbdcb47bf47c8827c8777b0e11e92e25418

Regressed by:
50143dbdcb47	Sergei Chernov — Bug 1293231 - Certificate Transparency - basic telemetry reports; r=Cykesiopka,keeler
Blocks: 1293231
Has Regression Range: --- → yes
Has STR: --- → yes
status-firefox52: --- → wontfix
status-firefox53: --- → wontfix
status-firefox54: --- → fix-optional
Component: Untriaged → Security: PSM
Flags: needinfo?(sergei.cv)
Keywords: regression
Whiteboard: [STR in comment#6]
(Reporter)

Updated

18 days ago
Attachment #8854330 - Attachment is obsolete: true
(Reporter)

Comment 7

18 days ago
Created attachment 8855235 [details]
about:suppot
Attachment #8854257 - Attachment is obsolete: true
(Reporter)

Comment 8

18 days ago
And I confirmed that setting security.pki.certificate_transparency.mode = 0 fixes the problem
(Reporter)

Updated

18 days ago
Version: Trunk → 52 Branch
(Reporter)

Comment 9

15 days ago
[Tracking Requested - why for this release]: performance regression
tracking-firefox55: --- → ?

Comment 10

15 days ago
I could not reproduce the issue on FF52 with browser.tabs.remote.autostart* = false and security.pki.certificate_transparency.mode = 0. https://tv.yahoo.co.jp/listings/fm/ always takes about 5 seconds to load. I've also tried on FF51 - same results.

Could you please attach two HAR files from FF52 (release) for the page - with certificate_transparency off (loads quickly) and on (loads slowly)? With the difference in the loading times being so significant, I think we should be able to get a clue from there.

Thanks!
Flags: needinfo?(sergei.cv) → needinfo?(alice0775)
(Reporter)

Comment 11

15 days ago
(In reply to Sergei Chernov from comment #10)
> I could not reproduce the issue on FF52 with browser.tabs.remote.autostart*
> = false and security.pki.certificate_transparency.mode = 0.
> https://tv.yahoo.co.jp/listings/fm/ always takes about 5 seconds to load.
> I've also tried on FF51 - same results.
> 
> Could you please attach two HAR files from FF52 (release) for the page -
> with certificate_transparency off (loads quickly) and on (loads slowly)?
> With the difference in the loading times being so significant, I think we
> should be able to get a clue from there.
> 
> Thanks!

What is HAR file? Where is it located? How do I obtain it?
Flags: needinfo?(alice0775) → needinfo?(sergei.cv)
(Reporter)

Comment 12

15 days ago
(In reply to Alice0775 White from comment #11)
> (In reply to Sergei Chernov from comment #10)
> > I could not reproduce the issue on FF52 with browser.tabs.remote.autostart*
> > = false and security.pki.certificate_transparency.mode = 0.
> > https://tv.yahoo.co.jp/listings/fm/ always takes about 5 seconds to load.
> > I've also tried on FF51 - same results.
> > 
> > Could you please attach two HAR files from FF52 (release) for the page -
> > with certificate_transparency off (loads quickly) and on (loads slowly)?
> > With the difference in the loading times being so significant, I think we
> > should be able to get a clue from there.
> > 
> > Thanks!
> 
> What is HAR file? Where is it located? How do I obtain it?

NM.
https://developer.mozilla.org/en-US/docs/Tools/Network_Monitor#CopySave_All_As_HAR

I will do.
Flags: needinfo?(sergei.cv)
(Reporter)

Comment 13

15 days ago
Created attachment 8856291 [details]
Archive 17-04-10 06-43-26.har
(Reporter)

Comment 14

15 days ago
Created attachment 8856292 [details]
Archive 17-04-10 06-47-59.har(with security.pki.certificate_transparency.mode=0)
I believe this is due to inefficient EC signature verification code. I took some traces using the new profiling tool at https://perf-html.io/ (I'll attach them shortly), and it shows that certificate verification is taking ~8ms on my machine without CT and ~192ms with CT (select the socket thread, filter for "verify", and keep opening the arrows until you get to the relevant functions). Drilling into that, about half of the difference is due to actually doing the signature verification (so, math) and the other half is "importing" the public key into the PKCS#11 softoken to operate on it. Since we know in advance what EC keys we're going to be using, we shouldn't waste time reimporting them every time (note that due to an architectural issue with the way keys are represented, though, VFY_VerifyDigestDirect (the NSS function we're using) will always cause the key to be reimported, so we'll have to work around that (we should be able to call PK11_Verify directly).
Created attachment 8856682 [details]
with-certificate-transparency-profile.sps.json.gz
Created attachment 8856683 [details]
without-certificate-transparency-profile.sps.json.gz
>  the difference is due to actually doing the signature verification (so, math)

:( That's bad. P256 shouldn't be that slow. But it seems it is. This sounds like something we probably should fix on the NSS side if it's visibly slowing down things.

Comment 19

13 days ago
Hmmm, does this affect 53? 

We should probably flip the `certificate_transparency.mode` preference back to 0 for release builds until we have this performant.
Created attachment 8857554 [details] [diff] [review]
1353216-disable-ct.diff

It looks like just flipping the pref is sufficient to disable this (no test changes needed: https://treeherder.mozilla.org/#/jobs?repo=try&revision=181a7bb2472593a1ccac40058efdae6da0f25ecd )
Attachment #8857554 - Flags: review?(jjones)
(Assignee)

Updated

12 days ago
Assignee: nobody → dkeeler
status-firefox53: wontfix → ---
status-firefox54: fix-optional → ---
tracking-firefox53: --- → ?
tracking-firefox54: --- → ?
Priority: -- → P1
Summary: In Nightly 55.0a1, page loading is x2 slower than Firefox 51.0.1 → certificate transparency signature verifications negatively impact TLS handshake performance
(Assignee)

Updated

12 days ago
status-firefox53: --- → affected
status-firefox54: --- → affected

Comment 21

12 days ago
Comment on attachment 8857554 [details] [diff] [review]
1353216-disable-ct.diff

Review of attachment 8857554 [details] [diff] [review]:
-----------------------------------------------------------------

OK, this is fine. We'll land this in 55 and -- well, does this need uplift to 54 / 53?

I'll file a bug to turn this back on, but we shouldn't do that until it's not a noticeable performance hit.
Attachment #8857554 - Flags: review?(jjones) → review+

Updated

12 days ago
Blocks: 1355903
[Tracking Requested - why for this release]: sounds like this might need to get fixed in 52esr
status-firefox-esr52: --- → affected
tracking-firefox-esr52: --- → ?

Comment 23

12 days ago
Pushed by dkeeler@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/dd72f3e3466f
disable certificate transparency processing for performance concerns r=jcj
Tracking 54/55+. Probably too late for 53 at this point since are already at RC, but we should circle back regarding Comment 22.
tracking-firefox54: ? → +
tracking-firefox55: ? → +
Flags: needinfo?(lhenry)
I discussed this with keeler earlier today, if we land it quickly it can still make the 53 RC2 build. 

I am not super clear whether this should be a 53 release blocker, but I think not, since we've had the problem around for the entire 52 cycle.
I filed bug 1355993 to investigate the underlying slowness of EC in NSS.
tracking-firefox53: ? → +
Flags: needinfo?(lhenry)
https://hg.mozilla.org/releases/mozilla-aurora/rev/cef7b5c139ef
status-firefox54: affected → fixed
https://hg.mozilla.org/releases/mozilla-beta/rev/2326a8ab1506
https://hg.mozilla.org/releases/mozilla-release/rev/2326a8ab1506
status-firefox53: affected → fixed
https://hg.mozilla.org/releases/mozilla-esr52/rev/aaa0bba23803
status-firefox-esr52: affected → fixed

Comment 30

12 days ago
bugherder
https://hg.mozilla.org/mozilla-central/rev/dd72f3e3466f
Status: NEW → RESOLVED
Last Resolved: 12 days ago
status-firefox55: affected → fixed
Resolution: --- → FIXED
Target Milestone: --- → mozilla55
See Also: → bug 1356073
Comment on attachment 8857554 [details] [diff] [review]
1353216-disable-ct.diff

post hoc a+ all the things.
Attachment #8857554 - Flags: approval-mozilla-release+
Attachment #8857554 - Flags: approval-mozilla-esr52+
Attachment #8857554 - Flags: approval-mozilla-beta+
Attachment #8857554 - Flags: approval-mozilla-aurora+
tracking-firefox-esr52: ? → 53+
I concur with this disablement, but just in terms of scale, it seems like there must be a really unreasonable number of verifications to add up to 2 seconds. Before we re-enable this, someone needs to properly diagnose it. Keeler, I don't think it has to be you, but I think you are in charge of figuring out who it is.
Flags: needinfo?(dkeeler)
Looking at https://perfht.ml/2od2JGj (this is Nightly with CT and HTTP/2 enabled), necko is spending a lot of time calling nsNSSSocketInfo::IsAcceptableForHost (via JoinConnection), which is what is doing these signature verifications. :mcmanus, is this expected? Looking at some debugging output, necko seems to be repeatedly asking if (e.g.) a.example.com can join example.com, and then b.example.com, etc. Would it be sound to cache the results so we don't have to rebuild a certificate chain and re-verify the CT information each time?

(Also, fwiw, I was under the impression that ECDSA signature verification was supposed to be faster than RSA, which in retrospect wouldn't be the case particularly for small public RSA exponents, so blaming ECDSA specifically was probably a red herring here.)
Flags: needinfo?(dkeeler) → needinfo?(mcmanus)
there probably is an interaction here with the new coalescing code that went in with the origin frame code - that's 55 only. will add more thoughts later. necko might be the place to put a simpler filter.
Flags: needinfo?(mcmanus)
@keeler - can you make a different bug for comment 33 (as that one is a nightly 55 only issue, I'm sure and it seems like people are certain this bug has a strong 52 component). in particular the STR would matter a lot as to the state of the coalescing table. And we should also retest tomorrow's build where bug 1355591 has landed which probably improves things a bit.. you are likely right that a cache is needed.
I filed bug 1356611 and re-ran the profiling. Bug 1355591 may have helped, but the underlying issue seems to still be there.
Duplicate of this bug: 1356099
You need to log in before you can comment on or make changes to this bug.