Closed Bug 1529044 Opened 6 years ago Closed 6 years ago

Saving intermediate certs to the DB has a significant detrimental effect on page load time (android)

Categories

(Core :: Security: PSM, defect, P1)

All
Android
defect

Tracking

()

RESOLVED FIXED
mozilla68
Performance Impact high
Tracking Status
firefox-esr60 --- wontfix
firefox66 --- wontfix
firefox67 --- wontfix
firefox68 --- fixed

People

(Reporter: acreskey, Assigned: keeler)

References

Details

(Keywords: perf:pageload, Whiteboard: [psm-assigned][geckoview:p1])

Attachments

(5 files)

I inadvertently found a way to initialize NSS in "no cert DB" mode on android and observed that the load time for early pages is significantly improved. (e.g. 20%-30% + depending on the page).

This codepath in InitializeNSSWithFallbacks() that will ultimately be called in "no cert DB mode":
https://searchfox.org/mozilla-central/source/security/manager/ssl/nsNSSComponent.cpp#1627

I've noticed the performance improvement across numerous sites while looking into the Nimbledroid android tests.
But here is a comparison of running with and without the cert DB.

Site:
https://m.ebay-kleinanzeigen.de/s-anzeigen/auf-zeit-wg-berlin/zimmer/c199-l3331

NSS initialized with no cert DB

		loadtime (ms)
		7,684 
		8,193 
		8,120 
		6,572 
		6,327 
		9,698 
		6,837 
		9,821 
		7,080 
		7,464 
		6,274 
		7,693 
		11,155 
		6,880 
		6,487 
		7,376 
		5,569 
		6,401 
		7,492 
		7,150 
		6,996 
		6,720 
		7,650 
		6,935 
		6,954 
		7,214 
		9,178 
		6,429 
		6,388 
		7,540 
		
Mean		7,409 
Median		7,115 

NSS initialized with cert DB (current behaviour)

		loadtime (ms)
		12,524 
		11,553 
		11,934 
		7,384 
		7,353 
		9,723 
		11,783 
		13,426 
		7,294 
		11,044 
		7,816 
		9,040 
		10,104 
		12,563 
		11,050 
		11,533 
		9,347 
		10,925 
		6,824 
		8,020 
		8,262 
		8,104 
		7,857 
		13,125 
		9,047 
		12,482 
		9,123 
		9,776 
		10,378 
		7,388 
		
Mean		9,893 
Median		9,750

I'm needinfo-ing myself to collect finer grain profiles from simpleperf.

I imagine that it's not feasible to run with NSS initialized w/ out the certDB on android, but perhaps the work can be deferred?

Flags: needinfo?(acreskey)

I triggered the "no cert DB" mode on android by launching with an empty profile in a non default location (/mnt/sdcard/Download)

Flags: needinfo?(acreskey)
Whiteboard: [qf]
Flags: needinfo?(acreskey)

What device is this timing from? Cheers

Whiteboard: [qf] → [qf:p1:pageload]

Mark this was measured on a Moto G5, although I've reproduced similar results on other android devices.

Andrew, I just posted some test results in bug 1506471 comment 39. Looks promising.

Dragana,
Would you have any ideas on this?
Or know who might?

Flags: needinfo?(acreskey) → needinfo?(dd.mozilla)
Flags: needinfo?(dd.mozilla) → needinfo?(dkeeler)

See also bug 1370516. Due to recent other work (e.g. bug 1514118) this might not actually be that difficult - we can potentially load the user db (which is part of what's slow) on a background thread.

Flags: needinfo?(dkeeler)
Assignee: nobody → dkeeler
Component: Libraries → Security: PSM
Priority: -- → P1
Product: NSS → Core
QA Contact: jjones
Whiteboard: [qf:p1:pageload] → [qf:p1:pageload][psm-assigned]
Version: trunk → unspecified
OS: Unspecified → Android
Hardware: Unspecified → All
Whiteboard: [qf:p1:pageload][psm-assigned] → [qf:p1:pageload][psm-assigned][geckoview:p1]

I hope the reworded title is slightly more representative of the issue? As I understand, it's actually fresh profiles that would be unaffected because they don't yet have a cert DB and NSS initialisation therefore is still fast.

Summary: Creating a new profile with NSS Cert DB has a significant detrimental effect on early page loads (android) → Initialising NSS with Cert DB available has a significant detrimental effect on early page loads (android)

I think it's good to have "Initialising NSS with Cert DB" in the title.
So far I know that initializing NSS in cert DB mode does actually impact the new profile case.
(This is where I first encountered it, when profiling Nimbledroid android tests where a new profile is used for every page load).

I'll find out if this also impacts page load on existing profiles and we can update the description.

I still also have to capture a proper call stack which should shed some light on what exactly is going on.

Flags: needinfo?(acreskey)

Andrew, I have a rough draft of what this might look like at https://treeherder.mozilla.org/#/jobs?repo=try&revision=5c62e754baef345cbb73ad5bcad58a8ce0493b8e - maybe you could use that to validate the approach I'm considering? (I spun off an android-api-16 opt build, I think, but in any case you can trigger any additional builds you need there.)

Good stuff Dana.
I'll grab that diff and build it locally to measure the performance.

Also, I haven't been able to find any evidence that initializing NSS with CertDB impacts pageload performance when an existing profile is used.
To test this I modified the android Nimbledriod suite to not clear the app data between runs (i.e. only force-kill the process).

The first couple of pageloads are improved when initializing in no-cert DB mode, but after that there is no significant difference in performance.

Hi Dana, I pulled down that patch and applied it to it's ancestor.
Unfortunately when I launch geckoview_example I see this crash on startup -- any ideas?

02-25 11:13:01.986 29465 29483 E GeckoLibLoad: Load sqlite start
02-25 11:13:02.004 29465 29483 W GeckoLinker: /data/app/org.mozilla.geckoview_example-1/lib/arm/libnss3.so: Relocation to NULL @0x001338f0 for symbol "__cxa_type_match"
02-25 11:13:02.004 29465 29483 E GeckoJNI: missing PK11SDR_Encrypt
02-25 11:13:02.004 29465 29483 E GeckoLibLoad: Throw
02-25 11:13:02.005 29465 29483 E GeckoLibLoad: Load sqlite done
--------- beginning of crash
02-25 11:13:02.005 29465 29483 E AndroidRuntime: FATAL EXCEPTION: Gecko
02-25 11:13:02.005 29465 29483 E AndroidRuntime: Process: org.mozilla.geckoview_example, PID: 29465
02-25 11:13:02.005 29465 29483 E AndroidRuntime: java.lang.Exception: Error loading sqlite libraries
02-25 11:13:02.005 29465 29483 E AndroidRuntime: 	at org.mozilla.gecko.mozglue.GeckoLoader.loadSQLiteLibsNative(Native Method)
02-25 11:13:02.005 29465 29483 E AndroidRuntime: 	at org.mozilla.gecko.mozglue.GeckoLoader.loadSQLiteLibs(GeckoLoader.java:199)
02-25 11:13:02.005 29465 29483 E AndroidRuntime: 	at org.mozilla.gecko.GeckoThread.loadGeckoLibs(GeckoThread.java:266)
02-25 11:13:02.005 29465 29483 E AndroidRuntime: 	at org.mozilla.gecko.GeckoThread.initGeckoEnvironment(GeckoThread.java:289)
02-25 11:13:02.005 29465 29483 E AndroidRuntime: 	at org.mozilla.gecko.GeckoThread.run(GeckoThread.java:417)
02-25 11:13:02.007  1759  2247 W ActivityManager:   Force finishing activity org.mozilla.geckoview_example/.GeckoViewActivity

Will do.

Will do.

If it's helpful, these are captured perf event breakdowns with and without cert-db:

Baseline profile:

Overhead  Shared Object
58.55%    /data/app/org.mozilla.geckoview_example-1/lib/arm/libxul.so
14.63%    /data/app/org.mozilla.geckoview_example-1/lib/arm/libfreebl3.so
8.25%     /system/lib/libart.so
5.23%     /system/lib/libc.so
4.35%     /data/app/org.mozilla.geckoview_example-1/lib/arm/libmozglue.so
1.45%     /data/app/org.mozilla.geckoview_example-1/lib/arm/libnss3.so
1.15%     /system/framework/arm/boot-core-oj.oat
0.99%     //anon

Running w/ no-cert DB

Overhead  Shared Object
33.33%    /data/app/org.mozilla.geckoview_example-1/lib/arm/libxul.so
26.24%    /system/lib/libart.so
11.49%    /system/lib/libc.so
5.39%     /data/app/org.mozilla.geckoview_example-1/lib/arm/libmozglue.so
5.11%     /system/lib/libart-compiler.so
3.83%     /system/framework/arm/boot-framework.oat
2.70%     /system/framework/arm/boot-core-oj.oat
2.27%     //anon

Attached the symbols called in libfreebl3.so

I'm just learning to read these now -- I attached the android SimplePerf HTML report for a slower page load.

I ran the tests with your latest patch Dana but unfortunately I couldn't discern and performance improvement in the early page load:
https://paste.rs/5F9

I know that I shared a lot of profiling data above, but if there are particular code paths that I could help profile, let me know.

Flags: needinfo?(acreskey)

I don't have a sense of what the above are telling me. What configuration led to the data in attachment 9046422 [details]? Is that with or without a certdb? Or is that the code I posted?

In terms of attachment 9046420 [details], all that's telling me is that we're using AES, which makes sense if we're connecting to a server using TLS.

I guess it would be helpful to compare what happens when we call NSS_Initialize in one configuration compared to another.

Flags: needinfo?(acreskey)

(attachment 9046420 [details]) and (attachment 9046422 [details]) are both sampled when running with certDB, i.e. mozilla-central behavior.
When running with cert-db disabled, samples from /data/app/org.mozilla.geckoview_example-1/lib/arm/libfreebl3.so in the parent process disappear almost entirely.

Although I'm really not familiar with this code, it looks like when running with certDB the overhead of TLS increases dramatically.

I also saw this when profiling a test site which reports the tcp connect time via the browser performance.timing api.
(https://acreskeymoz.github.io/)

So let me dig deeper to see if I can confirm this and find out why.

And yes, good idea, I'll compare NSS_Initialize with both configurations.

I haven't been able to find a smoking gun via profiling yet, but I did discover something interesting and unexpected:

Initializing NSS with cert-db (default) doesn't impact loadtime on subsequent loads of the same page (even after force-kill).

But it does impact loads of different pages, regardless of whether the browser has been running for 4+ minutes.

Results of long session experiments are here:
https://docs.google.com/spreadsheets/d/1zWUPYJkt2n2D_YkduR6pHa6GvOUT3neYu9gHziutvfo/edit#gid=0

Even after the browser has been running and loading pages, I can see a ~30% improvement in loadtime on wikimedia if I've initialized NSS in no-cert-db mode. (https://searchfox.org/mozilla-central/source/security/manager/ssl/nsNSSComponent.cpp#1687)

Perhaps it's as simple as this: adding the server's certificate to the database is slow and OMT?

Dana, is it possible for me to disable the addition of new certificates as a test?

I'll continue to try to capture something interesting via profiling.

We don't permanently save server certificates in normal browsing, but we do save intermediate certificates, which could explain the differences. You could verify by removing the call to SaveIntermediateCerts here: https://searchfox.org/mozilla-central/rev/2a6f3dde00801374d3b2a704232de54a132af389/security/certverifier/CertVerifier.cpp#957

Dana -- thank you -- that very much looks to be the difference :)

Commenting out the call to SaveIntermediateCerts is improving pageload by ~40% on the given site.
I need to run broader tests.

You can see results here:
https://docs.google.com/spreadsheets/d/1zWUPYJkt2n2D_YkduR6pHa6GvOUT3neYu9gHziutvfo/edit#gid=0&range=A35

So, while I get more data...

I wonder why this is having such an impact?
I did profile a pageload on android and SaveIntermediateCerts in the content process was only 2ms.
http://bit.ly/2XsK10v

Flags: needinfo?(acreskey)

SaveIntermediateCerts shouldn't be running in the content process (and the stacks it shows up in there don't make sense).
If you include "SSL" in the list of threads to capture, it should show up in the parent process.

Ah, thanks, I can now at least see the SSL threads of the parent process.

Since this has such a large performance impact, can you help me understand the following:

1 - Why do we save these intermediate certificates? (To save the work of verifying them again if they are encountered later on?)

2 - So I think saving the certificate to DB is blocking the TSL handshake until it's complete, is that right? Could this potentially be done asynchronously?

We save intermediates to help when the user encounters other misconfigured servers that don't send the right intermediate(s) in case having the cached intermediate will help. e.g. Alice visits example.com, which is properly configured to send the Example CA Intermediate. She then visits example.org, which doesn't send the Example CA Intermediate. If the intermediate from the first connection weren't saved, she would see an "unknown issuer" error page. Of course, this isn't a flawless feature, as if she visited them in the other order, example.org wouldn't work. We're addressing this in bug 1520278 et. al. but it's a little ways off from shipping.

Saving intermediates could certainly be done on a separate thread. That said, since we're hoping to not need to do this at all in the near future, it's unclear that doing that work right now would really help us.

Thanks Dana, that helps.

FYI, I ran more tests comparing saving and not saving intermediates and this hypothesis is holding up.
https://docs.google.com/spreadsheets/d/1zWUPYJkt2n2D_YkduR6pHa6GvOUT3neYu9gHziutvfo/edit?userstoinvite=mark.paxman99@gmail.com&ts=5c76f0fd#gid=0&range=A83
Not as large an impact as when a site it loaded from intent on fresh install, but still big improvements in load time:
e.g.
13.66%, 27.51%, 13.46%, 21.58%

One more question: it looks like we don't save the intermediates in private browsing contexts
(https://searchfox.org/mozilla-central/source/security/manager/ssl/SSLServerCertVerification.cpp#1287)
and my breakpoints also agree.

So until we have intermediate preloading could we consider not saving intermediates on android where the cost looks to be high. (I'll run our full Nimbledroid suite to get a better picture).

Summary: Initialising NSS with Cert DB available has a significant detrimental effect on early page loads (android) → Saving intermediate certs to the DB has a significant detrimental effect on page load time (android)

I was curious so I put in logging to measure the time it takes to save intermediates to the DB:
On MotoG5

CertVerifier::VerifySSLServerCert SaveIntermediates took 1009.791975
CertVerifier::VerifySSLServerCert SaveIntermediates took 1020.807029
CertVerifier::VerifySSLServerCert SaveIntermediates took 0.001979
CertVerifier::VerifySSLServerCert SaveIntermediates took 0.001354
CertVerifier::VerifySSLServerCert SaveIntermediates took 0.002136
CertVerifier::VerifySSLServerCert SaveIntermediates took 0.002031
Document https://www.google.com/ loaded after 1851ms, load event duration 0ms

The results of a local run of the Nimbledroid android progress-bar based load test:
https://docs.google.com/spreadsheets/d/1zWUPYJkt2n2D_YkduR6pHa6GvOUT3neYu9gHziutvfo/edit#gid=915126455
(Comparison in column AD).

Caveat: Because the Nimbledroid suite relies on finding the progress bar element I've found that it's only accurate for pages with long load times (the progress bar cannot always be found in time for some pages).

Apparently importing a certificate into the NSS certificate DB is slow enough to
materially impact the time it takes to connect to a site. This patch addresses
this by importing any intermediate certificates we want to cache from verified
connections on a background thread (so the certificate verification thread can
return faster).

Fenix MVP will use GeckoView 67. Depending on the risk/reward trade-off, we might want to uplift this fix to 67 Beta so Fenix MVP performance makes a good first impression.

Pushed by dkeeler@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/d641ac81d9f0 intermediate certificate caching: import on a background thread to not block certificate verification r=mgoodwin

I've pulled down the patch and I am seeing an improvement but unfortunately not as much I would have expected from moving this work out of the verification path.

The most stable results are from using Browsertime on the Acer reference laptop:

Base revision:

INFO: https://marvel.fandom.com/wiki/Black_Panther 0, backEndTime: 642ms (±34.81ms), firstPaint: 4.14s (±105.05ms), DOMContentLoaded: 4.50s (±105.23ms), Load: 21.37s (±1.13s), rumSpeedIndex: 3488 (±85.88) (20 runs)

Saving intermediates via Cryptotask

INFO: https://marvel.fandom.com/wiki/Black_Panther 0, backEndTime: 212ms (±15.89ms), firstPaint: 4.28s (±114.80ms), DOMContentLoaded: 4.69s (±123.45ms), Load: 18.01s (±1.18s), rumSpeedIndex: 3611 (±97.61) (20 runs)

Not saving intermediates:

INFO: https://marvel.fandom.com/wiki/Black_Panther 0, backEndTime: 242ms (±24.30ms), firstPaint: 3.98s (±109.01ms), DOMContentLoaded: 4.39s (±117.15ms), Load: 12.10s (±1.43s), rumSpeedIndex: 3279 (±92.87) (20 runs)

I also compared 10 runs of the Nimbledroid suite (locally on Moto G5) before and after the patch
The data is very noisy but I think there is evidence of a performance gain, just not as impactful as removing the saving.
https://docs.google.com/spreadsheets/d/1zWUPYJkt2n2D_YkduR6pHa6GvOUT3neYu9gHziutvfo/edit#gid=2115798283&range=AD:AD

I'm not very familiar with these runnables, but perhaps the priority of the thread can be lowered? Potentially it could be deferred until idle cycles were available?

(In reply to Narcis Beleuzu [:NarcisB] from comment #32)

Backed out for XPCShel failures on test_missing_intermediate.js

Backout link: https://hg.mozilla.org/integration/autoland/rev/b0b02f3c5203afa2323ca7a5900f3d3d53037ac9
Push link: https://hg.mozilla.org/integration/autoland/rev/d641ac81d9f07ef75cb3e13f9c33e13564d47fb1
Log link: https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=235934863&repo=autoland&lineNumber=2065

This uncovered an NSS bug - I updated the patch with a workaround (looks like it works: https://treeherder.mozilla.org/#/jobs?repo=try&revision=f2dbb8af2f5a68a429569b05bb0ff3a9ea7435b8 )

(In reply to Andrew Creskey from comment #33)

I've pulled down the patch and I am seeing an improvement but unfortunately not as much I would have expected from moving this work out of the verification path.

The most stable results are from using Browsertime on the Acer reference laptop:

Base revision:

INFO: https://marvel.fandom.com/wiki/Black_Panther 0, backEndTime: 642ms (±34.81ms), firstPaint: 4.14s (±105.05ms), DOMContentLoaded: 4.50s (±105.23ms), Load: 21.37s (±1.13s), rumSpeedIndex: 3488 (±85.88) (20 runs)

Saving intermediates via Cryptotask

INFO: https://marvel.fandom.com/wiki/Black_Panther 0, backEndTime: 212ms (±15.89ms), firstPaint: 4.28s (±114.80ms), DOMContentLoaded: 4.69s (±123.45ms), Load: 18.01s (±1.18s), rumSpeedIndex: 3611 (±97.61) (20 runs)

Not saving intermediates:

INFO: https://marvel.fandom.com/wiki/Black_Panther 0, backEndTime: 242ms (±24.30ms), firstPaint: 3.98s (±109.01ms), DOMContentLoaded: 4.39s (±117.15ms), Load: 12.10s (±1.43s), rumSpeedIndex: 3279 (±92.87) (20 runs)

I also compared 10 runs of the Nimbledroid suite (locally on Moto G5) before and after the patch
The data is very noisy but I think there is evidence of a performance gain, just not as impactful as removing the saving.
https://docs.google.com/spreadsheets/d/1zWUPYJkt2n2D_YkduR6pHa6GvOUT3neYu9gHziutvfo/edit#gid=2115798283&range=AD:AD

I'm not very familiar with these runnables, but perhaps the priority of the thread can be lowered? Potentially it could be deferred until idle cycles were available?

Not as far as I know, without some non-trivial engineering work.

Which of these numbers are most important? The cryptotask implementation actually seems worse in 4 out of 5 categories, if I"m reading this correctly.

Flags: needinfo?(dkeeler)

(In reply to Dana Keeler (she/her) (use needinfo) (:keeler for reviews) from comment #34)

Which of these numbers are most important? The cryptotask implementation actually seems worse in 4 out of 5 categories, if I"m reading this correctly.

Of these metrics I would say the that firstPaint and Load are the most important. The Nimbledroid numbers in the spreadsheet are measuring the progress bar which is a approximately equal to load time as well (but very noisy).
So it does look like a pretty reasonable win on reaching the load event, but yes, maybe some regression on other metrics too.

But I don't like how I only have one site measured accurately here so let me capture a broader sample.

I'll also profile this and see if I can get a better understanding.

Previously, if I'm not mistaken, the cert verification and intermediate storage ran completely independently of the main thread.
Perhaps even the small end-of-task main thread callback is delaying mainthread work?

Or perhaps accessing the cert db to save is delaying other threads that make use of it?
So it's less of a performance win because of this.

I'll see what I can find.

This is a profile of a heavy page load:
http://bit.ly/2TzO6wD

You can now see the new ImportInts threads.
The first one runs its background task for a bit over 1800ms - it looks like mostly flushing sqlite file buffers.

Perhaps this is simply IO contention?

Pushed by dkeeler@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/ae1ddf7ebb7b intermediate certificate caching: import on a background thread to not block certificate verification r=mgoodwin
Status: NEW → RESOLVED
Closed: 6 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla68

I've collected more results on the Acer reference laptop.
Below is a set of runs with the baseline build (prior to your patch, Dana), two sets of runs with your patch, and one set with the saving of intermediates commented out.

These are live sites and there is noise but I think we can see significant improvements across all metrics in at least some of the sites when using the background thread CryptoTask. (lower is better for SpeedIndex).

Not saving the intermediates at all does lead to a faster pageload (you can see and example of the number of short-lived ImportInts threads here:

Baseline
wikipedia.org       backEndTime: 786ms (±103.96ms), firstPaint: 2.04s (±180.65ms), DOMContentLoaded: 2.36s (±140.21ms), Load: 2.57s (±112.79ms), rumSpeedIndex: 935 (±101.89) (20 runs)
medium.com          backEndTime: 1.26s (±800.13ms), firstPaint: 3.41s (±752.90ms), DOMContentLoaded: 3.70s (±805.01ms), Load: 5.71s (±780.66ms), rumSpeedIndex: 3321 (±770.55) (19 runs)
buzzfeed.com        backEndTime: 711ms (±81.56ms),  firstPaint: 865ms (±81.05ms),  DOMContentLoaded: 868ms (±82.41ms),  Load: 16.40s (±3.06s),   rumSpeedIndex: 2953 (±253.85) (20 runs)
washingtonpost.com  backEndTime: 1.10s (±98.54ms),  firstPaint: 2.06s (±157.34ms), DOMContentLoaded: 2.52s (±99.65ms),  Load: 15.75s (±2.55s),   rumSpeedIndex: 2178 (±97.90) (20 runs)

CryptoTask
wikipedia.org       backEndTime: 225ms (±43.73ms),  firstPaint: 2.01s (±203.66ms), DOMContentLoaded: 2.31s (±152.89ms), Load: 2.64s (±89.29ms),  rumSpeedIndex: 585 (±60.23) (20 runs)
medium.com          backEndTime: 365ms (±18.34ms),  firstPaint: 2.56s (±164.42ms), DOMContentLoaded: 2.66s (±128.13ms), Load: 4.88s (±217.35ms), rumSpeedIndex: 2453 (±143.53) (19 runs)
buzzfeed.com        backEndTime: 303ms (±53.29ms),  firstPaint: 596ms (±112.36ms), DOMContentLoaded: 439ms (±52.47ms),  Load: 13.68s (±598.30ms),rumSpeedIndex: 2190 (±207.31) (20 runs)
washingtonpost.com  backEndTime: 641ms (±99.23ms),  firstPaint: 2.09s (±251.23ms), DOMContentLoaded: 2.59s (±190.91ms), Load: 13.76s (±733.88ms),rumSpeedIndex: 2021 (±143.04) (20 runs)

CryptoTask
wikipedia.org       backEndTime: 263ms (±64.88ms), firstPaint: 2.30s (±237.86ms), DOMContentLoaded: 2.79s (±157.01ms), Load: 2.93s (±153.64ms), rumSpeedIndex: 637 (±97.83) (20 runs)
medium.com          backEndTime: 406ms (±40.93ms), firstPaint: 2.49s (±188.87ms), DOMContentLoaded: 2.83s (±148.95ms), Load: 5.41s (±197.38ms), rumSpeedIndex: 2446 (±145.77) (20 runs)
buzzfeed.com        backEndTime: 233ms (±5.69ms),  firstPaint: 680ms (±158.71ms), DOMContentLoaded: 385ms (±8.95ms),   Load: 13.77s (±819.03ms),rumSpeedIndex: 2848 (±251.80) (20 runs)
washingtonpost.com  backEndTime: 625ms (±98.23ms), firstPaint: 2.04s (±190.58ms), DOMContentLoaded: 2.43s (±96.09ms),  Load: 13.38s (±623.44ms),rumSpeedIndex: 1926 (±86.66) (20 runs)

Not saving Intermediates:
wikipedia.org       backEndTime: 227ms (±55.06ms), firstPaint: 1.77s (±256.92ms), DOMContentLoaded: 2.09s (±229.80ms), Load: 2.55s (±147.25ms), rumSpeedIndex: 634 (±107.49) (20 runs)
medium.com          backEndTime: 346ms (±13.07ms), firstPaint: 2.55s (±155.76ms), DOMContentLoaded: 2.63s (±131.54ms), Load: 4.37s (±188.01ms), rumSpeedIndex: 2371 (±131.11) (20 runs)
buzzfeed.com        backEndTime: 245ms (±12.14ms), firstPaint: 393ms (±13.90ms),  DOMContentLoaded: 389ms (±13.71ms),  Load: 9.84s (±1.06s),    rumSpeedIndex: 1450 (±98.50) (20 runs)
washingtonpost.com  backEndTime: 642ms (±90.19ms), firstPaint: 2.08s (±216.39ms), DOMContentLoaded: 2.62s (±145.49ms), Load: 6.78s (±481.46ms), rumSpeedIndex: 2532 (±130.32) (19 runs)

I put together a test patch where the import is done with an IdleRunnable so, if I've coded this correctly, it should only import if the event queue is empty of priority tasks.
I'll collect some numbers and see if there's any potential in this extension.

Setting 67=wontfix because I assume we don't need to uplift this fix to 67 Beta.

I thought we wanted this in 67 for Fenix?

Flags: needinfo?(cpeterson)

(In reply to Dana Keeler (she/her) (use needinfo) (:keeler for reviews) from comment #42)

I thought we wanted this in 67 for Fenix?

We're now debating a new proposal to ship GV 68 (Beta) in Fenix MVP instead of 67. We wouldn't need to uplift as many big changes to 67 Beta if Fenix is using GV Nightly for another six weeks.

That said, using 68 is not a done deal, so feel free to uplift to 67 Beta now if you'd like to get more beta test time. There's only one Fennec/GV beta build each week (on Mondays), so you will need to request uplifts by Friday morning to give Relman time to review and process uplift requests.

Flags: needinfo?(cpeterson)
Summary: Saving intermediate certs to the DB has a significant detrimental effect on page load time (android) → Saving intermediate certs to the DB has a significant detrimental effect on page load time (android)

(In reply to Andrew Creskey from comment #40)

I put together a test patch where the import is done with an IdleRunnable so, if I've coded this correctly, it should only import if the event queue is empty of priority tasks.
I'll collect some numbers and see if there's any potential in this extension.

Did this turn out to be beneficial?

Flags: needinfo?(acreskey)

The browsertime results were still very noisy so I'm re-running them now with these changes:
• number of page loads increased from 20 to 30
• number of sites increased from 4 to 6
• browser stabilization period of 30 seconds added (following raptor tp6)

As a consequence, it's taking a long time to execute...

Giving this a bit more thought, I'm quite concerned that I'll be regressing TTFI since the big import tasks are ending up on the main thread.
We're still not quite there with automated testing for all of this, so something to be careful about, I think.

Should have the results soon though. From profiling was interesting to see all the ImportInts show up interspersed in the main thread.

That's the patch by the way. I'm new to the threading and scheduling here so it's likely there's a more efficient way of doing this. Specifically, do I really need the overhead of creating threads just to run a task on the main?

Comment on attachment 9054595 [details] [diff] [review] Proof-of-concept import certs on idle Review of attachment 9054595 [details] [diff] [review]: ----------------------------------------------------------------- Neat! I didn't even realize we had priority queues... ::: security/manager/ssl/CryptoTask.cpp @@ +74,5 @@ > + > + // Note: event must not null out mThread! > + // return mThread->Dispatch(this, NS_DISPATCH_NORMAL); > + nsCOMPtr<nsIRunnable> runnable = this; > + return NS_DispatchToMainThreadQueue(runnable.forget(), EventQueuePriority::Idle); `NS_DispatchToCurrentThreadQueue` might be even better - that way, we'll avoid main-thread I/O ::: security/manager/ssl/CryptoTask.h @@ +94,5 @@ > + TimeStamp mDeadline{}; > + bool mRan = false; > + > + nsresult mRv; > + nsCOMPtr<nsIThread> mThread; Yeah it actually doesn't look like you need this at all...

Thanks Dana, let me look into those optimizations.
jesup: was looking at the profiles and suggested that I try the LazyIdleThread where I can push all the imports to a single thread and avoid both the tread creation overhead and the threads colliding over access to the sql database.
Perhaps dispatching to the current thread queue as you suggested above may accomplish that as well. I'll investigate.

These are the best performance numbers I have yet:

https://paste.rs/SJJ

I think that they show:
-Some big gains with the landed CryptoTask solution (e.g. 18.45% improvement in mean load time of washingtonpost url)
-Some cases where performance with CryptoTasks is decreased somewhat, but within the std dev
-The Import-on-Idle that I implemented improves some scenarios significantly but also regresses others
-Not importing is the fastest solution (obviously expected)

Thanks Dana -- I think that conceptually that's very similar to what I tested last night:
https://hg.mozilla.org/try/rev/c762fd4e57ce98204580a9233c4fb1c09e62b12d

The import work is pushed onto the current thread (one of the SSL Cert threads) with a small main-thread interaction for the observer service.
I would say that your NS_NewRunnableFunction accomplishes this with less code though.

These results compare the new baseline (CryptoTask threads) vs that revision of mine:
https://paste.rs/itd

Looking promising. I want to get results from more sites, so I'll expand this with your patch.

My one concern with this general approach is that we could be slowing down SSL Cert verification due to the large number of import tasks that can build up.

I'm getting closer to having a solution where we just have one dedicated thread that can do the import tasks, that might be best.

Flags: needinfo?(acreskey)

This is what I have for executing all of the import tasks on a single thread (SharedThreadPool with n=1)"
https://hg.mozilla.org/try/rev/5fc8cfeecbd4d4a6846fcdaa36a43125214cb68e
You're importCertsRunnable came in handy Dana :)
Perhaps this can be accomplished without the threadpool? i.e. just a single dedicated thread with a timeout.
I looked into LazyIdleThread but it can only be used from the thread that created it, and there are up to 5 SSL Cert threads.
I'll look into this while I get performance numbers coming back from these changes.

Blocks: 1520505

(In reply to Chris Peterson [:cpeterson] from comment #43)

(In reply to Dana Keeler (she/her) (use needinfo) (:keeler for reviews) from comment #42)

I thought we wanted this in 67 for Fenix?

We're now debating a new proposal to ship GV 68 (Beta) in Fenix MVP instead of 67. We wouldn't need to uplift as many big changes to 67 Beta if Fenix is using GV Nightly for another six weeks.

That said, using 68 is not a done deal, so feel free to uplift to 67 Beta now if you'd like to get more beta test time. There's only one Fennec/GV beta build each week (on Mondays), so you will need to request uplifts by Friday morning to give Relman time to review and process uplift requests.

FYI: Fenix MVP will use GeckoView 68, not 67, so we don't need to uplift this fix to 67 Beta unless you want it in Fennec 67.

These are the results from yesterday's run: comparing our new baseline (landed Cryptotask solution, Dana's NS_NewRunnableFunction from Comment 50, and my Import ThreadPool from Comment 52

https://paste.rs/vow

Some of these are high-noise sites, but although I see improvements some of the regressions are concerning.

I profiled a run with the patch from Comment 52 and it looks like the import thread is spending almost all of it's time waiting for the lock in nssSession_EnterMonitor so that's not ideal.
https://perfht.ml/2YMKn2p

I'm having a bit of a hard time finding a reliable way to measure the current alternatives.
It looks like I'll need to get WebPageReplay going to get deterministic pages.

But I found this to be interesting: a profile of importing the certs on the SSL Cert threads (Comment 50) on idle -- it's definitely doing what it's supposed to: you can see them at the 60 second mark:
https://perfht.ml/2YPaTII

I was able to use :nalexander's nascent Browsertime-for-firefox-on-android to kick off long running tests of these various alternatives on android.

Browsertime results: 30 cold page loads of each url with a conditioned profile
Moto G5, geckoview example, --enable-release

                      |Prior to Bug 1529044  | Dana's CryptoTask import   | Idle runnable added to queue| Not saving                  |
                      |(import during cert   | (new runnable per import)  | of given SSL Cert Thread    | intermediate certs          |
                      | verification)        |                            | (Comment 50)                |                             |
pageLoadTime:             mean     stddev        mean    stddev    gain       mean   stddev      gain       mean      stddev    gain    
---------------------------------------------------------------------------------------------------------------------------------------
edition.cnn.com       |   2968.00 (+- 938)   |  2269.00 (+- 500)  23.55%  |  1898.00 (+- 198)  36.05%   |  1851.00 (+- 207)  37.63%   |
www.instagram.com     |   3968.00 (+- 208)   |  3873.00 (+- 159)   2.39%  |  3960.00 (+- 291)   0.20%   |  3879.00 (+- 330)   2.24%   |
www.booking.com       |   4052.00 (+- 393)   |  3958.00 (+- 246)   2.32%  |  3970.00 (+- 300)   2.02%   |  3950.00 (+- 254)   2.52%   |
www.expedia.com       |  10683.00 (+-1655)   | 10557.00 (+-1988)   1.18%  |  8673.00 (+- 215)  18.81%   |  8797.00 (+- 818)  17.65%   |
www.glassdoor.com     |  16546.00 (+-1441)   | 16676.00 (+-1541)  -0.79%  | 16548.00 (+-1619)  -0.01%   | 16365.00 (+-1685)   1.09%   |
en.m.wikipedia.org    |   1220.00 (+- 124)   |  1197.00 (+- 110)   1.89%  |  1243.00 (+- 131)  -1.89%   |  1202.00 (+- 140)   1.48%   |
www.amazon.com        |   3042.00 (+- 450)   |  3053.00 (+- 569)  -0.36%  |  2715.00 (+- 348)  10.75%   |  2546.00 (+- 289)  16.31%   |
medium.com            |   4506.00 (+- 872)   |  4526.00 (+- 913)  -0.44%  |  3935.00 (+- 668)  12.67%   |  3853.00 (+- 793)  14.49%   |
---------------------------------------------------------------------------------------------------------------------------------------
responseStart:
---------------------------------------------------------------------------------------------------------------------------------------
edition.cnn.com       |    304.00 (+-  98)   |   245.00 (+-  17)  19.41%  |   247.00 (+-  12)  18.75%   |   250.00 (+-  14)  17.76%   |
www.instagram.com     |    607.00 (+-  90)   |   519.00 (+-  51)  14.50%  |   538.00 (+- 190)  11.37%   |   566.00 (+- 204)   6.75%   |
www.booking.com       |    643.00 (+- 258)   |   549.00 (+-  58)  14.62%  |   542.00 (+-  65)  15.71%   |   512.00 (+-  64)  20.37%   |
www.expedia.com       |    529.00 (+-  69)   |   462.00 (+-  62)  12.67%  |   472.00 (+-  56)  10.78%   |   462.00 (+-  56)  12.67%   |
www.glassdoor.com     |   1265.00 (+- 150)   |  1179.00 (+- 221)   6.80%  |  1146.00 (+- 147)   9.41%   |  1166.00 (+- 208)   7.83%   |
en.m.wikipedia.org    |    344.00 (+-  27)   |   296.00 (+-  17)  13.95%  |   304.00 (+-  44)  11.63%   |   297.00 (+-  14)  13.66%   |
www.amazon.com        |    776.00 (+- 103)   |   702.00 (+-  99)   9.54%  |   677.00 (+- 214)  12.76%   |   646.00 (+-  76)  16.75%   |
medium.com            |    529.00 (+- 220)   |   444.00 (+-  26)  16.07%  |   497.00 (+- 250)   6.05%   |   494.00 (+- 218)   6.62%   |
---------------------------------------------------------------------------------------------------------------------------------------
firstPaint:
---------------------------------------------------------------------------------------------------------------------------------------
edition.cnn.com       |   1234.00 (+- 237)   |  1136.00 (+- 239)   7.94%  |  1164.00 (+- 260)   5.67%   |  1077.00 (+- 252)  12.72%   |
www.instagram.com     |    761.00 (+-  98)   |   683.00 (+-  44)  10.25%  |   707.00 (+- 189)   7.10%   |   727.00 (+- 210)   4.47%   |
www.booking.com       |   1964.00 (+- 293)   |  1852.00 (+- 142)   5.70%  |  1867.00 (+- 148)   4.94%   |  1840.00 (+- 110)   6.31%   |
www.expedia.com       |    775.00 (+-  72)   |   712.00 (+-  66)   8.13%  |   720.00 (+-  55)   7.10%   |   720.00 (+-  87)   7.10%   |
www.glassdoor.com     |   2780.00 (+- 548)   |  2756.00 (+- 594)   0.86%  |  2675.00 (+- 528)   3.78%   |  2850.00 (+- 648)  -2.52%   |
en.m.wikipedia.org    |    673.00 (+-  75)   |   621.00 (+-  67)   7.73%  |   648.00 (+-  83)   3.71%   |   636.00 (+-  78)   5.50%   |
www.amazon.com        |   1790.00 (+- 360)   |  1681.00 (+- 291)   6.09%  |  1662.00 (+- 283)   7.15%   |  1682.00 (+- 242)   6.03%   |
medium.com            |   1524.00 (+- 463)   |  1543.00 (+- 460)  -1.25%  |  1491.00 (+- 299)   2.17%   |  1447.00 (+- 247)   5.05%   |
---------------------------------------------------------------------------------------------------------------------------------------
firstContentfulPaint:
---------------------------------------------------------------------------------------------------------------------------------------
edition.cnn.com       |   1826.00 (+- 202)   |  1777.00 (+- 189)   2.68%  |  1771.00 (+- 196)   3.01%   |  1708.00 (+- 134)   6.46%   |
www.instagram.com     |    879.00 (+-  96)   |   798.00 (+-  43)   9.22%  |   826.00 (+- 191)   6.03%   |   839.00 (+- 210)   4.55%   |
www.booking.com       |   2216.00 (+- 319)   |  2111.00 (+- 184)   4.74%  |  2120.00 (+- 161)   4.33%   |  2080.00 (+- 141)   6.14%   |
www.expedia.com       |    914.00 (+-  74)   |   853.00 (+-  65)   6.67%  |   863.00 (+-  53)   5.58%   |   865.00 (+-  99)   5.36%   |
www.glassdoor.com     |   2995.00 (+- 574)   |  2987.00 (+- 645)   0.27%  |  2914.00 (+- 559)   2.70%   |  3058.00 (+- 663)  -2.10%   |
en.m.wikipedia.org    |    829.00 (+-  91)   |   764.00 (+-  83)   7.84%  |   801.00 (+-  99)   3.38%   |   783.00 (+-  94)   5.55%   |
www.amazon.com        |   2030.00 (+- 383)   |  1916.00 (+- 311)   5.62%  |  1902.00 (+- 287)   6.31%   |  1920.00 (+- 238)   5.42%   |
medium.com            |   1790.00 (+- 470)   |  1809.00 (+- 467)  -1.06%  |  1751.00 (+- 305)   2.18%   |  1724.00 (+- 265)   3.69%   |
---------------------------------------------------------------------------------------------------------------------------------------

Full urls:
"https://edition.cnn.com/ampstories/us/still-missing-cassini-heres-what-else-is-out-there"
"https://www.instagram.com/justintimberlake/"
"https://www.booking.com/searchresults.en-gb.html?city=20088325"
"https://www.expedia.com/Hotel-Search?destination=New+York%2C+New+York&latLong=40.756680%2C-73.986470&regionId=178293&startDate=&endDate=&rooms=1&_xpid=11905%7C1&adults=2"
"https://www.glassdoor.com/Job/jobs.htm?suggestCount=0&suggestChosen=true&clickSource=searchBtn&typedKeyword=Mozilla&sc.keyword=Mozilla&locT=C&locId=1147431&jobType="
"https://en.m.wikipedia.org/wiki/Mozilla_Corporation"
"https://www.amazon.com/s?k=laptop"
"https://medium.com/s/coincidences-are-a-lie/could-america-have-also-been-the-birthplace-of-impressionism-cb3d31a2e22d"

The results from dispatching the NS_NewRunnableFunction as an Idle task on the given SSL Cert thread (Comment 50) look great and match the pageload impact of not importing the certs.
I was able to verify the behavior with logging and sure enough the imports don't start until significantly after the page is fully loaded.
To me this looks like a winner.
Dana?

Flags: needinfo?(dkeeler)

Sounds good - I'll get that patch reviewed and landed.

Status: RESOLVED → REOPENED
Flags: needinfo?(dkeeler)
Resolution: FIXED → ---

Previously this functionality created a CryptoTask to do this work, but that
would cause a new thread to be created for each list of intermediates. This was
slow both because of all of the threads and because they could be scheduled
while other work was happening. Moving these tasks to the low-priority event
queue for threads in the certificate verification thread pool means no new
threads are created and the work only happens when these threads are idle
anyway.

Status: REOPENED → ASSIGNED
Target Milestone: mozilla68 → ---
Pushed by dkeeler@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/c3c81f8321b7 use a low-priority queue on a certificate verification thread to import intermediate certificates r=mgoodwin
Status: ASSIGNED → RESOLVED
Closed: 6 years ago6 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla68
Performance Impact: --- → P1
Keywords: perf:pageload
Whiteboard: [qf:p1:pageload][psm-assigned][geckoview:p1] → [psm-assigned][geckoview:p1]
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: