Saving intermediate certs to the DB has a significant detrimental effect on page load time (android)
Categories
(Core :: Security: PSM, defect, P1)
Tracking
()
People
(Reporter: acreskey, Assigned: keeler)
References
Details
(Keywords: perf:pageload, Whiteboard: [psm-assigned][geckoview:p1])
Attachments
(5 files)
I inadvertently found a way to initialize NSS in "no cert DB" mode on android and observed that the load time for early pages is significantly improved. (e.g. 20%-30% + depending on the page).
This codepath in InitializeNSSWithFallbacks() that will ultimately be called in "no cert DB mode":
https://searchfox.org/mozilla-central/source/security/manager/ssl/nsNSSComponent.cpp#1627
I've noticed the performance improvement across numerous sites while looking into the Nimbledroid android tests.
But here is a comparison of running with and without the cert DB.
Site:
https://m.ebay-kleinanzeigen.de/s-anzeigen/auf-zeit-wg-berlin/zimmer/c199-l3331
NSS initialized with no cert DB
loadtime (ms)
7,684
8,193
8,120
6,572
6,327
9,698
6,837
9,821
7,080
7,464
6,274
7,693
11,155
6,880
6,487
7,376
5,569
6,401
7,492
7,150
6,996
6,720
7,650
6,935
6,954
7,214
9,178
6,429
6,388
7,540
Mean 7,409
Median 7,115
NSS initialized with cert DB (current behaviour)
loadtime (ms)
12,524
11,553
11,934
7,384
7,353
9,723
11,783
13,426
7,294
11,044
7,816
9,040
10,104
12,563
11,050
11,533
9,347
10,925
6,824
8,020
8,262
8,104
7,857
13,125
9,047
12,482
9,123
9,776
10,378
7,388
Mean 9,893
Median 9,750
I'm needinfo-ing myself to collect finer grain profiles from simpleperf.
I imagine that it's not feasible to run with NSS initialized w/ out the certDB on android, but perhaps the work can be deferred?
Reporter | ||
Updated•6 years ago
|
Reporter | ||
Comment 1•6 years ago
|
||
I triggered the "no cert DB" mode on android by launching with an empty profile in a non default location (/mnt/sdcard/Download
)
Reporter | ||
Updated•6 years ago
|
Reporter | ||
Updated•6 years ago
|
Updated•6 years ago
|
Reporter | ||
Comment 3•6 years ago
|
||
Mark this was measured on a Moto G5, although I've reproduced similar results on other android devices.
Andrew, I just posted some test results in bug 1506471 comment 39. Looks promising.
Reporter | ||
Comment 5•6 years ago
|
||
Dragana,
Would you have any ideas on this?
Or know who might?
Reporter | ||
Updated•6 years ago
|
Updated•6 years ago
|
Assignee | ||
Comment 6•6 years ago
|
||
See also bug 1370516. Due to recent other work (e.g. bug 1514118) this might not actually be that difficult - we can potentially load the user db (which is part of what's slow) on a background thread.
Assignee | ||
Updated•6 years ago
|
Updated•6 years ago
|
Comment 7•6 years ago
|
||
I hope the reworded title is slightly more representative of the issue? As I understand, it's actually fresh profiles that would be unaffected because they don't yet have a cert DB and NSS initialisation therefore is still fast.
Reporter | ||
Comment 8•6 years ago
|
||
I think it's good to have "Initialising NSS with Cert DB" in the title.
So far I know that initializing NSS in cert DB mode does actually impact the new profile case.
(This is where I first encountered it, when profiling Nimbledroid android tests where a new profile is used for every page load).
I'll find out if this also impacts page load on existing profiles and we can update the description.
I still also have to capture a proper call stack which should shed some light on what exactly is going on.
Assignee | ||
Comment 9•6 years ago
|
||
Andrew, I have a rough draft of what this might look like at https://treeherder.mozilla.org/#/jobs?repo=try&revision=5c62e754baef345cbb73ad5bcad58a8ce0493b8e - maybe you could use that to validate the approach I'm considering? (I spun off an android-api-16 opt build, I think, but in any case you can trigger any additional builds you need there.)
Reporter | ||
Comment 10•6 years ago
|
||
Good stuff Dana.
I'll grab that diff and build it locally to measure the performance.
Also, I haven't been able to find any evidence that initializing NSS with CertDB impacts pageload performance when an existing profile is used.
To test this I modified the android Nimbledriod suite to not clear the app data between runs (i.e. only force-kill the process).
The first couple of pageloads are improved when initializing in no-cert DB mode, but after that there is no significant difference in performance.
Reporter | ||
Comment 11•6 years ago
|
||
Hi Dana, I pulled down that patch and applied it to it's ancestor.
Unfortunately when I launch geckoview_example I see this crash on startup -- any ideas?
02-25 11:13:01.986 29465 29483 E GeckoLibLoad: Load sqlite start
02-25 11:13:02.004 29465 29483 W GeckoLinker: /data/app/org.mozilla.geckoview_example-1/lib/arm/libnss3.so: Relocation to NULL @0x001338f0 for symbol "__cxa_type_match"
02-25 11:13:02.004 29465 29483 E GeckoJNI: missing PK11SDR_Encrypt
02-25 11:13:02.004 29465 29483 E GeckoLibLoad: Throw
02-25 11:13:02.005 29465 29483 E GeckoLibLoad: Load sqlite done
--------- beginning of crash
02-25 11:13:02.005 29465 29483 E AndroidRuntime: FATAL EXCEPTION: Gecko
02-25 11:13:02.005 29465 29483 E AndroidRuntime: Process: org.mozilla.geckoview_example, PID: 29465
02-25 11:13:02.005 29465 29483 E AndroidRuntime: java.lang.Exception: Error loading sqlite libraries
02-25 11:13:02.005 29465 29483 E AndroidRuntime: at org.mozilla.gecko.mozglue.GeckoLoader.loadSQLiteLibsNative(Native Method)
02-25 11:13:02.005 29465 29483 E AndroidRuntime: at org.mozilla.gecko.mozglue.GeckoLoader.loadSQLiteLibs(GeckoLoader.java:199)
02-25 11:13:02.005 29465 29483 E AndroidRuntime: at org.mozilla.gecko.GeckoThread.loadGeckoLibs(GeckoThread.java:266)
02-25 11:13:02.005 29465 29483 E AndroidRuntime: at org.mozilla.gecko.GeckoThread.initGeckoEnvironment(GeckoThread.java:289)
02-25 11:13:02.005 29465 29483 E AndroidRuntime: at org.mozilla.gecko.GeckoThread.run(GeckoThread.java:417)
02-25 11:13:02.007 1759 2247 W ActivityManager: Force finishing activity org.mozilla.geckoview_example/.GeckoViewActivity
Assignee | ||
Comment 12•6 years ago
|
||
Ah - I didn't realize anything else used that function. Try https://treeherder.mozilla.org/#/jobs?repo=try&revision=903bcaa7100f9101da843ce73be01d9b70f0f5e1 ?
Reporter | ||
Comment 13•6 years ago
|
||
Will do.
Will do.
If it's helpful, these are captured perf event breakdowns with and without cert-db:
Baseline profile:
Overhead Shared Object
58.55% /data/app/org.mozilla.geckoview_example-1/lib/arm/libxul.so
14.63% /data/app/org.mozilla.geckoview_example-1/lib/arm/libfreebl3.so
8.25% /system/lib/libart.so
5.23% /system/lib/libc.so
4.35% /data/app/org.mozilla.geckoview_example-1/lib/arm/libmozglue.so
1.45% /data/app/org.mozilla.geckoview_example-1/lib/arm/libnss3.so
1.15% /system/framework/arm/boot-core-oj.oat
0.99% //anon
Running w/ no-cert DB
Overhead Shared Object
33.33% /data/app/org.mozilla.geckoview_example-1/lib/arm/libxul.so
26.24% /system/lib/libart.so
11.49% /system/lib/libc.so
5.39% /data/app/org.mozilla.geckoview_example-1/lib/arm/libmozglue.so
5.11% /system/lib/libart-compiler.so
3.83% /system/framework/arm/boot-framework.oat
2.70% /system/framework/arm/boot-core-oj.oat
2.27% //anon
Reporter | ||
Comment 14•6 years ago
|
||
Attached the symbols called in libfreebl3.so
Reporter | ||
Comment 15•6 years ago
|
||
I'm just learning to read these now -- I attached the android SimplePerf HTML report for a slower page load.
Reporter | ||
Comment 16•6 years ago
|
||
I ran the tests with your latest patch Dana but unfortunately I couldn't discern and performance improvement in the early page load:
https://paste.rs/5F9
I know that I shared a lot of profiling data above, but if there are particular code paths that I could help profile, let me know.
Assignee | ||
Comment 17•6 years ago
|
||
I don't have a sense of what the above are telling me. What configuration led to the data in attachment 9046422 [details]? Is that with or without a certdb? Or is that the code I posted?
In terms of attachment 9046420 [details], all that's telling me is that we're using AES, which makes sense if we're connecting to a server using TLS.
I guess it would be helpful to compare what happens when we call NSS_Initialize in one configuration compared to another.
Reporter | ||
Comment 18•6 years ago
|
||
(attachment 9046420 [details]) and (attachment 9046422 [details]) are both sampled when running with certDB, i.e. mozilla-central behavior.
When running with cert-db disabled, samples from /data/app/org.mozilla.geckoview_example-1/lib/arm/libfreebl3.so in the parent process disappear almost entirely.
Although I'm really not familiar with this code, it looks like when running with certDB the overhead of TLS increases dramatically.
I also saw this when profiling a test site which reports the tcp connect time via the browser performance.timing
api.
(https://acreskeymoz.github.io/)
So let me dig deeper to see if I can confirm this and find out why.
And yes, good idea, I'll compare NSS_Initialize with both configurations.
Reporter | ||
Comment 19•6 years ago
|
||
I haven't been able to find a smoking gun via profiling yet, but I did discover something interesting and unexpected:
Initializing NSS with cert-db (default) doesn't impact loadtime on subsequent loads of the same page (even after force-kill).
But it does impact loads of different pages, regardless of whether the browser has been running for 4+ minutes.
Results of long session experiments are here:
https://docs.google.com/spreadsheets/d/1zWUPYJkt2n2D_YkduR6pHa6GvOUT3neYu9gHziutvfo/edit#gid=0
Even after the browser has been running and loading pages, I can see a ~30% improvement in loadtime on wikimedia if I've initialized NSS in no-cert-db mode. (https://searchfox.org/mozilla-central/source/security/manager/ssl/nsNSSComponent.cpp#1687)
Perhaps it's as simple as this: adding the server's certificate to the database is slow and OMT?
Dana, is it possible for me to disable the addition of new certificates as a test?
I'll continue to try to capture something interesting via profiling.
Assignee | ||
Comment 20•6 years ago
|
||
We don't permanently save server certificates in normal browsing, but we do save intermediate certificates, which could explain the differences. You could verify by removing the call to SaveIntermediateCerts
here: https://searchfox.org/mozilla-central/rev/2a6f3dde00801374d3b2a704232de54a132af389/security/certverifier/CertVerifier.cpp#957
Reporter | ||
Comment 21•6 years ago
|
||
Dana -- thank you -- that very much looks to be the difference :)
Commenting out the call to SaveIntermediateCerts
is improving pageload by ~40% on the given site.
I need to run broader tests.
You can see results here:
https://docs.google.com/spreadsheets/d/1zWUPYJkt2n2D_YkduR6pHa6GvOUT3neYu9gHziutvfo/edit#gid=0&range=A35
So, while I get more data...
I wonder why this is having such an impact?
I did profile a pageload on android and SaveIntermediateCerts
in the content process was only 2ms.
http://bit.ly/2XsK10v
Assignee | ||
Comment 22•6 years ago
|
||
SaveIntermediateCerts
shouldn't be running in the content process (and the stacks it shows up in there don't make sense).
If you include "SSL" in the list of threads to capture, it should show up in the parent process.
Reporter | ||
Comment 23•6 years ago
|
||
Ah, thanks, I can now at least see the SSL threads of the parent process.
Since this has such a large performance impact, can you help me understand the following:
1 - Why do we save these intermediate certificates? (To save the work of verifying them again if they are encountered later on?)
2 - So I think saving the certificate to DB is blocking the TSL handshake until it's complete, is that right? Could this potentially be done asynchronously?
Assignee | ||
Comment 24•6 years ago
|
||
We save intermediates to help when the user encounters other misconfigured servers that don't send the right intermediate(s) in case having the cached intermediate will help. e.g. Alice visits example.com, which is properly configured to send the Example CA Intermediate. She then visits example.org, which doesn't send the Example CA Intermediate. If the intermediate from the first connection weren't saved, she would see an "unknown issuer" error page. Of course, this isn't a flawless feature, as if she visited them in the other order, example.org wouldn't work. We're addressing this in bug 1520278 et. al. but it's a little ways off from shipping.
Saving intermediates could certainly be done on a separate thread. That said, since we're hoping to not need to do this at all in the near future, it's unclear that doing that work right now would really help us.
Reporter | ||
Comment 25•6 years ago
|
||
Thanks Dana, that helps.
FYI, I ran more tests comparing saving and not saving intermediates and this hypothesis is holding up.
https://docs.google.com/spreadsheets/d/1zWUPYJkt2n2D_YkduR6pHa6GvOUT3neYu9gHziutvfo/edit?userstoinvite=mark.paxman99@gmail.com&ts=5c76f0fd#gid=0&range=A83
Not as large an impact as when a site it loaded from intent on fresh install, but still big improvements in load time:
e.g.
13.66%, 27.51%, 13.46%, 21.58%
Reporter | ||
Comment 26•6 years ago
|
||
One more question: it looks like we don't save the intermediates in private browsing contexts
(https://searchfox.org/mozilla-central/source/security/manager/ssl/SSLServerCertVerification.cpp#1287)
and my breakpoints also agree.
So until we have intermediate preloading could we consider not saving intermediates on android where the cost looks to be high. (I'll run our full Nimbledroid suite to get a better picture).
Reporter | ||
Updated•6 years ago
|
Reporter | ||
Comment 27•6 years ago
|
||
I was curious so I put in logging to measure the time it takes to save intermediates to the DB:
On MotoG5
CertVerifier::VerifySSLServerCert SaveIntermediates took 1009.791975
CertVerifier::VerifySSLServerCert SaveIntermediates took 1020.807029
CertVerifier::VerifySSLServerCert SaveIntermediates took 0.001979
CertVerifier::VerifySSLServerCert SaveIntermediates took 0.001354
CertVerifier::VerifySSLServerCert SaveIntermediates took 0.002136
CertVerifier::VerifySSLServerCert SaveIntermediates took 0.002031
Document https://www.google.com/ loaded after 1851ms, load event duration 0ms
Reporter | ||
Comment 28•6 years ago
|
||
The results of a local run of the Nimbledroid android progress-bar based load test:
https://docs.google.com/spreadsheets/d/1zWUPYJkt2n2D_YkduR6pHa6GvOUT3neYu9gHziutvfo/edit#gid=915126455
(Comparison in column AD).
Caveat: Because the Nimbledroid suite relies on finding the progress bar element I've found that it's only accurate for pages with long load times (the progress bar cannot always be found in time for some pages).
Assignee | ||
Comment 29•6 years ago
|
||
Apparently importing a certificate into the NSS certificate DB is slow enough to
materially impact the time it takes to connect to a site. This patch addresses
this by importing any intermediate certificates we want to cache from verified
connections on a background thread (so the certificate verification thread can
return faster).
Comment 30•6 years ago
|
||
Fenix MVP will use GeckoView 67. Depending on the risk/reward trade-off, we might want to uplift this fix to 67 Beta so Fenix MVP performance makes a good first impression.
Comment 31•6 years ago
|
||
Comment 32•6 years ago
|
||
Backed out for XPCShel failures on test_missing_intermediate.js
Backout link: https://hg.mozilla.org/integration/autoland/rev/b0b02f3c5203afa2323ca7a5900f3d3d53037ac9
Push link: https://hg.mozilla.org/integration/autoland/rev/d641ac81d9f07ef75cb3e13f9c33e13564d47fb1
Log link: https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=235934863&repo=autoland&lineNumber=2065
Reporter | ||
Comment 33•6 years ago
|
||
I've pulled down the patch and I am seeing an improvement but unfortunately not as much I would have expected from moving this work out of the verification path.
The most stable results are from using Browsertime on the Acer reference laptop:
Base revision:
INFO: https://marvel.fandom.com/wiki/Black_Panther 0, backEndTime: 642ms (±34.81ms), firstPaint: 4.14s (±105.05ms), DOMContentLoaded: 4.50s (±105.23ms), Load: 21.37s (±1.13s), rumSpeedIndex: 3488 (±85.88) (20 runs)
Saving intermediates via Cryptotask
INFO: https://marvel.fandom.com/wiki/Black_Panther 0, backEndTime: 212ms (±15.89ms), firstPaint: 4.28s (±114.80ms), DOMContentLoaded: 4.69s (±123.45ms), Load: 18.01s (±1.18s), rumSpeedIndex: 3611 (±97.61) (20 runs)
Not saving intermediates:
INFO: https://marvel.fandom.com/wiki/Black_Panther 0, backEndTime: 242ms (±24.30ms), firstPaint: 3.98s (±109.01ms), DOMContentLoaded: 4.39s (±117.15ms), Load: 12.10s (±1.43s), rumSpeedIndex: 3279 (±92.87) (20 runs)
I also compared 10 runs of the Nimbledroid suite (locally on Moto G5) before and after the patch
The data is very noisy but I think there is evidence of a performance gain, just not as impactful as removing the saving.
https://docs.google.com/spreadsheets/d/1zWUPYJkt2n2D_YkduR6pHa6GvOUT3neYu9gHziutvfo/edit#gid=2115798283&range=AD:AD
I'm not very familiar with these runnables, but perhaps the priority of the thread can be lowered? Potentially it could be deferred until idle cycles were available?
Assignee | ||
Comment 34•6 years ago
|
||
(In reply to Narcis Beleuzu [:NarcisB] from comment #32)
Backed out for XPCShel failures on test_missing_intermediate.js
Backout link: https://hg.mozilla.org/integration/autoland/rev/b0b02f3c5203afa2323ca7a5900f3d3d53037ac9
Push link: https://hg.mozilla.org/integration/autoland/rev/d641ac81d9f07ef75cb3e13f9c33e13564d47fb1
Log link: https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=235934863&repo=autoland&lineNumber=2065
This uncovered an NSS bug - I updated the patch with a workaround (looks like it works: https://treeherder.mozilla.org/#/jobs?repo=try&revision=f2dbb8af2f5a68a429569b05bb0ff3a9ea7435b8 )
(In reply to Andrew Creskey from comment #33)
I've pulled down the patch and I am seeing an improvement but unfortunately not as much I would have expected from moving this work out of the verification path.
The most stable results are from using Browsertime on the Acer reference laptop:
Base revision:
INFO: https://marvel.fandom.com/wiki/Black_Panther 0, backEndTime: 642ms (±34.81ms), firstPaint: 4.14s (±105.05ms), DOMContentLoaded: 4.50s (±105.23ms), Load: 21.37s (±1.13s), rumSpeedIndex: 3488 (±85.88) (20 runs)
Saving intermediates via Cryptotask
INFO: https://marvel.fandom.com/wiki/Black_Panther 0, backEndTime: 212ms (±15.89ms), firstPaint: 4.28s (±114.80ms), DOMContentLoaded: 4.69s (±123.45ms), Load: 18.01s (±1.18s), rumSpeedIndex: 3611 (±97.61) (20 runs)
Not saving intermediates:
INFO: https://marvel.fandom.com/wiki/Black_Panther 0, backEndTime: 242ms (±24.30ms), firstPaint: 3.98s (±109.01ms), DOMContentLoaded: 4.39s (±117.15ms), Load: 12.10s (±1.43s), rumSpeedIndex: 3279 (±92.87) (20 runs)
I also compared 10 runs of the Nimbledroid suite (locally on Moto G5) before and after the patch
The data is very noisy but I think there is evidence of a performance gain, just not as impactful as removing the saving.
https://docs.google.com/spreadsheets/d/1zWUPYJkt2n2D_YkduR6pHa6GvOUT3neYu9gHziutvfo/edit#gid=2115798283&range=AD:ADI'm not very familiar with these runnables, but perhaps the priority of the thread can be lowered? Potentially it could be deferred until idle cycles were available?
Not as far as I know, without some non-trivial engineering work.
Which of these numbers are most important? The cryptotask implementation actually seems worse in 4 out of 5 categories, if I"m reading this correctly.
Reporter | ||
Comment 35•6 years ago
|
||
(In reply to Dana Keeler (she/her) (use needinfo) (:keeler for reviews) from comment #34)
Which of these numbers are most important? The cryptotask implementation actually seems worse in 4 out of 5 categories, if I"m reading this correctly.
Of these metrics I would say the that firstPaint
and Load
are the most important. The Nimbledroid numbers in the spreadsheet are measuring the progress bar which is a approximately equal to load time as well (but very noisy).
So it does look like a pretty reasonable win on reaching the load event, but yes, maybe some regression on other metrics too.
But I don't like how I only have one site measured accurately here so let me capture a broader sample.
I'll also profile this and see if I can get a better understanding.
Previously, if I'm not mistaken, the cert verification and intermediate storage ran completely independently of the main thread.
Perhaps even the small end-of-task main thread callback is delaying mainthread work?
Or perhaps accessing the cert db to save is delaying other threads that make use of it?
So it's less of a performance win because of this.
I'll see what I can find.
Reporter | ||
Comment 36•6 years ago
|
||
This is a profile of a heavy page load:
http://bit.ly/2TzO6wD
You can now see the new ImportInts threads.
The first one runs its background task for a bit over 1800ms - it looks like mostly flushing sqlite file buffers.
Perhaps this is simply IO contention?
Comment 37•6 years ago
|
||
Comment 38•6 years ago
|
||
bugherder |
Reporter | ||
Comment 39•6 years ago
|
||
I've collected more results on the Acer reference laptop.
Below is a set of runs with the baseline build (prior to your patch, Dana), two sets of runs with your patch, and one set with the saving of intermediates commented out.
These are live sites and there is noise but I think we can see significant improvements across all metrics in at least some of the sites when using the background thread CryptoTask. (lower is better for SpeedIndex).
Not saving the intermediates at all does lead to a faster pageload (you can see and example of the number of short-lived ImportInts threads here:
Baseline
wikipedia.org backEndTime: 786ms (±103.96ms), firstPaint: 2.04s (±180.65ms), DOMContentLoaded: 2.36s (±140.21ms), Load: 2.57s (±112.79ms), rumSpeedIndex: 935 (±101.89) (20 runs)
medium.com backEndTime: 1.26s (±800.13ms), firstPaint: 3.41s (±752.90ms), DOMContentLoaded: 3.70s (±805.01ms), Load: 5.71s (±780.66ms), rumSpeedIndex: 3321 (±770.55) (19 runs)
buzzfeed.com backEndTime: 711ms (±81.56ms), firstPaint: 865ms (±81.05ms), DOMContentLoaded: 868ms (±82.41ms), Load: 16.40s (±3.06s), rumSpeedIndex: 2953 (±253.85) (20 runs)
washingtonpost.com backEndTime: 1.10s (±98.54ms), firstPaint: 2.06s (±157.34ms), DOMContentLoaded: 2.52s (±99.65ms), Load: 15.75s (±2.55s), rumSpeedIndex: 2178 (±97.90) (20 runs)
CryptoTask
wikipedia.org backEndTime: 225ms (±43.73ms), firstPaint: 2.01s (±203.66ms), DOMContentLoaded: 2.31s (±152.89ms), Load: 2.64s (±89.29ms), rumSpeedIndex: 585 (±60.23) (20 runs)
medium.com backEndTime: 365ms (±18.34ms), firstPaint: 2.56s (±164.42ms), DOMContentLoaded: 2.66s (±128.13ms), Load: 4.88s (±217.35ms), rumSpeedIndex: 2453 (±143.53) (19 runs)
buzzfeed.com backEndTime: 303ms (±53.29ms), firstPaint: 596ms (±112.36ms), DOMContentLoaded: 439ms (±52.47ms), Load: 13.68s (±598.30ms),rumSpeedIndex: 2190 (±207.31) (20 runs)
washingtonpost.com backEndTime: 641ms (±99.23ms), firstPaint: 2.09s (±251.23ms), DOMContentLoaded: 2.59s (±190.91ms), Load: 13.76s (±733.88ms),rumSpeedIndex: 2021 (±143.04) (20 runs)
CryptoTask
wikipedia.org backEndTime: 263ms (±64.88ms), firstPaint: 2.30s (±237.86ms), DOMContentLoaded: 2.79s (±157.01ms), Load: 2.93s (±153.64ms), rumSpeedIndex: 637 (±97.83) (20 runs)
medium.com backEndTime: 406ms (±40.93ms), firstPaint: 2.49s (±188.87ms), DOMContentLoaded: 2.83s (±148.95ms), Load: 5.41s (±197.38ms), rumSpeedIndex: 2446 (±145.77) (20 runs)
buzzfeed.com backEndTime: 233ms (±5.69ms), firstPaint: 680ms (±158.71ms), DOMContentLoaded: 385ms (±8.95ms), Load: 13.77s (±819.03ms),rumSpeedIndex: 2848 (±251.80) (20 runs)
washingtonpost.com backEndTime: 625ms (±98.23ms), firstPaint: 2.04s (±190.58ms), DOMContentLoaded: 2.43s (±96.09ms), Load: 13.38s (±623.44ms),rumSpeedIndex: 1926 (±86.66) (20 runs)
Not saving Intermediates:
wikipedia.org backEndTime: 227ms (±55.06ms), firstPaint: 1.77s (±256.92ms), DOMContentLoaded: 2.09s (±229.80ms), Load: 2.55s (±147.25ms), rumSpeedIndex: 634 (±107.49) (20 runs)
medium.com backEndTime: 346ms (±13.07ms), firstPaint: 2.55s (±155.76ms), DOMContentLoaded: 2.63s (±131.54ms), Load: 4.37s (±188.01ms), rumSpeedIndex: 2371 (±131.11) (20 runs)
buzzfeed.com backEndTime: 245ms (±12.14ms), firstPaint: 393ms (±13.90ms), DOMContentLoaded: 389ms (±13.71ms), Load: 9.84s (±1.06s), rumSpeedIndex: 1450 (±98.50) (20 runs)
washingtonpost.com backEndTime: 642ms (±90.19ms), firstPaint: 2.08s (±216.39ms), DOMContentLoaded: 2.62s (±145.49ms), Load: 6.78s (±481.46ms), rumSpeedIndex: 2532 (±130.32) (19 runs)
Reporter | ||
Comment 40•6 years ago
|
||
I put together a test patch where the import is done with an IdleRunnable
so, if I've coded this correctly, it should only import if the event queue is empty of priority tasks.
I'll collect some numbers and see if there's any potential in this extension.
Comment 41•6 years ago
|
||
Setting 67=wontfix because I assume we don't need to uplift this fix to 67 Beta.
Assignee | ||
Comment 42•6 years ago
|
||
I thought we wanted this in 67 for Fenix?
Comment 43•6 years ago
|
||
(In reply to Dana Keeler (she/her) (use needinfo) (:keeler for reviews) from comment #42)
I thought we wanted this in 67 for Fenix?
We're now debating a new proposal to ship GV 68 (Beta) in Fenix MVP instead of 67. We wouldn't need to uplift as many big changes to 67 Beta if Fenix is using GV Nightly for another six weeks.
That said, using 68 is not a done deal, so feel free to uplift to 67 Beta now if you'd like to get more beta test time. There's only one Fennec/GV beta build each week (on Mondays), so you will need to request uplifts by Friday morning to give Relman time to review and process uplift requests.
Assignee | ||
Comment 44•6 years ago
|
||
(In reply to Andrew Creskey from comment #40)
I put together a test patch where the import is done with an
IdleRunnable
so, if I've coded this correctly, it should only import if the event queue is empty of priority tasks.
I'll collect some numbers and see if there's any potential in this extension.
Did this turn out to be beneficial?
Reporter | ||
Comment 45•6 years ago
|
||
The browsertime results were still very noisy so I'm re-running them now with these changes:
• number of page loads increased from 20 to 30
• number of sites increased from 4 to 6
• browser stabilization period of 30 seconds added (following raptor tp6)
As a consequence, it's taking a long time to execute...
Giving this a bit more thought, I'm quite concerned that I'll be regressing TTFI since the big import tasks are ending up on the main thread.
We're still not quite there with automated testing for all of this, so something to be careful about, I think.
Should have the results soon though. From profiling was interesting to see all the ImportInts show up interspersed in the main thread.
Reporter | ||
Comment 46•6 years ago
|
||
That's the patch by the way. I'm new to the threading and scheduling here so it's likely there's a more efficient way of doing this. Specifically, do I really need the overhead of creating threads just to run a task on the main?
Assignee | ||
Comment 47•6 years ago
|
||
Reporter | ||
Comment 48•6 years ago
|
||
Thanks Dana, let me look into those optimizations.
jesup: was looking at the profiles and suggested that I try the LazyIdleThread
where I can push all the imports to a single thread and avoid both the tread creation overhead and the threads colliding over access to the sql database.
Perhaps dispatching to the current thread queue as you suggested above may accomplish that as well. I'll investigate.
Reporter | ||
Comment 49•6 years ago
|
||
These are the best performance numbers I have yet:
I think that they show:
-Some big gains with the landed CryptoTask solution (e.g. 18.45% improvement in mean load time of washingtonpost url)
-Some cases where performance with CryptoTasks is decreased somewhat, but within the std dev
-The Import-on-Idle that I implemented improves some scenarios significantly but also regresses others
-Not importing is the fastest solution (obviously expected)
Assignee | ||
Comment 50•6 years ago
|
||
How about something like this: https://hg.mozilla.org/try/rev/71f3fe15fde857e2cf1e369f0080e17413494a80
Reporter | ||
Comment 51•6 years ago
|
||
Thanks Dana -- I think that conceptually that's very similar to what I tested last night:
https://hg.mozilla.org/try/rev/c762fd4e57ce98204580a9233c4fb1c09e62b12d
The import work is pushed onto the current thread (one of the SSL Cert threads) with a small main-thread interaction for the observer service.
I would say that your NS_NewRunnableFunction
accomplishes this with less code though.
These results compare the new baseline (CryptoTask threads) vs that revision of mine:
https://paste.rs/itd
Looking promising. I want to get results from more sites, so I'll expand this with your patch.
My one concern with this general approach is that we could be slowing down SSL Cert verification due to the large number of import tasks that can build up.
I'm getting closer to having a solution where we just have one dedicated thread that can do the import tasks, that might be best.
Reporter | ||
Comment 52•6 years ago
|
||
This is what I have for executing all of the import tasks on a single thread (SharedThreadPool with n=1)"
https://hg.mozilla.org/try/rev/5fc8cfeecbd4d4a6846fcdaa36a43125214cb68e
You're importCertsRunnable
came in handy Dana :)
Perhaps this can be accomplished without the threadpool? i.e. just a single dedicated thread with a timeout.
I looked into LazyIdleThread but it can only be used from the thread that created it, and there are up to 5 SSL Cert threads.
I'll look into this while I get performance numbers coming back from these changes.
Comment 53•6 years ago
|
||
(In reply to Chris Peterson [:cpeterson] from comment #43)
(In reply to Dana Keeler (she/her) (use needinfo) (:keeler for reviews) from comment #42)
I thought we wanted this in 67 for Fenix?
We're now debating a new proposal to ship GV 68 (Beta) in Fenix MVP instead of 67. We wouldn't need to uplift as many big changes to 67 Beta if Fenix is using GV Nightly for another six weeks.
That said, using 68 is not a done deal, so feel free to uplift to 67 Beta now if you'd like to get more beta test time. There's only one Fennec/GV beta build each week (on Mondays), so you will need to request uplifts by Friday morning to give Relman time to review and process uplift requests.
FYI: Fenix MVP will use GeckoView 68, not 67, so we don't need to uplift this fix to 67 Beta unless you want it in Fennec 67.
Reporter | ||
Comment 54•6 years ago
|
||
These are the results from yesterday's run: comparing our new baseline (landed Cryptotask solution, Dana's NS_NewRunnableFunction
from Comment 50, and my Import ThreadPool from Comment 52
Some of these are high-noise sites, but although I see improvements some of the regressions are concerning.
I profiled a run with the patch from Comment 52 and it looks like the import thread is spending almost all of it's time waiting for the lock in nssSession_EnterMonitor
so that's not ideal.
https://perfht.ml/2YMKn2p
Reporter | ||
Comment 55•6 years ago
|
||
I'm having a bit of a hard time finding a reliable way to measure the current alternatives.
It looks like I'll need to get WebPageReplay going to get deterministic pages.
But I found this to be interesting: a profile of importing the certs on the SSL Cert threads (Comment 50) on idle -- it's definitely doing what it's supposed to: you can see them at the 60 second mark:
https://perfht.ml/2YPaTII
Reporter | ||
Comment 56•6 years ago
|
||
I was able to use :nalexander's nascent Browsertime-for-firefox-on-android to kick off long running tests of these various alternatives on android.
Browsertime results: 30 cold page loads of each url with a conditioned profile
Moto G5, geckoview example, --enable-release
|Prior to Bug 1529044 | Dana's CryptoTask import | Idle runnable added to queue| Not saving |
|(import during cert | (new runnable per import) | of given SSL Cert Thread | intermediate certs |
| verification) | | (Comment 50) | |
pageLoadTime: mean stddev mean stddev gain mean stddev gain mean stddev gain
---------------------------------------------------------------------------------------------------------------------------------------
edition.cnn.com | 2968.00 (+- 938) | 2269.00 (+- 500) 23.55% | 1898.00 (+- 198) 36.05% | 1851.00 (+- 207) 37.63% |
www.instagram.com | 3968.00 (+- 208) | 3873.00 (+- 159) 2.39% | 3960.00 (+- 291) 0.20% | 3879.00 (+- 330) 2.24% |
www.booking.com | 4052.00 (+- 393) | 3958.00 (+- 246) 2.32% | 3970.00 (+- 300) 2.02% | 3950.00 (+- 254) 2.52% |
www.expedia.com | 10683.00 (+-1655) | 10557.00 (+-1988) 1.18% | 8673.00 (+- 215) 18.81% | 8797.00 (+- 818) 17.65% |
www.glassdoor.com | 16546.00 (+-1441) | 16676.00 (+-1541) -0.79% | 16548.00 (+-1619) -0.01% | 16365.00 (+-1685) 1.09% |
en.m.wikipedia.org | 1220.00 (+- 124) | 1197.00 (+- 110) 1.89% | 1243.00 (+- 131) -1.89% | 1202.00 (+- 140) 1.48% |
www.amazon.com | 3042.00 (+- 450) | 3053.00 (+- 569) -0.36% | 2715.00 (+- 348) 10.75% | 2546.00 (+- 289) 16.31% |
medium.com | 4506.00 (+- 872) | 4526.00 (+- 913) -0.44% | 3935.00 (+- 668) 12.67% | 3853.00 (+- 793) 14.49% |
---------------------------------------------------------------------------------------------------------------------------------------
responseStart:
---------------------------------------------------------------------------------------------------------------------------------------
edition.cnn.com | 304.00 (+- 98) | 245.00 (+- 17) 19.41% | 247.00 (+- 12) 18.75% | 250.00 (+- 14) 17.76% |
www.instagram.com | 607.00 (+- 90) | 519.00 (+- 51) 14.50% | 538.00 (+- 190) 11.37% | 566.00 (+- 204) 6.75% |
www.booking.com | 643.00 (+- 258) | 549.00 (+- 58) 14.62% | 542.00 (+- 65) 15.71% | 512.00 (+- 64) 20.37% |
www.expedia.com | 529.00 (+- 69) | 462.00 (+- 62) 12.67% | 472.00 (+- 56) 10.78% | 462.00 (+- 56) 12.67% |
www.glassdoor.com | 1265.00 (+- 150) | 1179.00 (+- 221) 6.80% | 1146.00 (+- 147) 9.41% | 1166.00 (+- 208) 7.83% |
en.m.wikipedia.org | 344.00 (+- 27) | 296.00 (+- 17) 13.95% | 304.00 (+- 44) 11.63% | 297.00 (+- 14) 13.66% |
www.amazon.com | 776.00 (+- 103) | 702.00 (+- 99) 9.54% | 677.00 (+- 214) 12.76% | 646.00 (+- 76) 16.75% |
medium.com | 529.00 (+- 220) | 444.00 (+- 26) 16.07% | 497.00 (+- 250) 6.05% | 494.00 (+- 218) 6.62% |
---------------------------------------------------------------------------------------------------------------------------------------
firstPaint:
---------------------------------------------------------------------------------------------------------------------------------------
edition.cnn.com | 1234.00 (+- 237) | 1136.00 (+- 239) 7.94% | 1164.00 (+- 260) 5.67% | 1077.00 (+- 252) 12.72% |
www.instagram.com | 761.00 (+- 98) | 683.00 (+- 44) 10.25% | 707.00 (+- 189) 7.10% | 727.00 (+- 210) 4.47% |
www.booking.com | 1964.00 (+- 293) | 1852.00 (+- 142) 5.70% | 1867.00 (+- 148) 4.94% | 1840.00 (+- 110) 6.31% |
www.expedia.com | 775.00 (+- 72) | 712.00 (+- 66) 8.13% | 720.00 (+- 55) 7.10% | 720.00 (+- 87) 7.10% |
www.glassdoor.com | 2780.00 (+- 548) | 2756.00 (+- 594) 0.86% | 2675.00 (+- 528) 3.78% | 2850.00 (+- 648) -2.52% |
en.m.wikipedia.org | 673.00 (+- 75) | 621.00 (+- 67) 7.73% | 648.00 (+- 83) 3.71% | 636.00 (+- 78) 5.50% |
www.amazon.com | 1790.00 (+- 360) | 1681.00 (+- 291) 6.09% | 1662.00 (+- 283) 7.15% | 1682.00 (+- 242) 6.03% |
medium.com | 1524.00 (+- 463) | 1543.00 (+- 460) -1.25% | 1491.00 (+- 299) 2.17% | 1447.00 (+- 247) 5.05% |
---------------------------------------------------------------------------------------------------------------------------------------
firstContentfulPaint:
---------------------------------------------------------------------------------------------------------------------------------------
edition.cnn.com | 1826.00 (+- 202) | 1777.00 (+- 189) 2.68% | 1771.00 (+- 196) 3.01% | 1708.00 (+- 134) 6.46% |
www.instagram.com | 879.00 (+- 96) | 798.00 (+- 43) 9.22% | 826.00 (+- 191) 6.03% | 839.00 (+- 210) 4.55% |
www.booking.com | 2216.00 (+- 319) | 2111.00 (+- 184) 4.74% | 2120.00 (+- 161) 4.33% | 2080.00 (+- 141) 6.14% |
www.expedia.com | 914.00 (+- 74) | 853.00 (+- 65) 6.67% | 863.00 (+- 53) 5.58% | 865.00 (+- 99) 5.36% |
www.glassdoor.com | 2995.00 (+- 574) | 2987.00 (+- 645) 0.27% | 2914.00 (+- 559) 2.70% | 3058.00 (+- 663) -2.10% |
en.m.wikipedia.org | 829.00 (+- 91) | 764.00 (+- 83) 7.84% | 801.00 (+- 99) 3.38% | 783.00 (+- 94) 5.55% |
www.amazon.com | 2030.00 (+- 383) | 1916.00 (+- 311) 5.62% | 1902.00 (+- 287) 6.31% | 1920.00 (+- 238) 5.42% |
medium.com | 1790.00 (+- 470) | 1809.00 (+- 467) -1.06% | 1751.00 (+- 305) 2.18% | 1724.00 (+- 265) 3.69% |
---------------------------------------------------------------------------------------------------------------------------------------
Full urls:
"https://edition.cnn.com/ampstories/us/still-missing-cassini-heres-what-else-is-out-there"
"https://www.instagram.com/justintimberlake/"
"https://www.booking.com/searchresults.en-gb.html?city=20088325"
"https://www.expedia.com/Hotel-Search?destination=New+York%2C+New+York&latLong=40.756680%2C-73.986470®ionId=178293&startDate=&endDate=&rooms=1&_xpid=11905%7C1&adults=2"
"https://www.glassdoor.com/Job/jobs.htm?suggestCount=0&suggestChosen=true&clickSource=searchBtn&typedKeyword=Mozilla&sc.keyword=Mozilla&locT=C&locId=1147431&jobType="
"https://en.m.wikipedia.org/wiki/Mozilla_Corporation"
"https://www.amazon.com/s?k=laptop"
"https://medium.com/s/coincidences-are-a-lie/could-america-have-also-been-the-birthplace-of-impressionism-cb3d31a2e22d"
The results from dispatching the NS_NewRunnableFunction
as an Idle task on the given SSL Cert thread (Comment 50) look great and match the pageload impact of not importing the certs.
I was able to verify the behavior with logging and sure enough the imports don't start until significantly after the page is fully loaded.
To me this looks like a winner.
Dana?
Assignee | ||
Comment 57•6 years ago
|
||
Sounds good - I'll get that patch reviewed and landed.
Assignee | ||
Comment 58•6 years ago
|
||
Previously this functionality created a CryptoTask to do this work, but that
would cause a new thread to be created for each list of intermediates. This was
slow both because of all of the threads and because they could be scheduled
while other work was happening. Moving these tasks to the low-priority event
queue for threads in the certificate verification thread pool means no new
threads are created and the work only happens when these threads are idle
anyway.
Updated•6 years ago
|
Comment 59•6 years ago
|
||
Comment 60•6 years ago
|
||
bugherder |
Updated•3 years ago
|
Description
•