XPIProvider.jsm should not initialize NSS during early startup

RESOLVED FIXED in Firefox 55

Status

()

Core
Security: PSM
P1
normal
RESOLVED FIXED
20 days ago
11 hours ago

People

(Reporter: florian, Assigned: mossop)

Tracking

(Blocks: 2 bugs)

unspecified
mozilla55
x86_64
Windows 10
Points:
---
Dependency tree / graph
Bug Flags:
qe-verify -

Firefox Tracking Flags

(firefox55 fixed)

Details

(Whiteboard: [photon-performance][qf:p1])

MozReview Requests

()

Submitter Diff Changes Open Issues Last Updated
Loading...
Error loading review requests:

Attachments

(1 attachment)

(Reporter)

Description

20 days ago
This is expensive as it does disk IO. See this profile https://perfht.ml/2pNFXJM where it blocked the main thread for 157ms on the quantum reference hardware.

Currently the cost of this is paid by XPIProvider.jsm.

Could this be done very early on a different thread?

Mossop, I see you added this global gCertDB variable, and from looking at bug 1244248 it seems we need to set this variable before any add-on can run code. Would it still be possible to make this a lazy getter, and ensure it's been called before the first add-on runs? Currently it's in the root scope and we pay the cost of the NSS init as soon as the XPIProvider.jsm module is imported.
Flags: needinfo?(dtownsend)

Updated

20 days ago
Flags: qe-verify?
Priority: -- → P2
(Reporter)

Updated

20 days ago
Blocks: 1356825
(Assignee)

Comment 1

19 days ago
(In reply to Florian Quèze [:florian] [:flo] from comment #0)
> This is expensive as it does disk IO. See this profile
> https://perfht.ml/2pNFXJM where it blocked the main thread for 157ms on the
> quantum reference hardware.
> 
> Currently the cost of this is paid by XPIProvider.jsm.
> 
> Could this be done very early on a different thread?
> 
> Mossop, I see you added this global gCertDB variable, and from looking at
> bug 1244248 it seems we need to set this variable before any add-on can run
> code. Would it still be possible to make this a lazy getter, and ensure it's
> been called before the first add-on runs? Currently it's in the root scope
> and we pay the cost of the NSS init as soon as the XPIProvider.jsm module is
> imported.

Since add-ons are loaded in the startup flow anyway making this a lazily load wouldn't really buy us anything. The real win is that in Firefox 57 once only webextensions are allowed we don't need bug 1244248 at all.
Flags: needinfo?(dtownsend)
(Reporter)

Comment 2

19 days ago
(In reply to Dave Townsend [:mossop] from comment #1)

> Since add-ons are loaded in the startup flow anyway making this a lazily
> load wouldn't really buy us anything.

I was thinking that if we start initializing NSS off main thread, requiring it for add-ons a few hundred ms later may mean it's already done initializing and we don't have to wait for it.
We don't actually need the cert DB until after startup unless we're installing new add-ons, so I don't see any reason not to make this a lazy getter.

keeler, would it be possible/make sense to initialize NSS on a background thread during early startup?
Flags: needinfo?(dkeeler)
(Reporter)

Comment 4

19 days ago
(In reply to Kris Maglione [:kmag] (busy; behind on reviews) from comment #3)
> We don't actually need the cert DB until after startup unless we're
> installing new add-ons

See bug 1244248.
Eh. There are other ways an add-on can do this, anyway. But there are also other ways we can try to prevent it.
(In reply to Kris Maglione [:kmag] (busy; behind on reviews) from comment #5)
> Eh. There are other ways an add-on can do this, anyway. But there are also
> other ways we can try to prevent it.

The easiest way at this point would probably be to make nsIX509CertDB a builtinclass. It wouldn't give us that much protection, but it would give us at least as much protection as we get from the current workaround.

I'd really rather we just fix bug 1244246.
(In reply to Kris Maglione [:kmag] (busy; behind on reviews) from comment #3)
> keeler, would it be possible/make sense to initialize NSS on a background
> thread during early startup?

It's possible - it would just be a fair bit of work. What complicates this is that currently, NSS can only be initialized after we have a profile directory but it must be initialized before any code tries to do anything cryptographic (so, hashes, signature verification, TLS connections, etc.). I suspect that even if we did initialize NSS on a background thread after getting the profile directory, some code on the main thread would require it and we'd end up blocking anyway.

We have been exploring the possibility of initializing NSS in memory-only mode early in startup and then loading the user's certificate/key databases as-needed. This should both decrease startup time and allow code that doesn't rely on these databases to run without incurring these read times (so things like hashing and add-on signature verification shouldn't need the full initialization). Unfortunately there are still some architectural hurdles to this we haven't worked out. More information at https://wiki.mozilla.org/Security/CryptoEngineering/Platform_Use_of_NSS

Another potential solution would be to switch from the old database format to the new one (bug 783994). We think the new format is more efficient and this may decrease load times (see bug 1290639) (then again, I have anecdotal experience that indicates that the new format is actually slower in operation, so this may not be something we want to do).
Flags: needinfo?(dkeeler)
OK, so in that case, I think we should try removing the AOM's eager load and see how early NSS winds up being initialized in the normal case. If it's more than a couple dozen ms after profile initialization, it might still be worth doing.

I think it's already a couple dozen ms from profile initialization to XPIProvider being loaded, so we're probably already pretty close, and my near-term plan for bug 1359653 is to begin background IO that we need during startup while the main thread is blocked preparing off-thread script compiles. This would probably fit nicely into that slot.
Comment hidden (mozreview-request)
Comment hidden (mozreview-request)
(Assignee)

Comment 11

19 days ago
Comment on attachment 8865051 [details]
Bug 1362364: Lazily load the certificate database into the add-ons manager.

I can't get a useful startup profile on my fast machine, can you test this florian and see if it helps? This should move NSS initialisation out of XPIProvider for the most part.
Attachment #8865051 - Flags: feedback?(florian)

Updated

19 days ago
Assignee: nobody → dtownsend
Status: NEW → ASSIGNED
Priority: P2 → P1

Comment 12

18 days ago
mozreview-review
Comment on attachment 8865051 [details]
Bug 1362364: Lazily load the certificate database into the add-ons manager.

https://reviewboard.mozilla.org/r/136714/#review139822

::: commit-message-9d46c:4
(Diff revision 2)
> +Bug 1362364: Lazily load the certificate database into the add-ons manager. r?rhelmer
> +
> +The lazy loading is a little more complex because we want this to be a constant
> +in the scope so extension's can't trivially replace it. This also changes the

s/extension's/extensions/

::: toolkit/mozapps/extensions/internal/XPIProvider.jsm:3318
(Diff revision 2)
>    /**
>     * Verifies that all installed add-ons are still correctly signed.
>     */
>    verifySignatures() {
> +    return new Promise((resolve, reject) => {
> -    XPIDatabase.getAddonList(a => true, (addons) => {
> +      XPIDatabase.getAddonList(a => true, (addons) => {

Instead of a Task wrapped in a Promise, could the callback to XPIDatabase.getAddonList be an async function and the yield below an await?
Attachment #8865051 - Flags: review?(rhelmer)
This lgtm aoverall, I'd like to take a little more time to look since I was away on Friday.

Comment 14

16 days ago
mozreview-review
Comment on attachment 8865051 [details]
Bug 1362364: Lazily load the certificate database into the add-ons manager.

https://reviewboard.mozilla.org/r/136714/#review140178
Attachment #8865051 - Flags: review+
Comment hidden (mozreview-request)
(Assignee)

Comment 16

13 days ago
mozreview-review-reply
Comment on attachment 8865051 [details]
Bug 1362364: Lazily load the certificate database into the add-ons manager.

https://reviewboard.mozilla.org/r/136714/#review139822

> Instead of a Task wrapped in a Promise, could the callback to XPIDatabase.getAddonList be an async function and the yield below an await?

We discussed this on IRC and concluded that the change wasn't necessary.

Comment 17

13 days ago
Pushed by dtownsend@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/508966268fab
Lazily load the certificate database into the add-ons manager. r=rhelmer
Backed out for linting failure (dead variable):

https://hg.mozilla.org/integration/autoland/rev/19aa5a4ceb3c739f460179913bd6309773f1f4e7

Failure log: https://treeherder.mozilla.org/logviewer.html#?job_id=98478875&repo=autoland

> /home/worker/checkouts/gecko/toolkit/mozapps/extensions/test/addons/test_cache_certdb/bootstrap.js:74:10 | 'install' is defined but never used. (no-unused-vars)
Flags: needinfo?(dtownsend)
(Reporter)

Comment 19

13 days ago
Here i(In reply to Florian Quèze [:florian] [:flo] from comment #0)
> This is expensive as it does disk IO. See this profile
> https://perfht.ml/2pNFXJM where it blocked the main thread for 157ms on the
> quantum reference hardware.
> 
> Currently the cost of this is paid by XPIProvider.jsm.
> 
> Could this be done very early on a different thread?

Here is a startup profile on the quantum reference hardware where initializing NSS was blocked for 29s on main thread IO: https://perf-html.io/public/42543cb2367170f1cbe8f1f276b74ebef79a15ac/calltree/?range=0.0000_112.1308&thread=0
(In reply to Florian Quèze [:florian] [:flo] from comment #19)
> Here i(In reply to Florian Quèze [:florian] [:flo] from comment #0)
> > This is expensive as it does disk IO. See this profile
> > https://perfht.ml/2pNFXJM where it blocked the main thread for 157ms on the
> > quantum reference hardware.
> > 
> > Currently the cost of this is paid by XPIProvider.jsm.
> > 
> > Could this be done very early on a different thread?
> 
> Here is a startup profile on the quantum reference hardware where
> initializing NSS was blocked for 29s on main thread IO:
> https://perf-html.io/public/42543cb2367170f1cbe8f1f276b74ebef79a15ac/
> calltree/?range=0.0000_112.1308&thread=0

Hm. Blocking tens or hundreds of milliseconds might be fairly easy to avoid by initializing early on a background thread. Avoiding blocking 29s would require a lot of work making sure nothing blocked on NSS being initialized... It would probably be doable for HTTPS requests, which are designed to be async, but we use NSS in a lot of places.
Whiteboard: [photon-performance][qf] → [photon-performance][qf:p1]
Comment hidden (mozreview-request)
Flags: qe-verify? → qe-verify-
Thanks for updating the patch Mossop! Does this need re-review? If so, the flag will need to be re-set on rhelmer (or whoever needs to review this).
(Assignee)

Comment 23

8 days ago
Comment on attachment 8865051 [details]
Bug 1362364: Lazily load the certificate database into the add-ons manager.

*sigh*
Flags: needinfo?(dtownsend)
Attachment #8865051 - Flags: review+ → review?(rhelmer)
Florian, did you manage to get data on how much later NSS is initialized after this patch? The change itself shouldn't gain us much on its own, but if it buys us enough time for async NSS initialization to be worthwhile, I can try working with keeler to make that happen.
Flags: needinfo?(florian)

Comment 25

8 days ago
mozreview-review
Comment on attachment 8865051 [details]
Bug 1362364: Lazily load the certificate database into the add-ons manager.

https://reviewboard.mozilla.org/r/136714/#review143242

::: toolkit/mozapps/extensions/internal/XPIProvider.jsm:3672
(Diff revision 4)
>    /**
>     * Verifies that all installed add-ons are still correctly signed.
>     */
> -  verifySignatures() {
> -    XPIDatabase.getAddonList(a => true, (addons) => {
> -      (async function() {
> +  async verifySignatures() {
> +    try {
> +      let addons = await XPIDatabase.getAddonList(a => true);

nice

::: toolkit/mozapps/extensions/internal/XPIProviderUtils.js:135
(Diff revision 4)
> -    try {
> +      try {
> -      aMethod(aObject, function(aResult) {
> +        aMethod(obj, resolve);
> -        asyncMap_gotValue(aIndex, aResult);
> -      });
> -    } catch (e) {
> +      } catch (e) {
> -      logger.warn("Async map function failed", e);
> +        logger.work("Async map function failed", e);

Should this be `s/work/warn/`?

Also seems appropriate to use `logger.error` here.

::: toolkit/mozapps/extensions/internal/XPIProviderUtils.js:136
(Diff revision 4)
> -      aMethod(aObject, function(aResult) {
> +        aMethod(obj, resolve);
> -        asyncMap_gotValue(aIndex, aResult);
> -      });
> -    } catch (e) {
> +      } catch (e) {
> -      logger.warn("Async map function failed", e);
> -      asyncMap_gotValue(aIndex, undefined);
> +        logger.work("Async map function failed", e);
> +        resolve(undefined);

Hm I don't understand this bit... why not `reject`, is it OK/expected that this fails?

::: toolkit/mozapps/extensions/internal/XPIProviderUtils.js:703
(Diff revision 4)
>    /**
>     * Asynchronously list all addons that match the filter function
>     * @param  aFilter
>     *         Function that takes an addon instance and returns
>     *         true if that addon should be included in the selected array
>     * @param  aCallback

This should be noted in the jsdoc that it's optional and will return a promise if no callback is provided
Attachment #8865051 - Flags: review?(rhelmer)
(Assignee)

Comment 26

7 days ago
mozreview-review-reply
Comment on attachment 8865051 [details]
Bug 1362364: Lazily load the certificate database into the add-ons manager.

https://reviewboard.mozilla.org/r/136714/#review143242

> Hm I don't understand this bit... why not `reject`, is it OK/expected that this fails?

Promise.all rejects as soon as any of the promises rejects so we'd never get any results if any of the callbacks failed. The old code just used a result of undefined for that case so I just mirrored the behaviour. Given that this function is only used once there might be even more cleanup wins to be had but I didn't want to start falling down a rabbit hole here.
Comment hidden (mozreview-request)

Comment 28

7 days ago
mozreview-review
Comment on attachment 8865051 [details]
Bug 1362364: Lazily load the certificate database into the add-ons manager.

https://reviewboard.mozilla.org/r/136714/#review143550
Attachment #8865051 - Flags: review?(rhelmer) → review+

Comment 29

7 days ago
Pushed by dtownsend@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/997a22336e37
Lazily load the certificate database into the add-ons manager. r=rhelmer
https://hg.mozilla.org/mozilla-central/rev/997a22336e37
Status: ASSIGNED → RESOLVED
Last Resolved: 7 days ago
status-firefox55: --- → fixed
Resolution: --- → FIXED
Target Milestone: --- → mozilla55
Iteration: --- → 55.6 - May 29
Depends on: 1365862
(Reporter)

Comment 31

6 days ago
(In reply to Kris Maglione [:kmag] (busy; behind on reviews) from comment #24)
> Florian, did you manage to get data on how much later NSS is initialized
> after this patch? The change itself shouldn't gain us much on its own, but
> if it buys us enough time for async NSS initialization to be worthwhile, I
> can try working with keeler to make that happen.

Here is a startup profile from a nightly with the fix: https://perfht.ml/2q0o2fp

NSS is initialized much later while already processing a xul document (so I assume while building the first browser window). And it's cheap, it doesn't appear to be blocked on any disk I/O. I don't know if this is because another thread touched these files somehow, or if I was just lucky in this specific profile (I'll report back if I see different behaviors in other cold startup profiles in the next few days).

Now EnsureNSSInitializedChromeOrContent() is called from nsUrlClassifierDBService::Init(). This is triggered by the code around http://searchfox.org/mozilla-central/rev/f55349994fdac101d121b11dac769f3f17fbec4b/netwerk/base/nsChannelClassifier.cpp#417
It's surprising though, because this function tries to return early for local files, and I really don't see what non-local file we could have been trying to load while creating a browser.xul window loading about:home. That seems like either we are loading something we shouldn't, or that function needs a tweak.
Flags: needinfo?(florian)
(In reply to Florian Quèze [:florian] [:flo] from comment #31)
> NSS is initialized much later while already processing a xul document (so I
> assume while building the first browser window). And it's cheap, it doesn't
> appear to be blocked on any disk I/O. I don't know if this is because
> another thread touched these files somehow, or if I was just lucky in this
> specific profile (I'll report back if I see different behaviors in other
> cold startup profiles in the next few days).

There's probably a certain amount of luck involved, and I'd also expect the
disk to be less busy that late in startup than it generally is during early
startup.

Still, 19ms is a fairly big chunk of startup time. It would definitely be nice
if we could move that to a background thread. There should be plenty of time
for it to finish before it's necessary now.
(In reply to Florian Quèze [:florian] [:flo] from comment #31)
> NSS is initialized much later while already processing a xul document (so I
> assume while building the first browser window). And it's cheap, it doesn't
> appear to be blocked on any disk I/O. I don't know if this is because
> another thread touched these files somehow, or if I was just lucky in this
> specific profile (I'll report back if I see different behaviors in other
> cold startup profiles in the next few days).

You are being lucky here, I think.  This is still main-thread IO, and it could take an arbitrary amount of time.  I think Kris is right, we should still try to move this to a background thread.

> Now EnsureNSSInitializedChromeOrContent() is called from
> nsUrlClassifierDBService::Init(). This is triggered by the code around
> http://searchfox.org/mozilla-central/rev/
> f55349994fdac101d121b11dac769f3f17fbec4b/netwerk/base/nsChannelClassifier.
> cpp#417
> It's surprising though, because this function tries to return early for
> local files, and I really don't see what non-local file we could have been
> trying to load while creating a browser.xul window loading about:home. That
> seems like either we are loading something we shouldn't, or that function
> needs a tweak.

It looks like about: URIs aren't URI_IS_LOCAL_RESOURCE for example...
(Reporter)

Comment 34

6 days ago
(In reply to Florian Quèze [:florian] [:flo] from comment #31)
> (In reply to Kris Maglione [:kmag] (busy; behind on reviews) from comment
> #24)
> > Florian, did you manage to get data on how much later NSS is initialized
> > after this patch? The change itself shouldn't gain us much on its own, but
> > if it buys us enough time for async NSS initialization to be worthwhile, I
> > can try working with keeler to make that happen.
> 
> Here is a startup profile from a nightly with the fix:
> https://perfht.ml/2q0o2fp
> 
> NSS is initialized much later while already processing a xul document (so I
> assume while building the first browser window). And it's cheap, it doesn't
> appear to be blocked on any disk I/O. I don't know if this is because
> another thread touched these files somehow, or if I was just lucky in this
> specific profile (I'll report back if I see different behaviors in other
> cold startup profiles in the next few days).

It took 152ms in a cold startup profile I just saw, so I indeed just got lucky yesterday.
(Reporter)

Comment 35

6 days ago
(In reply to :Ehsan Akhgari (super long backlog, slow to respond) from comment #33)

> > Now EnsureNSSInitializedChromeOrContent() is called from
> > nsUrlClassifierDBService::Init(). This is triggered by the code around
> > http://searchfox.org/mozilla-central/rev/
> > f55349994fdac101d121b11dac769f3f17fbec4b/netwerk/base/nsChannelClassifier.
> > cpp#417
> > It's surprising though, because this function tries to return early for
> > local files, and I really don't see what non-local file we could have been
> > trying to load while creating a browser.xul window loading about:home. That
> > seems like either we are loading something we shouldn't, or that function
> > needs a tweak.
> 
> It looks like about: URIs aren't URI_IS_LOCAL_RESOURCE for example...

I added some debugging output in that code, and on my macbook the NSS initialization is triggered by an about:blank load.

In total there are 5 loads that could trigger it before we first paint: 4 about:blank loads, and a http://detectportal.firefox.com/success.txt load.

I think we could change the code to return early for about:blank.
For the captive portal load, given that this is an async XHR and we can't expect to receive the result before we paint, I think we could as well start that load after we have painted. Or we could whitelist the domain using the urlclassifier.skipHostnames pref.

So it seems possible to push the nss initialization to after we first paint, and that would hopefully buy us enough time to finish doing it off main thread before we actually need it.
(In reply to Florian Quèze [:florian] [:flo] from comment #35)
> I think we could change the code to return early for about:blank.
> For the captive portal load, given that this is an async XHR and we can't
> expect to receive the result before we paint, I think we could as well start
> that load after we have painted. Or we could whitelist the domain using the
> urlclassifier.skipHostnames pref.
> 
> So it seems possible to push the nss initialization to after we first paint,
> and that would hopefully buy us enough time to finish doing it off main
> thread before we actually need it.

Those seem like good ideas either way, but I think we already have enough time to finish NSS initialization before it's needed in the vast majority of cases, at this point, so I don't think we need to block on them.
(Reporter)

Comment 37

5 days ago
(In reply to Florian Quèze [:florian] [:flo] from comment #35)

> For the captive portal load, given that this is an async XHR and we can't
> expect to receive the result before we paint, I think we could as well start
> that load after we have painted.

I discussed this with Nihanth who worked on captive portal. He said that ideally we would want that request to be started as early as possible so that the result is already available by the time we open the first browser window (before we attempt to display the user's home page). But this request shouldn't cause a main thread init of NSS, so I think we would want a promise that is resolved once necko is done initializing in the background and ready to handle async requests without causing unreasonable main-thread overhead. (This is already the second time we see the captive portal request forcing main thread initialization of necko stuff; bug 1360164 was the first one.)
(Reporter)

Updated

a day ago
Summary: NSS is initialized on the main thread during startup → XPIProvider.jsm should not initialize NSS during early startup
(Reporter)

Comment 38

a day ago
(In reply to Florian Quèze [:florian] [:flo] from comment #35)

> I think we could change the code to return early for about:blank.

Felipe took care of this in bug 1337043.
(Reporter)

Comment 39

11 hours ago
(In reply to Florian Quèze [:florian] [:flo] from comment #38)
> (In reply to Florian Quèze [:florian] [:flo] from comment #35)
> 
> > I think we could change the code to return early for about:blank.
> 
> Felipe took care of this in bug 1337043.

And now, NSS is initialized by the search service using Cc["@mozilla.org/security/hash;1"].createInstance(Ci.nsICryptoHash).
We are taking care of it in bug 1359031.
You need to log in before you can comment on or make changes to this bug.