Closed Bug 1606992 Opened 3 months ago Closed 3 months ago

Having a Master Password makes Firefox password manager (both Lockwise and legacy) slow to load (10s for 300 passwords on an i7), using 100% CPU and causing laptop fans to spin up

Categories

(NSS :: Libraries, defect, P1)

3.48
defect

Tracking

(firefox-esr68 unaffected, firefox72+ wontfix, firefox73+ verified, firefox74+ verified)

VERIFIED FIXED
Tracking Status
firefox-esr68 --- unaffected
firefox72 + wontfix
firefox73 + verified
firefox74 + verified

People

(Reporter: ronan, Assigned: KaiE)

References

(Blocks 1 open bug, Regression)

Details

(Keywords: perf, regression, Whiteboard: [qf:p3:responsiveness] [passwords:master-password])

Attachments

(4 files, 3 obsolete files)

Steps to reproduce

  1. Open about:logins

Actual results

  • Load takes 10s...
  • ... during which the main firefox-bin process keeps 1 CPU core (out of 4) 100% busy, making the laptop fans spin

Expected results

Before Lockwise, opening the password manager was instant and didn't make the fans spin.

Debug information

  • Firefox 73.0a1 - 20200103202240 (official binary build from mozilla.org)
  • Arch Linux 64bit
  • 304 passwords
  • Using a master password
  • Un/checking "Show alerts about passwords for breached websites" doesn't change anything
  • Passwords are not synced with Firefox Sync, they're local only
  • Feel free to ask for more info
Summary: Firefox "Lockwise" password manager is slow to load (10s for 300 passwords) and uses 100% CPU, causing laptop fans to spin → Firefox "Lockwise" password manager is slow to load (10s for 300 passwords) and uses 100% CPU, causing laptop fans to spin up

Sorry to hear that.

  • Can you see how long it takes to load chrome://passwordmgr/content/passwordManager.xhtml in the same Firefox version?
  • When did you enable a master password? We recently increased the iterations of it.
  • Can you share a profile from https://profiler.firefox.com/ of you opening about:logins? I think the parent/main process would be the one that's relevant.

Thank you

Flags: needinfo?(ronan)

(In reply to Matthew N. [:MattN] (PM me if request are blocking you) from comment #1)

Can you see how long it takes to load chrome://passwordmgr/content/passwordManager.xhtml in the same Firefox version?

It takes just as long. Interesting, see second comment.

When did you enable a master password? We recently increased the iterations of it.

Good catch! Yes, the master password is brand new from last week. Last week, reading about yet another breach, I decided it was time to secure a few reused passwords, replacing them with unique ones. So, for convenience while I was editing stuff in Lockwise, and to avoid being prompted constantly for the master password, I disabled the master password and re-enabled it after I was done (using the Firefox version mentioned above, 73.0a1), getting me more iterations. This bug was reported afterwards.

And I just tried disabling the master password; I confirm it makes for an instant load of about:logins.

So yup, I conflated the impact of the new master password with the impact of Lockwise; thanks for catching that 🙂 and feel free to rename the bug. Still, 10s to load my 300 passwords (on a reasonably recent laptop with an Intel i7) seems like a long time to me. Is this roughly aligned with measurements on your own {machine, passCount} samples? If yes, do you yourself find this reasonable? Hope you can find some room for optimization.

Can you share a profile from https://profiler.firefox.com/ of you opening about:logins? I think the parent/main process would be the one that's relevant.

Here you go: https://perfht.ml/37V8Tm3 . I checked GeckoMain only and, at Publish time, unchecked everything in "Include additional data that may be identifiable". Ping me if you need more.

Flags: needinfo?(ronan) → needinfo?(MattN+bmo)

Same behavior here; Lockwise behaves remarkably slower even without a master password. See https://perfht.ml/2N7Ch0o just opening about:logins with ~1,300 items (yes, it's a very old profile and I'm a heavy user).

Summary: Firefox "Lockwise" password manager is slow to load (10s for 300 passwords) and uses 100% CPU, causing laptop fans to spin up → Having a Master Password makes Firefox password manager (both Lockwise and legacy) slow to load (10s for 300 passwords on an i7), using 100% CPU and causing laptop fans to spin up

(In reply to Eckard from comment #3)

Same behavior here; Lockwise behaves remarkably slower even without a master password. See https://perfht.ml/2N7Ch0o just opening about:logins with ~1,300 items (yes, it's a very old profile and I'm a heavy user).

That's a different bug from mine, then. The original bug description is incorrect, see Comment 2 for the debunked problem, and I just edited the title to reflect the case that I can answer about. My problem is that the Master Password causes slowness; I'm personally fine with Lockwise (about:logins) speed.

@Eckard, please create a new bug about Lockwise being slower than chrome://passwordmgr/content/passwordManager.xhtml.

(In reply to Ronan Jouchet from comment #4)

That's a different bug from mine, then. The original bug description is incorrect, see Comment 2 for the debunked problem, and I just edited the title to reflect the case that I can answer about. My problem is that the Master Password causes slowness; I'm personally fine with Lockwise (about:logins) speed.

Confirmed, about:logins takes ~3 seconds without the master password and ~10 seconds with it, just like you reported (https://perfht.ml/2QCaLdv). Will wait for performance boosts from this bug before filing a separate bug.

[Tracking Requested - why for this release]: Major performance regression for about:logins with master password on Linux.

(In reply to Ronan Jouchet from comment #2)

(In reply to Matthew N. [:MattN] (PM me if request are blocking you) from comment #1)

When did you enable a master password? We recently increased the iterations of it.

Good catch! Yes, the master password is brand new from last week. Last week, reading about yet another breach, I decided it was time to secure a few reused passwords, replacing them with unique ones. So, for convenience while I was editing stuff in Lockwise, and to avoid being prompted constantly for the master password, I disabled the master password and re-enabled it after I was done (using the Firefox version mentioned above, 73.0a1), getting me more iterations. This bug was reported afterwards.

The issue is likely caused by bug 1562671 then.

And I just tried disabling the master password; I confirm it makes for an instant load of about:logins.

So yup, I conflated the impact of the new master password with the impact of Lockwise; thanks for catching that 🙂 and feel free to rename the bug. Still, 10s to load my 300 passwords (on a reasonably recent laptop with an Intel i7) seems like a long time to me. Is this roughly aligned with measurements on your own {machine, passCount} samples? If yes, do you yourself find this reasonable? Hope you can find some room for optimization.

No, for me I have 800+ logins and about:logins loads in less than 1s on macOS. I don't think 10s is at all reasonable.

Can you share a profile from https://profiler.firefox.com/ of you opening about:logins? I think the parent/main process would be the one that's relevant.

Here you go: https://perfht.ml/37V8Tm3 . I checked GeckoMain only and, at Publish time, unchecked everything in "Include additional data that may be identifiable". Ping me if you need more.

Thanks. Are you sure this profile was for the 10s case? The timeline shows at most a 2s delay in the parent process.

Blocks: 1562674
Component: about:logins → Security: PSM
Flags: needinfo?(MattN+bmo) → needinfo?(ronan)
Keywords: perf, regression
Product: Firefox → Core
Regressed by: 1562671, 1592007
Whiteboard: [qf] → [qf] [passwords:master-password]

I would expect export of thousands of logins to take this amount of effort, but simply displaying the records shouldn't require decryption of their secrets. It wouldn't surprise me if there's no interface to get the site metadata without actually decrypting the entries in the database, however. Is that the situation?

(In reply to Matthew N. [:MattN] (PM me if request are blocking you) from comment #6)

Here you go: https://perfht.ml/37V8Tm3 . I checked GeckoMain only and, at Publish time, unchecked everything in "Include additional data that may be identifiable". Ping me if you need more.

Thanks. Are you sure this profile was for the 10s case? The timeline shows at most a 2s delay in the parent process.

Nevermind that. I was confused because the piece I was looking for uses a Promise so doesn't show up as a long-running function.

KaiE, can you take a look at this? I'm not seeing this slow down attributed to NSS/softtoken in the profiles but maybe there is a separate process or thread involved? Or maybe the profiler misses it if system NSS is being used? Is there any way to optimize this or will we need to lower the iterations?

Flags: needinfo?(ronan) → needinfo?(kaie)

(In reply to J.C. Jones [:jcj] (he/him) from comment #7)

I would expect export of thousands of logins to take this amount of effort, but simply displaying the records shouldn't require decryption of their secrets.

I'm on a 2013 MBP with 800+ logins and it's less than 1s though; why such a big difference? The iteration change didn't affect the encryption of the individual passwords, only the encryption of the encryption key IIUC, and I wouldn't expect the decryption of the key to scale with the number of decryption calls using the key. Doesn't the key stay in memory as long as the MP stays unlocked?

It wouldn't surprise me if there's no interface to get the site metadata without actually decrypting the entries in the database, however. Is that the situation?

We encrypt both the username and password and the username is needed for display in the login list. It's a known issue that there isn't a way to retrieve the login without decrypting the password but I don't think that would solve this issue as it would only halve the time required.

If you also include the "StreamTrans" thread, we might get more information about what exactly is slow.

Flags: needinfo?(ronan)

Matthew: Simply upgrading Firefox doesn't change the iteration count. Did you set it in the past, prior to the recent code change? Then your database still uses an iteration count of 1. Only if you go through the change/set MP dialog with the new code, will a higher iteration count be used.

Flags: needinfo?(kaie)

We should try to reproduce, without requiring anyone to share their saved passwords.
I guess we already have some automated test for stored passwords. We should tweak it locally to add a loop to create 1000 random entries. Then take that key database and logins file, copy it to a test profile, and use it for interactive testing, or we could share those files here.

(In reply to Kai Engert (:KaiE:) from comment #11)

Matthew: Simply upgrading Firefox doesn't change the iteration count.

I know that which is why I asked the reporter when they enabled MP.

Did you set it in the past, prior to the recent code change? Then your database still uses an iteration count of 1. Only if you go through the change/set MP dialog with the new code, will a higher iteration count be used.

Please re-read comment 2:

(Quoting Ronan Jouchet from comment #2)

(In reply to Matthew N. [:MattN] (PM me if request are blocking you) from comment #1)

When did you enable a master password? We recently increased the iterations of it.

Good catch! Yes, the master password is brand new from last week. Last week, reading about yet another breach, I decided it was time to secure a few reused passwords, replacing them with unique ones. So, for convenience while I was editing stuff in Lockwise, and to avoid being prompted constantly for the master password, I disabled the master password and re-enabled it after I was done (using the Firefox version mentioned above, 73.0a1), getting me more iterations. This bug was reported afterwards.

(In reply to Kai Engert (:KaiE:) from comment #12)

We should try to reproduce, without requiring anyone to share their saved passwords.
I guess we already have some automated test for stored passwords. We should tweak it locally to add a loop to create 1000 random entries. Then take that key database and logins file, copy it to a test profile, and use it for interactive testing, or we could share those files here.

You can easily import 1000 test logins with the following code in the Browser Console:

var logins = [];
for (let i = 0; i < 1000; i++) {
  logins.push({origin: `https://${i}.example.com`, username: `user${i}`, password: `password${i}`});
}
await LoginHelper.maybeImportLogins(logins);

Thanks. I had to execute the following before it worked:

const { LoginHelper } = ChromeUtils.import(
  "resource://gre/modules/LoginHelper.jsm"
);

(In reply to Matthew N. [:MattN] (PM me if request are blocking you) from comment #9)

(In reply to J.C. Jones [:jcj] (he/him) from comment #7)

I would expect export of thousands of logins to take this amount of effort, but simply displaying the records shouldn't require decryption of their secrets.

I'm on a 2013 MBP with 800+ logins and it's less than 1s though; why such a big difference?

Oops, I thought I changed my MP on the first Nightly after this landed but I guess I was too early as I can now reproduce the issue of multiple seconds after changing it again… sorry about that.

(In reply to Kai Engert (:KaiE:) from comment #11)

Matthew: Simply upgrading Firefox doesn't change the iteration count.

Sorry about that, I didn't realize you were telling me that I may still be using the old iteration count, I thought you were suggesting that the reporter was…

No worries.

I've added some poor man's logging with fprintf to stderr: it logs the number of times we go through SecretDecoderRing::DecryptString and the total amount of time we need for the contained call to Decrypt().

With 100 passwords, if I open firefox prefs / saved logins, I see we call DecryptString 200 times.
Without MP, that takes a total of just 34 ms.

Set MP, restarted. I need to remove the time for first decryption, because that includes the duration for entering the MP in UI. In total, 2700ms.
So, with MP set, each decrypt is slower by a factor of 80. The time for each decrypt seems roughly constant.

I'll try to find out why.

Note that during an earlier test, I decided I wanted more passwords. I selected all 1000 passwords in the UI and hit "remove". That wasn't a good idea... After 650000 calls to DecryptString it was still busy... The UI is probably implemented in a way that completely rebuilds the whole list after each single removal, requiring all decryptions to be done again every time...

Reducing the master password iteration count from the current 10k to 1k reduced the time for 200 decryptions from 2700ms to roughly 300ms.

This suggests that the MP iteration count directly influences the time we need for decryption. It sounds like NSS doesn't keep the unwrapped symmetric key in memory. Rather, it probably keeps it wrapped in memory, and also remembers the master password, and needs to unwrap it every time for decryption (my guess). Let's ask Bob.
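The proportionality described here is exactly what PBKDF2's design predicts. As a quick stand-in illustration (Python's hashlib, not NSS itself), the cost of a single key derivation grows linearly with the iteration count, so re-deriving the key for every DecryptString call multiplies that cost by the number of stored logins:

```python
# Illustration only (Python's hashlib, not NSS): PBKDF2 cost grows linearly
# with the iteration count, so re-deriving the key on every decrypt call
# multiplies that cost by the number of stored logins.
import hashlib
import time

def time_derivation(iterations):
    """Time one PBKDF2-HMAC-SHA256 derivation at the given iteration count."""
    start = time.perf_counter()
    hashlib.pbkdf2_hmac("sha256", b"master-password", b"salt1234", iterations)
    return time.perf_counter() - start

t_1k = time_derivation(1_000)
t_10k = time_derivation(10_000)
print(f"1k iterations:  {t_1k * 1000:.2f} ms")
print(f"10k iterations: {t_10k * 1000:.2f} ms")
# Roughly a 10x ratio; with 10k iterations, 200 decrypt calls pay this
# derivation cost 200 times, consistent with the seconds-long delays above.
```

This mirrors the measurement above: dropping from 10k to 1k iterations cut 200 decryptions from 2700ms to roughly 300ms, i.e. close to the 10x that linear scaling predicts.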

Bob, we see that the increased iteration count has a direct influence on the time it takes to perform secret decoder ring decryption. There seems to be a proportional relationship between the iteration count and the time required to decrypt.

Does NSS remember the wrapped symmetric key in memory, and does NSS need to unwrap it each time we decrypt? If yes, is that by design, or could that get optimized?

Flags: needinfo?(rrelyea)

(In reply to Dana Keeler (she/her) (use needinfo) (:keeler for reviews) from comment #10)

If you also include the "StreamTrans" thread, we might get more information about what exactly is slow.

Reading the recent discussions, I understand you reproduced & understand the issue, and you no longer need this new profile. Clearing my needinfo, go and re-add it if I missed something and you'd still like a new one.

Flags: needinfo?(ronan)

Here is a profile on macOS including StreamTrans with 1000 logins. Most of the time is spent in nsspkcs5_ComputeKeyAndIV: https://perfht.ml/2t3dYd2

Note that we are using AsyncDecryptStrings to do batch decryption so at least for that case I would think the key could be kept in memory but it looks like that just ends up calling sdrService->DecryptString for each string on a background thread.

For 73 and possibly a 72 dot release, can we easily set the iteration count back to 1 (or whatever it was before)?

(In reply to Julien Cristau [:jcristau] from comment #22)

For 73 and possibly a 72 dot release, can we easily set the iteration count back to 1 (or whatever it was before)?

That alone wouldn't help users who already changed their password. They'd also have to change their master password again, after having updated. Also, that change is inside NSS, so we'd have to release an upgraded NSS version, too, to allow users of Linux distributions to get the fix.

(In reply to Kai Engert (:KaiE:) from comment #23)

(In reply to Julien Cristau [:jcristau] from comment #22)

For 73 and possibly a 72 dot release, can we easily set the iteration count back to 1 (or whatever it was before)?

That alone wouldn't help users who already changed their password. They'd also have to change their master password again, after having updated.

Right, I figured that. But it'd help anyone setting or changing their master password after that point.

Also, that change is inside NSS, so we'd have to release an upgraded NSS version, too, to allow users of Linux distributions to get the fix.

I wasn't sure if there was an API for the NSS consumer to control this. Seems it's just an env var. Unfortunate.

Firefox could easily set the MAX environment variable, in security/manager, prior to calling NSS init; then it wouldn't have to change NSS. (And it could omit changing the environment variable if the variable has already been set outside of Firefox.)

(In reply to Matthew N. [:MattN] (PM me if request are blocking you) from comment #21)

Here is a profile on macOS including StreamTrans with 1000 logins. Most of the time is spent in nsspkcs5_ComputeKeyAndIV: https://perfht.ml/2t3dYd2

Thank you, that's very helpful.

Note that we are using AsyncDecryptStrings to do batch decryption so at least for that case I would think the key could be kept in memory but it looks like that just ends up calling sdrService->DecryptString for each string on a background thread.

Yeah, it's NSS which doesn't optimize. I'm currently investigating if I can identify a quick optimization.

Indeed, every time NSS decrypts a stored string, it will run through the number of iterations of the master password to create some hash value.

I see the following potential solutions to this bug.

(a) Enhance NSS to cache the most recently calculated password hash for PBKDF2. Reuse the pre-calculated hash if all parameters of the new operation match those of the cached object.

I'm attaching a patch that illustrates it. In my testing, it speeds things up significantly: loading 100 passwords takes 30 milliseconds. Right now this is a trivial approach that only adds a cache for the one mechanism we use for the master password.

The advantage of this approach is that it doesn't require any new API. If other NSS developers accept this patch, it might be sufficient to simply clean this up and add a local mutex (short lived, only to protect that cache data).

(b) Add a new API PK11SDR_DecryptMany which takes a list of items to decrypt.
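The caching idea in (a) can be sketched as follows. This is a hypothetical Python stand-in for illustration, not the actual C patch to lowpbe.c: keep the most recently derived key together with every derivation parameter, and reuse it only when all parameters match, under a lock for thread safety.

```python
# Hypothetical sketch of approach (a): cache the last PBKDF2 result and reuse
# it when every derivation parameter matches (the NSS patch does this in C for
# the NSSPKCS5_PBKDF2 case). Names and structure here are illustrative only.
import hashlib
import threading

_cache_lock = threading.Lock()
_cache_params = None   # (hash_name, iterations, key_len, salt, password)
_cache_value = None    # the derived key bytes

def derive_key_cached(hash_name, password, salt, iterations, key_len):
    global _cache_params, _cache_value
    params = (hash_name, iterations, key_len, salt, password)
    with _cache_lock:
        if _cache_params == params:
            return _cache_value  # cache hit: skip the expensive derivation
    derived = hashlib.pbkdf2_hmac(hash_name, password, salt, iterations,
                                  dklen=key_len)
    with _cache_lock:
        _cache_params, _cache_value = params, derived
    return derived
```

The first call pays the full iteration cost; subsequent calls with identical parameters, which is the common case when decrypting many logins under one master password, return immediately from the cache.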

Attached patch 1606992-wip1.patch (obsolete) — Splinter Review
Assignee: nobody → kaie
Status: UNCONFIRMED → NEW
Ever confirmed: true
Attached patch 1606992-hotfix-v1.patch (obsolete) — Splinter Review

This is a potential hotfix. Removed the logging, added a lock for thread safety.

If other NSS developers call it ugly, hopefully it can at least be a base for discussion. Is it good enough to ship as a temporary solution on the stable branch?

Attachment #9120083 - Attachment is obsolete: true
Comment on attachment 9120086 [details] [diff] [review]
1606992-hotfix-v1.patch

Review of attachment 9120086 [details] [diff] [review]:
-----------------------------------------------------------------

I think that optimally NSS would provide a handle that could be re-used for subsequent decrypts, but that's too significant to consider at this time. I like the idea of adding this cache better than adding a new function in a hotfix, but in either case we should open a follow-on for whatever mechanism we deem best, whether a handle mechanism or a vector of many items to decrypt.

::: security/nss/lib/softoken/lowpbe.c
@@ +556,5 @@
>  
>      return A;
>  }
>  
> +static PZLock *PBE_cache_lock = NULL;

Let's add a comment here pointing to this bug, something like
/* Bug 1606992 - Cache the hash result for the common case that we're asked to compute the key for the same item and IV.  */

@@ +619,5 @@
> +            if (cached_PBKDF2_item) {
> +                if (pbe_param->hashType == cached_hashType &&
> +                    pbe_param->iter == cached_iterations &&
> +                    pbe_param->keyLen == cached_keyLen &&
> +                    SECITEM_ItemsAreEqual(&pbe_param->salt, cached_salt) &&

we need to check cached_salt and cached_pwitem to be not-null before calling ItemsAreEqual, as that function is not null-safe.
Status: NEW → ASSIGNED
Component: Security: PSM → Libraries
OS: Unspecified → All
Priority: -- → P1
Product: Core → NSS
QA Contact: jjones
Hardware: Unspecified → All
Target Milestone: --- → 3.48
Version: 73 Branch → 3.48

(In reply to J.C. Jones [:jcj] (he/him) from comment #30)

I think that optimally NSS would provide a handle that could be re-used for subsequent decrypts, but that's too significant to consider at this time. I like the idea of adding this cache better than adding a new function in a hotfix, but in either case we should open a follow-on for whatever mechanism we deem best, whether a handle mechanism or a vector of many items to decrypt.

Agreed to have a follow up to improve. Would like to hear more thoughts on which approach is best.

Let's add a comment here pointing to this bug, something like
/* Bug 1606992 - Cache the hash result for the common case that we're asked to compute the key for the same item and IV. */

Will add a tweaked comment. Note the current code adds a cache for one PBE type, only. And the cached item is independent of IV for PBKDF2, so it works with different IVs, IIUC.

> +            if (cached_PBKDF2_item) {
> +                if (pbe_param->hashType == cached_hashType &&
> +                    pbe_param->iter == cached_iterations &&
> +                    pbe_param->keyLen == cached_keyLen &&
> +                    SECITEM_ItemsAreEqual(&pbe_param->salt, cached_salt) &&

we need to check cached_salt and cached_pwitem to be not-null before calling
ItemsAreEqual, as that function is not null-safe.

thanks, will adjust.

Comment on attachment 9120194 [details] [diff] [review]
1606992-hotfix-v2.patch

Review of attachment 9120194 [details] [diff] [review]:
-----------------------------------------------------------------

This looks fine to me, r+, but I would like to get Bob or Martin's take before landing; I don't want to live with this longer than we have to, so a triple-check that we aren't painting ourselves into some circle I don't see ... yeah.
Attachment #9120194 - Flags: review?(jjones) → review+

At the time we call sftk_PBELockShutdown() we should probably destroy the cache entry.

Attachment #9120194 - Attachment is obsolete: true
Attachment #9120194 - Flags: feedback?(rrelyea)
Attachment #9120204 - Flags: review?(jjones)
Attachment #9120204 - Flags: feedback?(rrelyea)
Comment on attachment 9120204 [details] [diff] [review]
1606992-hotfix-v3.patch

Review of attachment 9120204 [details] [diff] [review]:
-----------------------------------------------------------------

Yeah, bad me.
Attachment #9120204 - Flags: review?(jjones) → review+
Attachment #9120204 - Flags: review?(mt)

Ah... So we cache a PBE, but each record is encrypted with its own key, generated by the cached PBE that we saved. I think Kai's approach is probably the best: it retains each entry having its own unique strong key, while only generating that key once. When there isn't a master password, each entry uses a smaller iteration count, because without a master password the entry is effectively obscured, not encrypted.

This issue would have been worse with the AES CBC addition, because entries are not only encrypted but also signed (with yet another key). Kai's patch will cache both keys since it caches at the PBE level.

bob

Flags: needinfo?(rrelyea)

Would this also optimize for legacy (dbm) usage or does that need something else? I poked at this optimization but for my use there doesn't seem to be a difference in speed.

Comment on attachment 9120204 [details] [diff] [review]
1606992-hotfix-v3.patch

With Bob's feedback I think we can go ahead, commit and test this fix. Martin's feedback would still be welcome!
I'll start a firefox try build.
Attachment #9120204 - Flags: review?(mt)
Attachment #9120204 - Flags: feedback?(rrelyea)
Attachment #9120204 - Flags: feedback?(mt)
Blocks: 1608630
Comment on attachment 9120204 [details] [diff] [review]
1606992-hotfix-v3.patch

Review of attachment 9120204 [details] [diff] [review]:
-----------------------------------------------------------------

This looks a little hacky, but it should work.

Is phabricator down?

::: lib/softoken/lowpbe.c
@@ +565,5 @@
> +static HASH_HashType cached_hashType;
> +static int cached_iterations;
> +static int cached_keyLen;
> +static SECItem *cached_salt = NULL;
> +static SECItem *cached_pwitem = NULL;

This should be rolled up into a(n anonymous) struct.

@@ +570,5 @@
> +
> +void
> +sftk_PBELockInit(void)
> +{
> +    if (!PBE_cache_lock) {

Can't you just assert that the lock is null?

@@ +645,5 @@
>              }
>  
>              break;
>          case NSSPKCS5_PBKDF2:
> +            PZ_Lock(PBE_cache_lock);

Move the getter and setter parts into functions.
Attachment #9120204 - Flags: feedback?(mt)

FF try build for the committed patch LGTM.

I've attached an incremental patch based on Martin's review, but that's just code cleanup.

IMHO the branch uplifts could be done with or without that cleanup. If you're in a hurry and want to get this out ASAP, you don't need to wait for the cleanup code to land on trunk.

Clarification: Uplift to Firefox branches requires NSS releases. J.C. had filed bugs for creating those releases and assigned them to himself, so I assume he'll drive that. (Let me know if you need me to help with that.)

Note: I do want to take Kai's cleanup patch into default, but I'm doing the "non-cleaned-up" patch into the release branches.

Re: Mark's question about dbm. There shouldn't be a dbm issue, since we still use an iteration count of 1 for dbm (dbm currently can't store the iteration count). If there is some other slowdown (like the dbm reference itself being slow), then this won't have any effect, because we are caching the result of the PBE hash generation, not the entry or the decrypted data.

Robert:
I'm using the method implemented in 3.48 to store iteration counts for DBM, and I'm using a higher-than-1 iteration count via the environment variables, set at run-time prior to NSS init. Trial and error found the default of 10,000 too high, so there is some inherent slowness accessing dbm compared to sql, and the slowness opening the password manager is directly related to the number of rounds used. However, the caching here doesn't seem to make any difference for opening the password manager: there's a considerable delay with a few hundred stored passwords that isn't influenced at all by this caching. That's why I asked, as theoretically the effect should be just as noticeable for dbm if there's no difference in how the key derivation is used.

Mark, in that case, yes, this code would speed up those accesses.

BTW, in case it wasn't clear in my comment 37, this cache is actually a reasonable way to solve this (not really a hack). It also provides benefits beyond the SDR issue, and would make it less painful to handle even larger iteration count values in the future.

Thanks. I've done some further testing and came up with peculiar results that possibly point to a different problem with storing rounds in the legacy db being at the root of my slowness here.

I tested this on NSS with these patches in.
Setting it to 500 rounds, opening the legacy password manager (which is the one present in our browser) with about 800 logins stored takes ~6 seconds on my hardware. Peculiarly, an unpatched NSS gives about the same access time, which is what confused me. Of note, it's likely a bit faster with the patch, since my temp build was 32-bit and the release version I compared against is 64-bit.

Now, changing the rounds to 5000, the time increases to around 21 seconds without (!) re-keying the database. Since the rounds are stored with the records, shouldn't that remain the same? I think it might be using the rounds for a bogus calculation somewhere for each record, because I don't really see how else this would suddenly become significantly slower.
Then, re-keying the database to the new value by changing the MP, the access becomes predictably slower, increasing to almost a minute. If memory serves, unpatched NSS would actually choke harder on higher rounds like this to the point of breaking.

So, I think this is what's going on here:

  1. The low level caching does seem to make a difference because I don't think 5000 rounds would have been usable before on my size db
  2. dbm rounds storage, rekeying and using of the rounds seems to work as-intended, BUT
  3. For legacy db, there seems to be a superfluous hashing operation that uses the default rounds set for each access of a login (regardless of what is stored in the db for rounds); that's likely a bug/oversight in the dbm routines using more than one round, somewhere, that causes a big perf issue beyond what would potentially be influenced by db type.
Whiteboard: [qf] [passwords:master-password] → [qf:p3:responsiveness] [passwords:master-password]
Attachment #9120411 - Attachment description: Bug 1606992 - Follow-up to cleanup PBE cache code. r?mt → Bug 1606992 - Follow-up to cleanup PBE cache code. r?kjacobs

This fix is available in Nightly builds now and will be in 73.0b5 Beta builds shipping tomorrow as well. We should get some QA smoketesting around this to a) verify improved behavior and b) look for any regressions around saved passwords/master password/etc so we can more confidently take the fix in the upcoming 72.0.2 release as well.

Flags: qe-verify+
Flags: needinfo?(andrei.vaida)

Ronan, how is today's (2020-01-14 or newer) Nightly build working for you? It should have the NSS fix in place.

Flags: needinfo?(ronan)

FWIW the issue is fixed for me in today's Nightly.

(In reply to Ryan VanderMeulen [:RyanVM] from comment #56)

Ronan, how is today's (2020-01-14 or newer) Nightly build working for you? It should have the NSS fix in place.

Fix confirmed for me too in 74.0a1-20200114214307! My about:logins with 300 passwords opens instantly (not measured, but I'd say <500ms), as opposed to 10s. Awesome work everybody, thanks for the fast fix 🙂.

Flags: needinfo?(ronan)

In regards to comment 55:

Can confirm with 73.0b4 that setting the master password does result in really slow display times for the saved logins (~5 seconds for 20 logins on an AMD PC).
Leaving a summary below with the results of the testing session conducted in this area, noting that the times listed are with the master password enabled and set up:

Windows 10 :

  • PC - AMD processor ATI Radeon GPU
  • bad - 20 logins: 73.0b4 (5 sec) | good - 73.0b5 (instant)/74.0a1 (2020-01-14)

Ubuntu 16 :

  • Laptop - Intel (integrated GPU)
  • bad - 130 logins: 73.0b4/73.0b1/72.0.1 (8 sec) | good - 73.0b5/74.0a1 (2020-01-14)

macOS 10.13:

  • bad - 72.0.1 (7 sec) | good with 73.0b5.

No other obvious issues were encountered.

Flags: needinfo?(andrei.vaida)
Status: ASSIGNED → RESOLVED
Closed: 3 months ago
Resolution: --- → FIXED

The cleanup patch introduced a crash during FIPS tests.
When executing the modutil tool, functions nsc_CommonInitialize and sftk_PBELockInit are reached twice, and the stricter check PORT_Assert(!PBECache.lock); fails.

I suggest changing it back to the lenient check we currently use on the stable branch.

Bob, do you consider it possible that the first time NSS calls nsc_CommonInitialize it might involve a race? (If yes, we'll have to use something like PR_CallOnce for the lock init. Alternatively, are you able to suggest a better place for performing a one time init for the cache lock?)

Flags: needinfo?(rrelyea)
Status: RESOLVED → REOPENED
Resolution: FIXED → ---

I did some additional testing.

In the modutil FIPS scenario, nsc_CommonInitialize is called again because of the changed FIPS state. Despite nsc_CommonInitialize being called twice, we reach the shutdown case only once. For that scenario, with the lenient init check (no assert), we should be good. (Assuming we get confirmation from Bob that we don't have to worry about a startup race for the first token init.)

softoken_gtest is a test that inits and shuts down softoken repeatedly. Here we reach the lock init and shutdown functions in pairs, so with the attached patch, we should still be good.

I wonder if NSS might execute a sequence of calls like init, init, shutdown, shutdown. If not, we're good. If yes, but sequentially, we're still good. We'd only have to worry if shutdown can be called twice in parallel on different threads. If we want to at least detect this scenario, we could add a PORT_Assert(PBECache.lock); call to the sftk_PBELockShutdown function. In a try build, that wasn't hit:
https://treeherder.mozilla.org/#/jobs?repo=nss-try&revision=b45840fb2e2a0d18f6b8d595bebbcc414083159b

Status: REOPENED → RESOLVED
Closed: 3 months ago
Resolution: --- → FIXED

re: Mark and dbm.

Ah, the issue is that dbm does multiple encrypts/decrypts. That's because softoken expects private attributes to be encrypted individually, but dbm stores the keys as fully encrypted pkcs8 blobs. So when you access a private key component, you need to decrypt the underlying database record, then encrypt the component to pass to softoken, which then decrypts the component again. When you set your iteration count high, each of those encrypt/decrypt operations regenerates the key. The cache code means that now happens only once for a key with a common salt and iteration count.

Kai, there isn't a race for Common initialize, but you are correct, it is called more than once when flipping between non-fips and fips mode. In that case you could have both fips and non-fips slots open at the same time.

Flags: needinfo?(rrelyea)

Thanks for the explanation. Please bear with me:

  1. So IIUC, because of this extra encryption/decryption step, dbm is always going to be slower, but isn't as vulnerable to brute-force guessing as the SQLite db back-end? Or would MP guessing bypass these extra steps, since nothing is actually retrieved from the db?
    In the latter case, it's likely even more important for dbm to have efficient caching, so rounds can be set sufficiently high to give proper protection.
  2. Also, if keys are constantly regenerated, doesn't that effectively kill the makeshift cache that can only store one key at a time?

Is there a solution to this bottleneck? Can those extra steps be cached in the same/a similar way or not?

Reopening. I'm afraid this isn't sufficiently fixed yet. Testing with FF 72 and NSS 3.48.x (in bug 1608480) showed that it's still slow with FF 72.0.2.

Worse, it's still slow when migrating from 72 to 73 or 74 without updating the MP.

Here's an initial theory:

  • Bob had recently changed SDR to use AES, which apparently uses KDF2.
  • Before that change, MP for SDR used KDF1
  • the fix only added caching for KDF2, not KDF1.

If the above is correct, we'd have the following scenarios:

  • people who last changed their MP with FF 71 or earlier will have iteration count "1", so no performance issue
  • people who change their MP with FF 72 still go through uncached KDF1, which will continue to be slow in later versions, until they change their MP with FF 73+ or later.

I'm afraid this means we'll have to add caching for KDF1, too.
(Previously, I had incorrectly assumed it's unnecessary.)

Status: RESOLVED → REOPENED
Resolution: FIXED → ---

I see Ronan had reported the issue with FF 73, which already had Bob's change, and therefore already went through KDF2.

Did we get reports from users of FF 72 about slowness?

Patch for KDF1 caching submitted for review.

NSS try build with that code:
https://treeherder.mozilla.org/#/jobs?repo=nss-try&revision=5e63fa59eb2ca5e563c92d0006648efd1786db63

Linux FF 72 try build with that code (including the cleanup that landed on trunk):
https://treeherder.mozilla.org/#/jobs?repo=try&revision=547a91999ea774acb453ac805690ff8b58bba174
(Windows and Mac: https://treeherder.mozilla.org/#/jobs?repo=try&revision=71ab924c4eb1f51a13629847c7b6cad771b16ff8 )

Using that build, I was able to see the speedup.

Because FF 73 is still slow, if MP was set with FF 72, I've started a try build for FF 73, too:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=f3c1e33cb340822e65f538efc223615d1dddefe7

FF 73 try build is fast, too.

Status: REOPENED → RESOLVED
Closed: 3 months ago
Resolution: --- → FIXED

Target milestone is currently set to 3.48, resetting to 3.50, because the issue wasn't really fixed.

Version: 3.48 → 3.50

In terms of manual verification, are any additional checks required here?

Flags: needinfo?(kaie)

(In reply to Cristian Fogel, QA [:cfogel] from comment #77)

In terms of manual verification, are any additional checks required here?

The important detail is the version of Firefox that was used to set/change the master password (MP).

If MP was set/changed with FF 71 or older, it's always fast.

If the MP was set/changed with 73 or 74, then the fix contained in 73b8 and later already makes it fast.
(Also, if you'd use a profile prepared with FF 73, then run 72.0.2 with --allow-downgrade, it's fast, too.)

The new, additional issue can be seen, if the MP was set with FF 72.
To test if the additional fix works, do the following:

Preparation:

  • Start with Firefox 72 (any 72.x), new profile, set a master password (for testing, one character is fine)
  • F12 on keyboard, click three dots, settings, bottom right, Enable browser chrome and add-on debugging
  • ctrl-shift-j, execute command from comment 15, then command from comment 14
  • (optional: make a copy of the following files so you can more easily reproduce later without repeating the above procedure: key4.db, cert9.db, pkcs11.txt, logins.json - at a later time, instead of repeating these preparation steps, you can create a fresh profile and copy those 4 files into the newly created profile folder)

Testing

  • start Firefox (72 or 73 or 74) with the above profile
  • open about:logins
  • count the seconds until the left hand side shows a list of 1000 entries
  • if it takes longer than 2 seconds on a fast machine, bug isn't fixed
  • if it's faster, it's fixed
  • for sanity, check that the decryption works correctly. Skim the list, all entries should look like userX where X is a number between 0 and 999. Pick any entry from the list, and click the icon to show the password. The password should be like passwordX, with the matching number.

As of now, with bug 1606927, only a fix for 74 nightly has landed.

Flags: needinfo?(kaie)

As a simplification, after having prepared with FF 72, it's fine to reuse the same profile with --allow-downgrade when switching between versions 72, 73, 74. (Don't change the MP with 73/74, only with 72.)

I suggest that we revert the tracking flags back to "affected" for 72/73.
74 should be set to "fixed".

NSS 3.49.2 has landed for 73.0b9, which should be available for testing tomorrow.

We initially called this out as fixed in the 72.0.2 release notes, I have now removed that item.

@Kai, thanks a lot for the detailed info provided in comment 78.

  • Managed to do another check for the new scenario, but I didn't see a major improvement.
    Checking against other versions (71), there wasn't a real difference between the load times for affected (72) or fixed (73, 74) builds with 1000 login entries saved.
    Entries created in 71 had roughly the same load times as ones created in 72 when loading them in 73/74.
    On my workstation it went up to 6 seconds, but that would be more or less expected.

  • The average load time was around 3 seconds on ok-ish machines.

  • As for the sanity check, the data appeared to be in order when checking random entries. The snippet I had working for beta/RC was:

const { LoginHelper } = ChromeUtils.import(
  "resource://gre/modules/LoginHelper.jsm"
);
var logins = [];
for (let i = 0; i < 1000; i++) {
  logins.push({origin: `https://${i}.example.com`, username: `user${i}`, password: `password${i}`});
}
await LoginHelper.maybeImportLogins(logins);

The testing session was conducted on 73.0b10 and 74.0a1 (2020-01-26).

Cristian, it doesn't matter which version was used to create the 1000 entries.

What matters is the version that is used to set/change the MP.

If you set the MP with FF 72, and then use unfixed 73, then you should see slowness.
If you set the MP with FF 72, and then use the fixed 73.0b9, it should be instant.

Got it. Did a quick check for 73.0b9 as well, but it is not instant for the 1000 entries.
Just did a crosscheck with the master password removed, and the times are the same (no additional slowness).

It still takes 2-3 seconds on the OK machine(s); the time would presumably decrease on better setups.
While it's still not instant, we can mark this bug as verified if this is acceptable for this scenario.

Following Kai's suggestion: checking with 73.0b8, there was a ~5 second increase in load time compared to the versions containing the fix, 73.0b10 and 74.0a1 (2020-01-26).
Updating the flags to verified.

Status: RESOLVED → VERIFIED
Flags: qe-verify+

[deleted]

The code examples given in the comments don't display correctly.
Bugzilla's display formatting replaces the "smaller than" character with the HTML escape sequence, even when the code is wrapped in triple backticks.

I did some additional verification on a slower machine, a netbook from 2010, Intel Atom N270 processor, 3 GB memory, Windows 7, 32bit OS.

Test profile with 200 logins.

With Firefox 72, without MP, it takes 4-7 seconds to load about:logins.

Then I set the MP on Firefox 72. Loading then takes almost 29 seconds (after entering the MP).
Same slowness with 73.0b8.

With 73.0b10, after entering MP, it's down to 8 seconds.

(In reply to Kai Engert (:KaiE:) from comment #76)

Target milestone is currently set to 3.48, resetting to 3.50, because the issue wasn't really fixed.

I think you got the fields mixed up. You set the Version field not the Milestone.

(In reply to Mathew Hodson from comment #91)

You set the Version field not the Milestone.

Thanks. Corrected.

Target Milestone: 3.48 → 3.50
Version: 3.50 → 3.48