Closed Bug 1742578 Opened 3 years ago Closed 3 years ago

Loading a non-encrypted, text-only message with autocrypt header takes several seconds, which also blocks message list navigation

Categories

(MailNews Core :: Security: OpenPGP, defect, P2)

Thunderbird 91

Tracking

(thunderbird_esr91+ fixed)

RESOLVED FIXED
96 Branch
Tracking Status
thunderbird_esr91 + fixed

People

(Reporter: thomas8, Assigned: KaiE, NeedInfo)

References

Details

(Keywords: perf, testcase, Whiteboard: [enterprise-relevance])

Attachments

(7 files)

Originally reported on the tb-enterprise mailing list (2021-11-17).

Seen on TB 91.3.1 (64-bit), also TB 78.14.0 (32-bit), Win10. Something wrong with how we handle an autocrypt header in a simple, non-encrypted message, which takes way too long. Have some more such messages, and TB becomes virtually unusable. Should fix this asap.

STR

  1. Thunderbird gmail IMAP account with encryption disabled.
  2. In 3-pane message list, select a non-encrypted text-only message which has an autocrypt header in the source.
  3. On an average machine, stop the time it takes to load that msg.
  4. Use repeated arrow-up/down keyboard navigation with a minimal delay on the message, and try to move on to other messages with some more keypresses while observing selection.

Actual

  • Loading simple, non-encrypted msg with autocrypt header takes a whopping 4 seconds before showing anything on TB 91.3.1 (64-bit) (small profile), and 5-6 secs on TB 78.14.0 (32-bit) (large profile). That's with 16 GB of RAM and Intel i3 processor.
  • If message list keyboard navigation happens to trigger loading the message, further navigation is blocked (worse: buffered) for 4 seconds until the message is fully loaded, selection stuck. After that, selection will jump as it clears off the keyboard buffer. Odd.
  • Removing the autocrypt header from msg source eliminates the delay.

Expected

  • With non-encrypted message and with encryption disabled in TB, do we still need to parse the autocrypt header? Probably not? The only reason to parse it might be to show its existence, for which just checking for the existence of autocrypt header should be enough, which can't possibly take 4 seconds.
  • For scenarios where parsing the autocrypt header might be needed: Parsing speed should be improved, and explore if such parsing can be done after showing the message content (for non-encrypted messages).
Summary: Loading a text-only message with autocrypt header takes several seconds, which also blocks message list navigation → Loading a non-encrypted, text-only message with autocrypt header takes several seconds, which also blocks message list navigation
Whiteboard: [enterprise-relevance]

Could you please specify which algorithm and size is used for the key and subkey, stored in the header? This may be related to attempt to load/import key, which includes key material validation by RNP.

The performance penalty and resulting UX impact here is quite high - as reported on the thread, for users receiving many messages with autocrypt header, this may slow down message list handling to an extent of being unfeasible, i.e. Thunderbird virtually unusable.

Severity: -- → S2
Priority: -- → P2

Testcase 1: Email message with autocrypt header

Testcase 2 (eml): Message from Testcase 1 with autocrypt header removed

Attachment #9252099 - Attachment description: Testcase 1: Email message with autocrypt header → Testcase 1 (eml): Email message with autocrypt header

(In reply to Nickolay Olshevsky from comment #1)

Could you please specify which algorithm and size is used for the key and subkey, stored in the header? This may be related to attempt to load/import key, which includes key material validation by RNP.

Well, the point is, with encryption disabled in a given Thunderbird installation (and the message not encrypted), what is the point of key material validation by RNP? And even if there was a point, that validation must not block the entire UI...

I've included Ronny's message to the list as a testcase, with and without the autocrypt header.
With regards to your question, maybe Ronny knows more?

Flags: needinfo?(ronny.adsetts)
Keywords: testcase

Thanks for the test case, looks like my assumption was correct - via cmdline it takes plenty of time, and most of it is spent in 4096-bit ElGamal subkey validation. On RNP side we validate keys only in key import call, and it seems TB uses this call to load a key.
While I cannot speak for TB, on RNP side we had some thoughts on changing this behaviour by moving validation to first key's usage, and/or adding flag to disable key validation on import.

Kai, is my assumption correct and TB attempts to import key once autocrypt header is spotted?

Flags: needinfo?(kaie)

(In reply to Thomas D. (:thomas8) from comment #5)

(In reply to Nickolay Olshevsky from comment #1)

Could you please specify which algorithm and size is used for the key and subkey, stored in the header? This may be related to attempt to load/import key, which includes key material validation by RNP.

Well, the point is, with encryption disabled in a given Thunderbird installation (and the message not encrypted), what is the point of key material validation by RNP? And even if there was a point, that validation must not block the entire UI...

I've included Ronny's message to the list as a testcase, with and without the autocrypt header.
With regards to your question, maybe Ronny knows more?

My PGP is an ancient 1024 bit DSA signing key. Key ID is FE3E2E311500949F. I've not tested with other signing keys as this is the one used for signing emails I've had complaints about from my clients/colleagues.

Thanks for the input, Ronny. The issue not in a signing key part, but in encryption subkey - 4096-bit ElGamal check is quite time-expensive at the moment.

(In reply to Nickolay Olshevsky from comment #6)

Kai, is my assumption correct and TB attempts to import key once autocrypt header is spotted?

Yes. TB imports the key into a temporary space, in order to get meta information about the key.

Flags: needinfo?(kaie)

Regardless of any other optimizations we could do, the easiest might be to introduce a cache, that remembers the (e.g.) 10 last keys we have seen in an email, and its related meta information. Then we could skip the processing when switching between the emails of those people without further delays. (Only the first email clicked from each person would cause a delay.)

Assignee: nobody → kaie
Status: NEW → ASSIGNED

A while ago[1] we tried the DLL from Octopus/Sequoia, see:
https://sequoia-pgp.org/blog/2021/04/08/202103-a-new-backend-for-thunderbird/
It didn't show the delays in displaying messages. So it appears that this is caused by the backend. Would it be worth investigating what makes Sequoia's approach faster? A cache will still be slow on every new message that is displayed. And it will be slow again in the next session after a restart of Thunderbird.

[1] Sadly those DLLs don't work anymore after bug 1732909. Octopus implemented that
https://gitlab.com/sequoia-pgp/sequoia-octopus-librnp/-/issues/44
but to our knowledge hasn't shipped new binaries.

The difference is that GnuPG apparently doesn't perform a key verification at import time. RNP does.

(In reply to Barbara from comment #13)

A cache will still be slow on every new message that is displayed.

Not on every message that is displayed.

Only for messages that contain different keys. If you switch between several messages of the same sender, that contain the same key, the cache will avoid the slowness.

Also, it seems most keys aren't slow to process. It seems the key that is used in the example here is of a type that is particularly slow.

I'm comparing it with my own RSA 4096 bit key, and tested using the RNP command line utility. Importing my key takes 10 milliseconds, the key used in the attached example takes 400 milliseconds on my machine.

And it will be slow again in the next session after a restart of Thunderbird.

Yes, once per session every slow key will be noticed.

(In reply to Barbara from comment #13)

A while ago[1] we tried the DLL from Octopus/Sequoia, see:
https://sequoia-pgp.org/blog/2021/04/08/202103-a-new-backend-for-thunderbird/
It didn't show the delays in displaying messages. So it appears that this is caused by the backend.

Sequoia doesn't support ElGamal, so even if it would load that key, it will just parse data without ability to encrypt/decrypt to that key. Also, as far as I know, it doesn't implement key material validation.

And, as Kai mentioned, this is only due to one slow check for ElGamal keys, which we are considering to remove. Additionally we consider to change the approach to key validation.

(In reply to Nickolay Olshevsky from comment #16)

(In reply to Barbara from comment #13)
Also, as far as I know, it doesn't implement key material validation.

What do you mean by "key material validation"?

(In reply to neal from comment #17)

What do you mean by "key material validation"?

Misc math checks of public/secret key MPIs, like whether group generator's order is correct, primality tests, and so on. In this exact case most time is spent to check whether ElGamal's P is prime.

Thanks for the clarification.

We've experienced the slowness when corresponding with a communication partner with the enclosed key. Somehow TB attaches an old revoked key as well which indeed appears to be ElGamal. Is there any reason to send out the revoked key and why is it checked if it's already revoked?

Magnus, in the review for the cache patch, you suggested an alternative/additional optimization: Change getKeyListFromKeyBlock to be async. I don't immediately see how this could be an improvement. Most callers are sync functions and need the results to continue their processing.

In theory, we could change the message UI to use an async call, and allow the UI to defer updating until the information is ready. But this has the risk that the information will come in too late, e.g. the user could have clicked the details already, and we're showing there's no key. Or the user might have already switched to another message, then we must ensure we don't update the screen with the information arriving late.

Unless I'm missing something, I don't see a benefit in converting the function to async, and then using await to call it.

(In reply to Barbara from comment #20)

We've experienced the slowness when corresponding with a communication partner with the enclosed key. Somehow TB attaches an old revoked key as well which indeed appears to be ElGamal. Is there any reason to send out the revoked key and why is it checked if it's already revoked?

Thunderbird doesn't send out the revoked key. It sends the new key, plus the revocation statement for the old key. The intention is to distribute the information that the old key was revoked, and allows Thunderbird to automatically import revocations, and thereby to automatically stop using revoked keys.

(In reply to Kai Engert (:KaiE:) from comment #22)

Thunderbird doesn't send out the revoked key. It sends the new key, plus the revocation statement for the old key.

Wait, apparently it does send out the old revoked key, too? I'm not sure why it does. I need to check.

So independently of how exactly the revocation information is attached - we're attempting to get a listing of all attached keys. In order to get a listing, we import the key into a temporary space. And that importing causes the checking on the RNP library side. I'd like to file a separate bug to check if the attaching of revocation information could be optimized (I was convinced that we'd not attach the full key, but apparently I'm wrong).

The cache optimization will help with this attachment, too.

Let's land the cache, and keep investigating for additional improvements.

Pushed by mkmelin@iki.fi:
https://hg.mozilla.org/comm-central/rev/c3d468fc8947
Avoid repeatedly obtaining OpenPGP key listings using a LRU cache. r=mkmelin

(In reply to Kai Engert (:KaiE:) from comment #21)

In theory, we could change the message UI to use an async call, and allow the UI to defer updating until the information is ready. But this has the risk that the information will come in too late, e.g. the user could have clicked the details already,

Well, this would look like a design decision which could be worked around: whether/when you allow clicking on security details, or whether in the details you show a spinner until things are ready, or some such.

and we're showing there's no key. Or the user might have already switched to another message, then we must ensure we don't update the screen with the information arriving late.

Again, that should be possible to handle. Changing to a new message should probably automatically end all functions still working on the previous message.

Unless I'm missing something, I don't see a benefit in converting the function to async, and then using await to call it.

Note that this bug is reported for the scenario of encryption disabled in Thunderbird, where it would appear even more suprising why Thunderbird needs to slow me down for parsing keys before showing me the actual message.

Comment on attachment 9252558 [details]
Bug 1742578 - Avoid repeatedly obtaining OpenPGP key listings using a LRU cache. r=mkmelin

[Approval Request Comment]
Regression caused by (bug #): openpgp in general
User impact if declined: slowness
Testing completed (on c-c, etc.):
Risk to taking this patch (and alternatives if risky): low

Attachment #9252558 - Flags: approval-comm-beta?

(In reply to Thomas D. (:thomas8) from comment #27)

Again, that should be possible to handle. Changing to a new message should probably automatically end all functions still working on the previous message.

It isn't possible to cancel that slow operation.

Note that this bug is reported for the scenario of encryption disabled in Thunderbird, where it would appear even more suprising why Thunderbird needs to slow me down for parsing keys before showing me the actual message.

Magnus' patch (2nd attachment) has the effect to show the message immediately, and only afterwards causes the delay (which the user might not notice while staring at the message).

We could potentially decide to not automatic process attached keys (to check for revocated key we want to import etc.) if the message isn't sent to any of the user's accounts that have OpenPGP enabled. The 3rd attachment implements that. It has the small risk that we ignore such information if the user was added with BCC.

JFYI: we are decided to relax key material checks on public key import, so this special delay on ElGamal keys will go away. Fix will be included in RNP v0.16.0.

Pushed by mkmelin@iki.fi: https://hg.mozilla.org/comm-central/rev/4a4bc0a21f8b make getKeyListFromKeyBlock (and call chains) async. r=kaie
Attachment #9253175 - Attachment description: Bug 1742578 - Don't automatically process attached OpenPGP unless we find an own recipient account with enabled OpenPGP. r=mkmelin → Bug 1742578 - Don't automatically process attached OpenPGP keys unless OpenPGP is enabled for at least ony identity. r=mkmelin

third patch ready for checkin.

(In reply to Nickolay Olshevsky from comment #33)

JFYI: we are decided to relax key material checks on public key import, so this special delay on ElGamal keys will go away. Fix will be included in RNP v0.16.0.

Thanks Nickolay. Once we have that update, we can check if we want to undo the third patch.

(In reply to Kai Engert (:KaiE:) from comment #36)

(In reply to Nickolay Olshevsky from comment #33)

JFYI: we are decided to relax key material checks on public key import, so this special delay on ElGamal keys will go away. Fix will be included in RNP v0.16.0.

Thanks Nickolay. Once we have that update, we can check if we want to undo the third patch.

It would appear to me that keeping the function async is better even if it works faster.
If the third patch doesn't handle BCC case well, then we should try to improve that.

Target Milestone: --- → 96 Branch

Pushed by thunderbird@calypsoblue.org:
https://hg.mozilla.org/comm-central/rev/a7be5786faf7
Don't automatically process attached OpenPGP keys unless OpenPGP is enabled for at least ony identity. r=mkmelin

Status: ASSIGNED → RESOLVED
Closed: 3 years ago
Resolution: --- → FIXED

(In reply to Thomas D. (:thomas8) from comment #37)

It would appear to me that keeping the function async is better even if it works faster.

It isn't faster, because we will still wait for the operation to complete, before we proceed with anything else.
However, Magnus async change causes the message to be displayed immediately, and the delay will happen after the message is already showing (but you will still have to wait to scroll or operate with the message).

If the third patch doesn't handle BCC case well, then we should try to improve that.

The strategy of the third patch has changed. It no longer has the BCC etc. restrictions I had described earlier.
The new strategy is simpler. If OpenPGP is disabled for all accounts, never do automatic key processing (no delay). If OpenPGP is enabled for at least one identity in any account, then always process the keys.

Comment on attachment 9253052 [details]
Bug 1742578 - make getKeyListFromKeyBlock (and call chains) async. r=kaie

[Approval Request Comment]
Regression caused by (bug #): no
User impact if declined: some messages will not display immediately
Testing completed (on c-c, etc.):
Risk to taking this patch (and alternatives if risky): low

Attachment #9253052 - Flags: approval-comm-beta?

Comment on attachment 9253175 [details]
Bug 1742578 - Don't automatically process attached OpenPGP keys unless OpenPGP is enabled for at least ony identity. r=mkmelin

[Approval Request Comment]
Regression caused by (bug #): no
User impact if declined: unnecessary operation delays for users who don't use OpenPGP
Testing completed (on c-c, etc.):
Risk to taking this patch (and alternatives if risky): low

Attachment #9253175 - Flags: approval-comm-beta?

you're right, thanks for the reminder!

Attachment #9252558 - Flags: approval-comm-beta?
Attachment #9253052 - Flags: approval-comm-beta?
Attachment #9253175 - Flags: approval-comm-beta?

Magnus, we need to decide about uplifting to stable 91.x

Uplifting the cache should be safe.

Uplifting the third patch, to not process keys if openpgp is completely disabled, seems like a reasonable change, and should also be safe to uplift.

I'd skip patch 2, maybe it's better to avoid large changes like this on the stable branch.

Flags: needinfo?(mkmelin+mozilla)

I'd be in favor of that

Sounds good.

Flags: needinfo?(mkmelin+mozilla)
Attachment #9252558 - Flags: approval-comm-esr91?

Comment on attachment 9253175 [details]
Bug 1742578 - Don't automatically process attached OpenPGP keys unless OpenPGP is enabled for at least ony identity. r=mkmelin

approval comments are available further above

Attachment #9253175 - Flags: approval-comm-esr91?

Comment on attachment 9253175 [details]
Bug 1742578 - Don't automatically process attached OpenPGP keys unless OpenPGP is enabled for at least ony identity. r=mkmelin

[Triage Comment]
Approved for esr91

Attachment #9253175 - Flags: approval-comm-esr91? → approval-comm-esr91+

Comment on attachment 9252558 [details]
Bug 1742578 - Avoid repeatedly obtaining OpenPGP key listings using a LRU cache. r=mkmelin

[Triage Comment]
Approved for esr91

Attachment #9252558 - Flags: approval-comm-esr91? → approval-comm-esr91+
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: