Closed Bug 1667564 Opened 4 years ago Closed 2 years ago

should automatically collect incoming public keys from emails, for later usage

Categories

(MailNews Core :: Security: OpenPGP, enhancement, P1)

Tracking

(thunderbird_esr91 wontfix)

RESOLVED FIXED
100 Branch

People

(Reporter: mkmelin, Assigned: mkmelin)

References

Details

Attachments

(3 files, 4 obsolete files)

Currently to get someone's key you have to do a manual import step. After this the key is "available" but untrusted.

I think we should automate this key collection so that TOFU encryption becomes real.

IIRC there are some technical aspects that prevent this atm; if the key storage becomes large, it becomes slow. We don't necessarily need to store incoming keys in that list though. We should store at least

  • email address
  • key
  • url (mid:, or https:) of where it was obtained (for investigative purposes)
  • timestamp
Priority: -- → P2
Priority: P2 → P1

There is no inherent reason why the key store has to be slow. The current format, however, does have serious scalability problems, because it doesn't allow random access. It would be better to use a proper database. Sequoia's native public store (which is not used by the Octopus) does that and we have no problem doing random access on 100ks of certificates. I'd strongly suggest you go this route instead of introducing another cache.

Agreed, they should just be stored in a database.

This stores keys from email attachments, Autocrypt headers and keyserver lookups into an IndexedDB database, along with some metadata about when/from where it was collected.
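Roughly, each collected entry might look like this (a sketch; field names are illustrative, not necessarily the patch's exact schema):

    let record = {
      fingerprint: "0123456789ABCDEF0123456789ABCDEF01234567",
      emails: ["alice@example.com"], // all user ID addresses, for the index
      pubKey: "-----BEGIN PGP PUBLIC KEY BLOCK-----\n...",
      sources: [{
        type: "attachment", // or "autocrypt", "keyserver"
        uri: "mid:message-id@example.com",
        timestamp: Date.now(),
      }],
    };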

Assignee: nobody → mkmelin+mozilla
Status: NEW → ASSIGNED

Great that there is progress on a P1 bug, thanks.
Forgive me for asking, but I could not see test coverage for Autocrypt key collection in the bug (unlike the test for an attached key). Since the collection, as I understand it, is a new feature, I wonder whether it is tested somewhere else, if not in this commit?

Correct, it's not there yet. Opening .eml from file - which the test does - is a bit special, so the header is not directly available in that case. So I punted on that for now ;) It does work in normal operation though.

See Also: → 1749664

We need to define the desired behavior for the keys cache.

Index keys by all contained email addresses.

When storing a candidate key, we should make sure that we can lookup the candidate key quickly.
This requires an index.
If a key contains multiple email addresses, we must create multiple index entries, one for each email address, with each index entry pointing to the key.

The user may receive multiple keys for an email address.
We need to decide how to handle that.

Alice may have received an email from Bob one week ago, which still contained an old key.
Today, Alice received an email with Bob's newer key.

Alice clicks Bob's old email.
Alice replies to that old email, and decides to encrypt for the first time.

In this scenario, I request that we MUST NOT offer ONLY the old key.
In other words, a trivial implementation that always has just "one most recently seen key" per email address would fail to suggest Bob's newer key to Alice.

The implementation must use one of the following approaches:

(a)
Store multiple candidate keys (that have different fingerprints) per email address.
In the above scenario, Thunderbird would offer both candidate keys to Alice, old and new.
Is indexeddb capable of this data model?
From https://developer.mozilla.org/en-US/docs/Web/API/IndexedDB_API/Basic_Terminology
"Each record in an index can point to only one record in its referenced object store".
If indexeddb doesn't support one-index-entry-to-multiple-records, then we cannot use it for this implementation approach.

(b)
Only store one candidate key.
This is tricky and risky.
First, we'd have to come up with our own model for identifying the "best" candidate key, that is, the only one we want to store in the cache.
Using "most recently seen" isn't a good approach, as outlined in the above example, as it would cause us to offer ONLY an older key to Alice.
An approach could also look at the "created" attribute. But that alone might not be reliable.
For example, Bob could have produced one key valid from 2015 to 2025, and another key valid from 2020 to 2021. If we're in 2022, this simple strategy would incorrectly prefer the newer key that has already expired.
Also, with a "one key only" strategy, you'd have to make sure that you don't store a key that has been revoked (overriding a non-revoked key).
I think this illustrates that a "one key per email address" strategy would require a smart implementation.
Furthermore, storing just one candidate key makes it easier for an attacker to trick someone.

David regularly sends email with his key.
Carol has repeatedly received David's key, but never had a need to encrypt.
Now an attacker, Eve, sends an email to Carol with David as a fake sender address, attaching a fake key for David (private key owned by Eve, user ID claims it's David's key). The email requests that Carol reply, including sensitive information, and insists that Carol encrypt the email.

With a simple implementation that collects only one key, Thunderbird might offer only the fake key.

I think this example is a strong argument for an implementation that collects and offers multiple candidate keys.
If there are multiple keys stored, Carol can notice that there are two keys for David, and she can notice that she needs to investigate which key is the correct one.

In addition to the above, the storage should avoid storing multiple entries with the same fingerprint.
An old email might contain Bob's key that expires mid 2022, and a newer email could contain the same key that has been extended to be valid until 2025.

Reading an older email caused storage of the expires-2022 key.
Reading a newer email with the expires-2025 key at a later time should detect that the new key is an update for an existing cached key. We shouldn't store two entries.

Later, after we have already stored the expires-2025 key, and Alice clicks an older email, we should not replace it with the expires-2022 key - rather we should keep the expires-2025 key version in the cache.

Then there is the question: is "expires" a reliable attribute for deciding which version of a key (by fingerprint) should be preferred? Is it possible that a user could decide to shorten the validity of a key? I think the answer is yes. We might have to look at a different attribute: we might have to check which version was most recently modified, based on the most recent attribute that carries a self signature (by signature timestamp), and prefer that one.

Oh, and when updating a key in the cache, the list of user IDs might have changed. Which means, we'll have to update the email-address-to-key index entries.

I think we shouldn't use a simplification like "always bind a received key to the sender email address".

Alice might send email to Bob and Carol.
Alice knows that Bob hasn't talked to Carol previously, but Bob might need to reply, and Alice attaches Carol's key along with her message.

When Bob reads the email, it should be possible to cache Carol's attached key, despite the email having been sent by Alice.

I think the above explanations show the complexity related to storing OpenPGP keys, and making sure you keep only the "best" and "most up to date" key, which might not be the version that you have seen most recently (e.g. when clicking an old email).

(Relying on an email timestamp isn't a great idea either, because someone else might forward you an old version of someone's key in a current email.)

I think that it would be best if Thunderbird avoided re-inventing and coding all the logic that is required to keep the best version of a key. Obviously an OpenPGP library already has the logic for that.

But as mentioned before, the simple key file format used by RNP doesn't offer random access, and sequentially looking through a large list of randomly collected keys would be slow. Or maybe it won't? [edited: "slow"]

Ideally, I'd prefer to use a standard key ring for storing the cached keys.

Here is an idea that we could use to combine the best of both worlds.

We could have one keyring cache per fingerprint.
Each keyring cache contains exactly one key for a single fingerprint.

We could have one database that contains all the keyring caches.
Each entry in the cache is the contents of one keyring for one fingerprint.

We have an index by fingerprint that points to the related entry in the database.

We can have multiple index entries, one for each of the key's email addresses, that point to the same entry.

The following strategy would be used when updating the cache:

  • get fingerprint of newCandidateKey
  • lookup database entry for the fingerprint
  • if found, read the existing entry, and construct a temporary RNP key store (in memory)
  • use an RNP call to import the newCandidateKey into that temporary key store
  • this makes full use of all of RNP's existing logic to merge keys
  • obtain the storage bytes for that key store, and update the cache database entry

If there is no cache entry for the fingerprint yet, we still create a temporary key store for the fingerprint, import the newCandidateKey into the empty key store, and use the same approach for storing it to disk.
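A sketch of this update path, assuming a small promise-based IndexedDB wrapper (db.get/db.put) and hypothetical RNP helper names; the real Thunderbird RNP bindings may differ:

    async function updateCollectedKey(db, newCandidateKey) {
      const fpr = newCandidateKey.fingerprint;
      const existing = await db.get("keyrings", fpr); // one keyring per fingerprint
      const tempStore = RNP.newKeyStore();            // temporary in-memory key store
      if (existing) {
        RNP.importKey(tempStore, existing.keyBlock);  // load the cached version first
      }
      RNP.importKey(tempStore, newCandidateKey.keyBlock); // RNP merges by fingerprint
      const merged = RNP.exportKey(tempStore, fpr);   // serialized, merged result
      await db.put("keyrings", { fingerprint: fpr, keyBlock: merged });
      // After a merge the set of user IDs may have changed, so the
      // email-address index entries must be refreshed as well.
    }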

Some (possibly useful) input on this topic:

  • the fingerprint is a unique identifier for the key material; so even when a key's expiration time is extended, a user ID is added or changed, or a new subkey is added, the fingerprint of the primary key remains the same.
  • inside of RNP, all keys are stored in a map <fingerprint, key object>, so fingerprint key lookup is quite fast. Userid/Keyid lookup iterates over the keyring.
  • when an updated primary key is imported into the keyring, it doesn't mean 'a new key is added' but rather 'all of its userids, signatures and subkeys are appended to the existing key'.

So, in terms of indexeddb, a possible solution could be to have the userid (email) as index, and a string with the list of fingerprints of keys where this userid was noticed as value. Not good db design, I know, but it should work for a few keys per userid. When TB needs a key for some email address, it gets the list of fingerprints, and then requests the corresponding keys from RNP. Each key then must be checked for validity, for the validity of the userid, and so on.
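In code, that model might look roughly like this (store layout and helper names are illustrative assumptions):

    async function candidateKeysFor(db, email) {
      let entry = await db.get("byEmail", email); // { email, fingerprints: [...] }
      if (!entry) {
        return [];
      }
      // Resolve each fingerprint to a key, then validate it (expiry,
      // revocation, userid validity) before offering it to the user.
      let keys = await Promise.all(
        entry.fingerprints.map(f => db.get("keyrings", f))
      );
      return keys.filter(k => k && isUsableFor(k, email)); // isUsableFor: hypothetical
    }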

(In reply to Nickolay Olshevsky from comment #11)

Some (possibly useful) input on this topic:

  • the fingerprint is a unique identifier for the key material; so even when a key's expiration time is extended, a user ID is added or changed, or a new subkey is added, the fingerprint of the primary key remains the same.

Thanks for confirming, yes that's an important aspect of the design.

  • inside of RNP, all keys are stored in a map <fingerprint, key object>, so fingerprint key lookup is quite fast.

If we used a single RNP keyring for all candidate keys, we'd have to save that big keyring to disk every time we update it. That's slow. Also, we'd have to read the full keyring into memory the first time we need it. That might also be slow for a big keyring. That processing can be avoided with the suggested approach to have a database containing many small keyrings (one fingerprint per keyring).

To clarify, the idea is to keep the regular keys separate from the candidate keys. For all regular keys, we use a regular RNP key ring. Only for the candidate keys we'd use this suggested database approach.

Userid/Keyid lookup iterates over the keyring.

Yes, and because we try to find candidate keys by both fingerprint (updating) and email address (searching), we need indexes for both.

  • when an updated primary key is imported into the keyring, it doesn't mean 'a new key is added' but rather 'all of its userids, signatures and subkeys are appended to the existing key'.

Right. That's also a reason for the suggested design. If we already have a candidate key stored, and we see a key with the same fingerprint, the intention is to ask RNP to merge it.

So, in terms of indexeddb, a possible solution could be to have the userid (email) as index, and a string with the list of fingerprints of keys where this userid was noticed as value. Not good db design, I know, but it should work for a few keys per userid. When TB needs a key for some email address, it gets the list of fingerprints, and then requests the corresponding keys from RNP.

We should obtain the candidate keys from disk storage, without having all candidate keys in memory. So we cannot request the key by fingerprint from RNP.

(In reply to Nickolay Olshevsky from comment #11)

So, in terms of indexeddb, a possible solution could be to have the userid (email) as index, and a string with the list of fingerprints of keys where this userid was noticed as value. Not good db design, I know, but it should work for a few keys per userid. When TB needs a key for some email address, it gets the list of fingerprints,

Thanks, yes this could be a workaround, if we're required to use indexedDB.

But this is an example why I'd prefer to use sqlite.

(In reply to Kai Engert (:KaiE:) from comment #7)

(a)
Store multiple candidate keys (that have different fingerprints) per email address.
In the above scenario, Thunderbird would offer both candidate keys to Alice, old and new.
Is indexeddb capable of this data model?
From https://developer.mozilla.org/en-US/docs/Web/API/IndexedDB_API/Basic_Terminology
"Each record in an index can point to only one record in its referenced object store".

I might have misunderstood that. Apparently an index can have multiple records with the same value, and each one can point to a different record in the object store, I guess.

Magnus uses an index that gives me the impression that it's actually possible:

        objectStore.createIndex("emails", "emails", {
          unique: false,    // several records may list the same address
          multiEntry: true, // one index entry per element of the "emails" array
        })
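
For reference, a lookup through such a multiEntry index would return every stored record whose emails array contains the queried address (a sketch; the "keys" store name is an assumption):

    let tx = db.transaction("keys", "readonly");
    let request = tx.objectStore("keys").index("emails").getAll("bob@example.com");
    request.onsuccess = () => {
      let candidates = request.result; // every collected key listing this address
    };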

My impression is that indexedDB isn't intended to be used for this kind of data management. It's a standard that lets web sites store data in the browser, without giving the site more control than that.

Looking at the implementation of indexeddb, the files that are stored in the profile folder are actually sqlite files!

I think we should use sqlite directly, and not force us to go through an artificially limited API that was designed for a different purpose.

We should use web technology where we can, and I do think IndexedDB is actually a very suitable way to store keys in this scenario.

I think a lot of the comments above were based on a misunderstanding of what the patch did. It was storing keys with a primary key of fingerprint+type (type being keyserver, attachment, etc.). We'll index all the emails so that we can find which candidate fingerprints there are for an email address.

You do make some good points on what we should be storing, and that we should just be storing one key per fingerprint even if they have different sources - since the same fingerprint will mean the same key (essentially). I'm updating the code, more or less along the lines of comment 10, so that we will store keys by fingerprint only, and if/when we get an existing key, give new+old to RNP, let it sort out the outcome, and store the result - so that we don't have to worry about expiry or other changes.

(In reply to Magnus Melin [:mkmelin] from comment #16)

I think a lot of the comments above were based on a misunderstanding of what the patch did. It was storing keys with a primary key of fingerprint+type (type being keyserver, attachment, etc.). We'll index all the emails so that we can find which candidate fingerprints there are for an email address.

Note that many of my comments weren't directly about your patch.

I was doing general brainstorming, and made suggestions for the implementation.

Attached patch fix-auto-update.patch (obsolete) — Splinter Review
Comment on attachment 9260936 [details] [diff] [review]
fix-auto-update.patch

Review of attachment 9260936 [details] [diff] [review]:
-----------------------------------------------------------------

Thanks! Will add it to D135074

::: mail/extensions/openpgp/content/ui/enigmailMessengerOverlay.js
@@ +3314,5 @@
> +      // Charlie's revoked or extended key attached. It's useful for
> +      // me to learn that.
> +
> +      if (oldKey.keyCreated != newKey.keyCreated) {
> +        // This should never happen, let's skip.

Since it should be technically impossible, I don't think we need this check.
Attachment #9260936 - Attachment is obsolete: true
Blocks: 1752428
Attached patch ontop-colldb.patch (obsolete) — Splinter Review

Magnus, while testing and trying to understand an unexpected behavior, I suspected a transaction visibility issue, and applied the ontop-colldb.patch to your code. I guess it cannot hurt to have these explicit commit statements as soon as we're done. If you agree, could you please apply this patch in phab? Thanks

Flags: needinfo?(mkmelin+mozilla)
Comment on attachment 9261424 [details] [diff] [review]
ontop-colldb.patch

Review of attachment 9261424 [details] [diff] [review]:
-----------------------------------------------------------------

According to https://developer.mozilla.org/en-US/docs/Web/API/IDBTransaction/commit it should not be necessary to do this.
Especially for the clear(), it's maybe best not to. Though it should not make a difference.

Comment on attachment 9261424 [details] [diff] [review]
ontop-colldb.patch

ok, I agree it's probably fine, because the objects go out of scope quickly.

But by not calling commit explicitly, we rely on automatic immediate cleanup. If there isn't immediate cleanup, the visibility of new items to other code may be delayed.
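
For illustration, the explicit variant looks like this (a minimal sketch; the store name is an assumption):

    let tx = db.transaction("keys", "readwrite");
    tx.objectStore("keys").put(record);
    tx.commit(); // flush now, instead of waiting for the implicit auto-commit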

Attachment #9261424 - Attachment is obsolete: true
See Also: → 1757385

Including other people's keys in an email is reasonable if they are part of the conversation, and the sender wants to ensure everyone on the CC list has everyone's keys, to reply to an encrypted message.

On the other hand, I think it's reasonable to limit Mallory's ability to poison our lookup cache.

Mallory could send email to Alice, which includes Mallory's own key, and a fake key for Bob.
However, if Bob isn't included in the email, then Bob has no ability to notice what Mallory is doing.

Suggestion:

  • ignore all attached keys for email addresses that aren't
    included in the TO/CC list.
    (This filtering should be implemented as part of this bug;
    a minimal sketch follows below.)
    The filtering will restrict automatic key collection.
    The user will still be able to manually click the attachment and import,
    but then the user has a chance to notice what the email contains.
  • by limiting the automatic collection to email addresses that are on TO/CC,
    Mallory's attempt at easy poisoning of Alice's cache requires that Bob is
    a recipient of the email. If Bob sees the email with a false key,
    then Thunderbird should warn Bob about it. (bug 1757385)
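
A minimal sketch of the suggested filter (helper and field names are hypothetical):

    // Only auto-collect an attached key if at least one of its user ID
    // addresses appears among the message's recipients.
    function shouldCollect(key, msgRecipients) {
      // msgRecipients: lower-cased addresses from the To/Cc headers
      // (the sender's own address would presumably also be allowed)
      return key.emails.some(addr => msgRecipients.includes(addr.toLowerCase()));
    }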

I think it is important that we don't allow unlimited key collection of any arriving key data. The strategy described in comment 24 might be a sufficient way for limiting what we import.

Ideally, we should clean up entries that are no longer required.

We could remove entries from CollectedKeysDB whenever a key is permanently imported, or when we learn that a collected key is now revoked or expired.

But regardless of cleanup attempts, we still have the risk of redundant information.
Any code that considers using CollectedKeysDB as a source for keys must check for duplicate information, and if a permanently imported key is already available, the information in CollectedKeysDB should be skipped (e.g. don't offer the same key twice to the user). (Such a check is already used in the candidate patch for bug 1627956.)

Because we need code that checks for duplicates prior to using information from CollectedKeysDB, I think it's fine to postpone automatic cleanup of CollectedKeysDB.
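
A sketch of such a duplicate check (the CollectedKeysDB lookup and keyring helper names are assumptions):

    async function usableCollectedKeys(email) {
      let collected = await CollectedKeysDB.findKeysForEmail(email); // assumed API
      // Skip any collected key whose fingerprint is already permanently imported.
      return collected.filter(k => !permanentKeyring.hasFingerprint(k.fingerprint));
    }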

And we should do some sanity filtering.

If we have a secret key for an email address, we shouldn't collect an advertised public key.

If incoming data contains more than one (non-expired, non-revoked) key for an email address, that isn't helpful but misleading, and I suggest skipping all of those duplicates.
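
A sketch of these sanity filters (helper names hypothetical):

    function sanityFilter(incomingKeys, haveSecretKeyFor) {
      // Count how many incoming keys claim each address. Ideally this
      // should only count non-expired, non-revoked keys.
      let counts = new Map();
      for (let key of incomingKeys) {
        for (let addr of key.emails) {
          counts.set(addr, (counts.get(addr) || 0) + 1);
        }
      }
      return incomingKeys.filter(key =>
        key.emails.every(addr =>
          !haveSecretKeyFor(addr) && // own identity: don't collect advertised keys
          counts.get(addr) === 1     // multiple keys for one address: skip them all
        )
      );
    }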

Flags: needinfo?(mkmelin+mozilla)
Attached patch 1667564-armor.patch (obsolete) — Splinter Review

Magnus, you need this patch on top of your phab revision. I'll explain in phab.

Attached patch 1667564-armor-v2.patch (obsolete) — Splinter Review

fixed (v1 had unwanted test code)

Attachment #9265730 - Attachment is obsolete: true

Comment on attachment 9265732 [details] [diff] [review]
1667564-armor-v2.patch

patch was merged into phab, marking obsolete

Attachment #9265732 - Attachment is obsolete: true

Attachment 9266663 [details] implements my suggestions from comment 24 and comment 26.

I have one additional concern regarding efficiency.

Every time the user clicks an email, we will update the CollectedKeysDB again.
(And I realize the same applies also for the attempt to automatically update known keys.)

Can we think of a way to minimize this repeated processing?

We could have an in-memory list of recently processed emails with OpenPGP keys attached (e.g. 20 entries). Every time we go through processing of an email with OpenPGP keys attached, we could remember the message ID.

If we open an email with a message ID that's in that recently-used list, we skip all automatic processing of attached keys.
This could help when jumping between the same set of messages.
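
A minimal sketch of such a list (size and eviction policy are illustrative):

    const recentlyProcessed = [];
    const MAX_RECENT = 20;
    function alreadyProcessed(messageId) {
      if (recentlyProcessed.includes(messageId)) {
        return true; // skip automatic processing of attached keys
      }
      recentlyProcessed.push(messageId);
      if (recentlyProcessed.length > MAX_RECENT) {
        recentlyProcessed.shift(); // evict the oldest entry
      }
      return false;
    }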

Flags: needinfo?(mkmelin+mozilla)

IIRC in one of my iterations I stored a date of last update. I think I should add that back, so we can just check whether the key was collected very recently, say within the last day or so; if it was, there is no need to do further processing.
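
Roughly (the field name and the one-day window are assumptions):

    const ONE_DAY_MS = 24 * 60 * 60 * 1000;
    function needsReprocessing(record) {
      return !record.lastCollected ||
             Date.now() - record.lastCollected > ONE_DAY_MS;
    }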

Flags: needinfo?(mkmelin+mozilla)

Magnus, I see an intermittent failure in my new test, where an element newly added to CollectedKeysDB isn't visible in the test.

I think this can be caused by a missing transaction.commit() call. When calling commit(), we ensure that the element will be visible immediately to all new transactions (and not be delayed until automatic JS object cleanup triggers the implicit commit).

I cannot reproduce the intermittent failure after adding commit().

I added the commit() statements now.

Pushed by kaie@kuix.de:
https://hg.mozilla.org/comm-central/rev/d3b9394d682a
collect OpenPGP keys from email attachments, Autocrypt headers and keyserver lookups. r=kaie
https://hg.mozilla.org/comm-central/rev/6b71d6648154
Don't collect OpenPGP key candidates that don't pass sanity checks. r=mkmelin

Status: ASSIGNED → RESOLVED
Closed: 2 years ago
Resolution: --- → FIXED
Target Milestone: --- → 100 Branch

Pushed by mkmelin@iki.fi:
https://hg.mozilla.org/comm-central/rev/575874479590
Function mergePublicKeyBlocks, clarify comment and use const. r=PatrickBrunschwig

Regressions: 1763907