Closed Bug 618496 Opened 9 years ago Closed 9 years ago

Revisit memoization of SECItems and PK11SymKeys

Categories

(Firefox :: Sync, defect)

defect
Not set

Tracking

()

RESOLVED FIXED
Future
Tracking Status
fennec 2.0+ ---

People

(Reporter: rnewman, Assigned: rnewman)

References

Details

Attachments

(4 files, 3 obsolete files)

Follow-on from Bug 610914.
See Also: → 610914
Target Milestone: --- → Future
I have some ideas about how to improve this too. I would like to discuss the design before the new implementation gets written.
(In reply to comment #1)
> I have some ideas about how to improve this too. I would like to discuss the
> design before the new implementation gets written.

You bet! Next week for sure.

With a little less time pressure we can definitely produce a better outcome than swapping in some memoization.
We're spending a lot of time GC'ing during sync on mobile. I suspect a large part is due to the fact that WeaveCrypto creates new objects on each operation. Brian, could you outline the ideas you had to mitigate this? We should consider doing this if we can come up with a low risk patch.
My idea is simply to create the nsIKeyObjects immediately after receiving them over the network and only release them at shutdown or when the sync key changes, instead of creating and releasing them for every record encrypted or decrypted. AFACT, the SECItem memoization isn't really necessary but maybe I don't understand some backward-compatibility requirement with the 3.6 client.
(In reply to comment #4)
> AFACT, the SECItem memoization isn't really necessary but maybe I
> don't understand some backward-compatibility requirement with the 3.6 client.

Backward-compatibility was my reason: I didn't want to change the definition of WeaveCrypto. Injecting memoization was just a minimally invasive way of creating key objects once and hanging on to them until shutdown.
The nsIKeyObject-based API doesn't support AES right now. But, a similar optimization can be implemented without changes in PSM by saving the PK11SymKey structure and passing that around instead of the string representation of the key.

But, it isn't obvious that this key construction/destruction is actually the main thing that is making Sync slow or memory-hungry, because keys are (or should be) constructed once per record decrypted and they aren't that big. I am more suspicious of overhead of the base64-encoded-javascript-string to SECItem conversion and other stuff that happens between the JSON being parsed and the records getting decrypted.
(In reply to comment #6)
> The nsIKeyObject-based API doesn't support AES right now. But, a similar
> optimization can be implemented without changes in PSM by saving the PK11SymKey
> structure and passing that around instead of the string representation of the
> key.

That would be an API change in WeaveCrypto. Avoiding that led me to memoization, which is really just a roundabout way of "passing around" the same PK11SymKey.

> But, it isn't obvious that this key construction/destruction is actually the
> main thing that is making Sync slow or memory-hungry, because keys are (or
> should be) constructed once per record decrypted and they aren't that big. I am
> more suspicious of overhead of the base64-encoded-javascript-string to SECItem
> conversion and other stuff that happens between the JSON being parsed and the
> records getting decrypted.

Memoizing the SECItem builder and the PK11SymKey lookup (the patch in Bug 610914) together saved something like 50% of the time spent in Sync's decryption and HMAC calls.

Most of that is the base64 nonsense, which was a big smoking gun in my memory allocation instrumentation. Lots of garbage there.
(In reply to comment #6)
> The nsIKeyObject-based API doesn't support AES right now.

Filed bug 636310 for that.

> But, it isn't obvious that this key construction/destruction is actually the
> main thing that is making Sync slow or memory-hungry, because keys are (or
> should be) constructed once per record decrypted and they aren't that big. I am
> more suspicious of overhead of the base64-encoded-javascript-string to SECItem
> conversion

ctypes not being very efficient on GC (bug 636300) could the culprit too.

> and other stuff that happens between the JSON being parsed and the
> records getting decrypted.

There isn't much that's going on there. Just the HMAC verification, really. https://mxr.mozilla.org/mozilla-central/source/services/sync/modules/record.js#223
Summary: Revisit crypto efficiency improvements → Revisit memoization of SECItems and PK11SymKeys
(In reply to comment #7)
> That would be an API change in WeaveCrypto. 

Actually, that's not strictly true. We could pass a String for the base64 key, and attach the PK11SymKey object to it, then check for it on the other side. Kinda messy, though.
We should be able to memoize symkey SECItems and possibly even PK11SymKeys. The IV SECItems we can simply allocate once and then reuse. This will save us a bunch (I estimate 10*N) of object allocations, thus nom'ing for blocking Fennec.
tracking-fennec: --- → ?
tracking-fennec: ? → 2.0+
Assignee: nobody → philipp
tracking-fennec: 2.0+ → ?
tracking-fennec: ? → 2.0+
Attached patch Part 1. v1 (obsolete) — Splinter Review
This patch:

* Splits up some of the makeSECItem logic into creation and filling
* Uses a single SECItem for all IVs, created when WeaveCrypto is initialized
* Adds a tiny test to make sure we get the same SECItem object on successive calls. (That the other crypto tests pass means that things still work!)
Assignee: philipp → rnewman
Status: NEW → ASSIGNED
Attachment #516132 - Flags: review?(philipp)
Attached patch Part 0. v1Splinter Review
Delete a whole bunch of WeaveCrypto stuff that we don't use. xpcshell-tests pass.
Attachment #516134 - Flags: review?(philipp)
Attached patch Part 0 the second, v1. (obsolete) — Splinter Review
Remove the ability to set algorithm.
Attachment #516153 - Flags: review?(philipp)
Attachment #516134 - Flags: review?(philipp) → review+
Attachment #516134 - Flags: approval2.0?
Better version of this.
Attachment #516153 - Attachment is obsolete: true
Attachment #516192 - Flags: review?(philipp)
Attachment #516153 - Flags: review?(philipp)
Attached patch Part 1. v2 (obsolete) — Splinter Review
Better version of this. Changes:

* Robust handling of writing into a SECItem of a different size: e.g., zero-padding. Tests for this.

* Test that memoization occurs.

* Build on top of prior simplification parts. (E.g., one less constant.)
Attachment #516132 - Attachment is obsolete: true
Attachment #516193 - Flags: review?(philipp)
Attachment #516132 - Flags: review?(philipp)
Comment on attachment 516192 [details] [diff] [review]
Part 0 the second, v2.

>+    /**
>+     * Set a bunch of NSS values once, at init-time. These are:
>+     *   - .blockSize
>+     *   - .mechanism
>+     *   - .keygenMechanism
>+     *   - .padMechanism
>+     *   - .keySize
>+     *
>+     * See also the constant ALGORITHM.
>+     */
>+    initAlgorithmSettings: function() {
>+        this.mechanism = this.nss.PK11_AlgtagToMechanism(ALGORITHM);
>+        this.blockSize = this.nss.PK11_GetBlockSize(this.mechanism, null);
>+        this.ivLength  = this.nss.PK11_GetIVLength(this.mechanism);
>+
>+        // Doesn't NSS have a lookup function to do this?
>+        switch(ALGORITHM) {

Derp? ALGORITHM is a const...

>+          case Ci.IWeaveCrypto.AES_128_CBC:
>+            this.keygenMechanism = this.nss.CKM_AES_KEY_GEN;
>+            this.keySize = 16;
>+            break;
>+
>+          case Ci.IWeaveCrypto.AES_192_CBC:
>+            this.keygenMechanism = this.nss.CKM_AES_KEY_GEN;
>+            this.keySize = 24;
>+            break;
>+
>+          case Ci.IWeaveCrypto.AES_256_CBC:
>+            this.keygenMechanism = this.nss.CKM_AES_KEY_GEN;
>+            this.keySize = 32;
>+            break;
>+
>+          default:
>+            throw Components.Exception("unknown algorithm", Cr.NS_ERROR_FAILURE);
>+        }

I think these lines can now reduce to

  this.keygenMechanism = this.nss.CKM_AES_KEY_GEN;  // this is always the same anyway!
  this.keySize = KEYSIZE_AES_256

where

  const KEYSIZE_AES_256 = 32;

at the top of the file. ALGORITHM seems superfluous at this point, too.

>+        // Determine which (padded) PKCS#11 mechanism to use.
>+        // E.g., AES_128_CBC --> CKM_AES_CBC --> CKM_AES_CBC_PAD

Nit: s/128/256/ :)
Attachment #516192 - Flags: review?(philipp) → review+
(In reply to comment #16)

> Derp? ALGORITHM is a const...

Derp indeed!

> >+        // Determine which (padded) PKCS#11 mechanism to use.
> >+        // E.g., AES_128_CBC --> CKM_AES_CBC --> CKM_AES_CBC_PAD
> 
> Nit: s/128/256/ :)

Heh, yes. Let's freshen up that comment :D
Comment on attachment 516193 [details] [diff] [review]
Part 1. v2

>     _commonCrypt : function (input, output, symmetricKey, iv, operation) {
>         this.log("_commonCrypt() called");
>         // Get rid of the base64 encoding and convert to SECItems.
>         let keyItem = this.makeSECItem(symmetricKey, true);
>-        let ivItem  = this.makeSECItem(iv, true);
>+        let ivItem  = this.filledIVSECItem(iv, true);

I think the memoization would be clearer to the reader if we inlined filledIVSECItem here and did s/ivItem/this._ivSECItem/ throughout the rest of the function.

>+    /**
>+     * Compress a JS string into a C uint8 array. count is the number of
>+     * elements in the destination array. If the array is smaller than the
>+     * string, the string is effectively truncated. If the string is smaller
>+     * than the array, the array is 0-padded.
>+     */
>+    byteCompressInts : function byteCompressInts (jsString, intArray, count) {
>+        let len = jsString.length;
>+        let end = Math.min(len, count);
>+
>+        for (let i = 0; i < end; i++)
>+            intArray[i] = jsString.charCodeAt(i) % 256; // convert to bytes
>+
>+        if (len < count) {
>+          // Must zero-pad.
>+          for (let i = len; i < count; i++)
>+            intArray[i] = 0;
>+        }

That if() is superflous as the for() loop already makes that check.

>     // Compress a JS string (2-byte chars) into a normal C string (1-byte chars)
>     // EG, for "ABC",  0x0041, 0x0042, 0x0043 --> 0x41, 0x42, 0x43
>     byteCompress : function (jsString, charArray) {
>-        let intArray = ctypes.cast(charArray, ctypes.uint8_t.array(charArray.length));
>-        for (let i = 0; i < jsString.length; i++)
>-            intArray[i] = jsString.charCodeAt(i) % 256; // convert to bytes
>+        let len = charArray.length;
>+        let intArray = ctypes.cast(charArray, ctypes.uint8_t.array(len));
>+        this.byteCompressInts(jsString, intArray, len);
>     },

byteCompress doesn't seem to be used anymore, please remove.

>     // Expand a normal C string (1-byte chars) into a JS string (2-byte chars)
>     // EG, for "ABC",  0x41, 0x42, 0x43 --> 0x0041, 0x0042, 0x0043
>     byteExpand : function (charArray) {
>         let expanded = "";
>         let len = charArray.length;
>         let intData = ctypes.cast(charArray, ctypes.uint8_t.array(len));
>@@ -569,39 +590,92 @@ WeaveCrypto.prototype = {
>             expanded += String.fromCharCode(intData[i]);
>       return expanded;
>     },
> 
>     encodeBase64 : function (data, len) {
>         return btoa(this.expandData(data, len));
>     },
> 
>+    // Fill an existing SECItem. Pass destBuffer, which should be the
>+    // uint8_t.array.ptr contents of the item's buffer, or null (and it will be
>+    // derived from the item).
>+    // The SECItem will be returned.
>+    fillSECItem : function fillSECItem(item, itemSize, destBuffer, input) {
>+        let dest = destBuffer;
>+        if (!dest) {
>+            let ptr = ctypes.cast(item.contents.data, ctypes.unsigned_char.array(itemSize).ptr);
>+            dest = ctypes.cast(ptr.contents, ctypes.uint8_t.array(itemSize));
>+        }
>+
>+        this.byteCompressInts(input, dest, itemSize);
>+        return item;
>+    },

I don't see the need for this method. We call it only in two distinct places, once with destBuffer and once without. It seems it'd be better to just inline that. In the former case you'd be inlining just the call to this.byteCompress, in the other you'd be inlining the whole thing but that's ok since that's the only case where the if (!dest) block is invoked anyway.

>+  // Test that IV SECItem memoization really does memoize!
>+  let iv1 = cryptoSvc.generateRandomIV();
>+  let iv2 = cryptoSvc.generateRandomIV();
>+  let ivSECItem1 = cryptoSvc.filledIVSECItem(iv1);
>+  let ivSECItem2 = cryptoSvc.filledIVSECItem(iv2);
>+  
>+  // Same object.
>+  do_check_eq(ivSECItem1, ivSECItem2);
> }

Superfluous after inlining filledIVSECItem and directly using this._ivSECItem in commonCrypt.


To summarize, please
* inline filledSECItem
* inline fillSECItem
* remove byteCompress
* remove superfluous if() in byteCompressInts

r- simply because I'd like to rubberstamp the updated patch before it goes out. Gotta be cautious with changes to crypto.
Attachment #516193 - Flags: review?(philipp) → review-
(In reply to comment #19)

> To summarize, please
> * inline filledSECItem
> * inline fillSECItem

Done. Note that this alters tests; see patch to follow. 

> * remove byteCompress

Still used by decrypt, so leaving this for now.

> * remove superfluous if() in byteCompressInts

Done.
Attached patch Part 1. v3Splinter Review
This should address your comments, Philipp. Tests pass on 3.5.16 and Minefield.
Attachment #516193 - Attachment is obsolete: true
Attachment #516337 - Flags: review?(philipp)
Attached patch Part 2. v1Splinter Review
OK, here's a run at memoizing symkeys.

* Eliminate a const that we don't use.
* Tidy up some whitespace from previous patches in this series. (Sorry.)
* Refactor code into importSymKey, which implements memoization.
* Tests for memoization and the non-memo case.
Attachment #516350 - Flags: review?(philipp)
Comment on attachment 516337 [details] [diff] [review]
Part 1. v3

(In reply to comment #20)
> > * remove byteCompress
> 
> Still used by decrypt, so leaving this for now.

So you're saying byteCompress is no only used by decrypt()? Sounds like a good reason for inlining to me! :)

r=me with that
Attachment #516337 - Flags: review?(philipp) → review+
Comment on attachment 516350 [details] [diff] [review]
Part 2. v1

>-            if (symKey && !symKey.isNull())
>+            if (symKey && symKey.shouldFree && !symKey.isNull())
>                 this.nss.PK11_FreeSymKey(symKey);

These lines can go completely (see below).

>+            // No memo? Then you should free the key we give you.
>+            if (memo)
>+                memo[encodedKeyString] = symKey;
>+            else
>+                symKey.shouldFree = true;

'memo' would not exist if 'operation' had a value that we don't support. So rather than adding a bunch of code that makes cases we don't care about look supported (but that isn't tested!), how about we ensure that 'operation' is one of [CKA_ENCRYPT, CKA_DECRYPT] in _commonCrypt and throw an exception if it isn't.

r=me with that
Attachment #516350 - Flags: review?(philipp) → review+
Whiteboard: [fixed in fx-sync]
Note for posterity: it would be simple to hook into CollectionKeys to invalidate the memo when keys change. However, the number of keys is so small that this is unlikely to be worth the effort.
Merged to m-c:

Part 0a: http://hg.mozilla.org/mozilla-central/rev/aa2115807bde
Part 0b: http://hg.mozilla.org/mozilla-central/rev/ab322645172b
Part 1: http://hg.mozilla.org/mozilla-central/rev/0bea44f90d6d
Part 2: http://hg.mozilla.org/mozilla-central/rev/e88c9b6c4ebd
Status: ASSIGNED → RESOLVED
Closed: 9 years ago
Resolution: --- → FIXED
Whiteboard: [fixed in fx-sync][fixed in services]
Component: Firefox Sync: Crypto → Sync
Product: Cloud Services → Firefox
You need to log in before you can comment on or make changes to this bug.