Closed Bug 617709 Opened 9 years ago Closed 9 years ago

fix handling of bulk key changes

Categories

(Firefox :: Sync, defect)

defect
Not set

Tracking

()

RESOLVED FIXED
Tracking Status
blocking2.0 --- beta8+
fennec 2.0b3+ ---

People

(Reporter: rnewman, Assigned: rnewman)

References

Details

Attachments

(4 files)

Bug to handle work for report in Bug 617268.

Right now, that means:

* More thorough tests around downloading keys when they change
* Ensuring that data are reuploaded when we find that keys have changed

and

* A crazy failure-recovery scheme: if an HMAC mismatch is detected, download keys (which might recover); if we already have those keys, _re-upload them_ to bump the modified time on the server. Other clients should then re-download keys and do the right thing.
Flagging for blocker.
blocking2.0: --- → ?
tracking-fennec: --- → ?
blocking beta 8 until I can get to the bottom of all this.
blocking2.0: ? → beta8+
Duplicate of this bug: 617268
This is less "improve" and more "fix brokenness from simplecrypto"

What's the ETA here?
Summary: Improve handling of bulk key changes → fix handling of bulk key changes
(In reply to comment #4)
> This is less "improve" and more "fix brokenness from simplecrypto"

That's one kind of improvement :D

> What's the ETA here?

Making good progress on the first two points: automated, crossweave, and manual testing shows things to be robust now, even under manual switching out of keys (for which there's currently no UI, so it's not a supported scenario). 

If a final test pass doesn't show problems, I'll roll up a patch shortly.

When you're happy with that (and I've eaten), I'll attack HMAC error recovery.
tracking-fennec: ? → 2.0b3+
Attached patch Fix tests, v1Splinter Review
The disabled tests were broken by these changes. Fix.
Attachment #496421 - Flags: review?(mconnor)
Soon we'll need to be more aware of these, so let's not have magic strings in the code.
Attachment #496423 - Flags: review?(mconnor)
Comment on attachment 496398 [details] [diff] [review]
Download and reset when server keys change, v1

I love when we add more tests than code.  Fix the comment ,as noted on IRC, and we're ok here.
Attachment #496398 - Flags: review?(mconnor) → review+
Attachment #496421 - Flags: review?(mconnor) → review+
Comment on attachment 496423 [details] [diff] [review]
Refactoring HMAC error throwing, v1

3% less magic* by volume!

>+  // Generator and discriminator for HMAC exceptions.
>+  // Split these out in case we want to make them richer in future, and to 
>+  // avoid inevitable confusion if the message changes.
>+  throwHMACMismatch: function throwHMACMismatch(is, shouldBe) {
>+    throw "Record SHA256 HMAC mismatch: should be " + shouldBe + ", is " + is;
>+  },

minor cognitive dissonance: you're passing a, b into a helper that returns b, a in a string.  That feels like a sharp edge we'll cut ourselves on someday.  (variable inversions are bad!)

* value is approximate.
Attachment #496423 - Flags: review?(mconnor) → review+
To clarify, that means remaining work in this bug is:

* A crazy failure-recovery scheme: if an HMAC mismatch is detected, download
keys (which might recover); if we already have those keys, _re-upload them_ to
bump the modified time on the server. Other clients should then re-download
keys and do the right thing.

Landed work can be merged safely.
The final part of this bug.

This patch implements a HMAC-repair callback, which does two things:

* If our keys are corrupt, or the server timestamp in info/collections is wrong, this will replace them mid-sync and try decryption again.

* If another client has bad keys, this will cause the server keys to be 'touched', which should prompt that client to refresh their own keys and re-upload their data.

There is one thing I might have omitted, depending on what you guys think. From looking at the code, we do not refuse to update our keys based on the timestamp in the keys WBO that we retrieve. This might be a good thing (clock drift, for example), or a bad thing (if we failed to upload new keys).

I'm inlined to call it not a big deal, but let me know if you disagree.
Attachment #496742 - Flags: review?(philipp)
richard, can you walk us through what we test scenarios that QA should pay attention to when this lands?  if we'll need to find an assortment of accounts with different sync keys, just call that out.   Thanks.
(In reply to comment #14)
> richard, can you walk us through what we test scenarios that QA should pay
> attention to when this lands?  if we'll need to find an assortment of accounts
> with different sync keys, just call that out.   Thanks.

Kinda overlaps with other stuff, but:

Profile upgrades:
* v3 storage version (sync 1.5.x) => now
* v4 storage version (nightlies since Dec 1, 1.6b?) => now

- When one device upgrades, it should upgrade v3 passphrases/sync keys to 26-char alphadigit keys, visible in Prefs > Sync > My Sync Key. Should look like

   a-bbbbb-ccccc-ddddd-eeeee-fffff

- Syncing to the server should work (no error).

- If you enable logging prior to upgrades, you should see no untoward errors in about:sync-log. See 

  http://philikon.wordpress.com/2010/11/12/sync-in-firefox-4-0-beta-7/

for how to enable logging.  

- Other devices connected to the same sync account should see a "you need to upgrade" error on their next sync.

- When they upgrade, sync should work fine.

New device:
* Set up a desktop mobile device to sync with:
  * A new signup
  * An existing account that started as v3
  * An existing account that started as v4

- You'll need to enter the sync key on the device. Please test both entering the sync key from My Sync Key, and also entering the original v3 passphrase -- it should be seamlessly upgraded and sync should work.


Key changes:
* Prefs > Sync > My Sync Key > Generate. Sync Now. Should not fail.
  - Other devices should see "Wrong Sync Key", and offer UI to update it.


There is no UI method to get into a screwed-up state that you can test, I'm afraid. If you have a Sync server that you can break, I can provide some direction for scenarios that should be recoverable by the client.

Sync devs: anything obvious I've missed?
Comment on attachment 496742 [details] [diff] [review]
Tests and functionality for on-the-fly HMAC error recovery. v1

>+  addHMACHandler: function addHMACHandler(handler) {
>+    this._log.debug("Adding HMAC error handler to engines.");
>+    for (let engine in this._engines) {
>+      let e = this._engines[engine];

Nit: use Iterator here when iterating over both key and value, and use more descriptive names, e.g.:

 for (let [name, engine] in Iterator(this._engines)) {

Though I'm wondering whether that extra bit of indirection is really worth it. We could just define a handleHMACMismatch method on the SyncEngine.prototype and have that call Service.handleHMACMismatch (get round circular imports by importing main.js and accessing Services via Weave.Service).

> function SyncEngine(name) {
>   Engine.call(this, name || "SyncEngine");
>+  
>+  this.handleHMACMismatch = null;
> }

Add this to the prototype please, or better just do what I suggested above :)

>       try {
>-        item.decrypt();
>+        try {
>+          item.decrypt();
>+        } catch (ex) {
>+          if (Utils.isHMACMismatch(ex) &&
>+              this.handleHMACMismatch &&
>+              this.handleHMACMismatch()) {
>+            // Let's try handling it.
>+            // If the callback returns true, try decrypting again, because
>+            // we've got new keys.
>+            this._log.info("Trying decrypt again...");
>+            item.decrypt();
>+          }
>+          else {
>+            throw ex;
>+          }
>+        }

Nit: Cuddle the else

>+  
>+  /**
>+   * Here is a disgusting yet reasonable way of handling HMAC errors deep in
>+   * the guts of Sync. The astute reader will note that this is a hacky way of
>+   * implementing something like continuable conditions.
>+   * 
>+   * A handler function is glued to each engine. If the engine discovers an
>+   * HMAC failure, we fetch keys from the server. If the modified time of the
>+   * keys WBO is appropriate, we update our keys, just as we would on startup.
>+   * 
>+   * If our key collection changed, we signal to the engine (via our return
>+   * value) that it should retry decryption.
>+   * 
>+   * If our key collection did not change, it means that we already had the
>+   * correct keys... and thus a different client has the wrong ones. Reupload
>+   * the bundle that we fetched, which will bump the modified time on the
>+   * server and (we hope) prompt a broken client to fix itself.
>+   * 
>+   * We keep track of the time at which we last applied this reasoning, 
>+   */

Excellent description. Need more of those :)

>+  handleHMACEvent: function handleHMACEvent() {
>+    let now = Date.now();
>+    
>+    // No less than 30 minutes between HMAC recovery attempts. This gives us

constants.js said 10 minutes...

>+      catch (ex) {
>+        this._log.warn("Got exception \"" + ex +
>+            "\" fetching and handling crypto keys. Will try again later.");

Nit: alignment (and cuddled elses/catches, overlong lines, trailing whitespaces in that function)

>+    // Add HMAC handling.
>+    let svc = this;
>+    Engines.addHMACHandler(function () svc.handleHMACEvent());

  Engines.addHMACHandler(Utils.bind2(this, this.handleHMACEvent));

(ES5 has Function.bind which we can start using once we require Firefox >=4)

>+    Svc.Prefs.set("registerEngines", "Tab");
>+    _("Set up some tabs.");
>+    let myTabs = 
>+      {windows: [{tabs: [{index: 1,
>+                          entries: [{
>+                            url: "http://foo.com/",
>+                            title: "Title"
>+                          }],
>+                          attributes: {
>+                            image: "image"
>+                          },
>+                          extData: {
>+                            weaveLastUsed: 1
>+                          }}]}]};
>+    delete Svc.Session;
>+    Svc.Session = {
>+      getBrowserState: function () JSON.stringify(myTabs)
>+    };

Seems a bit excessive to have all engines enabled for this test... History would do just fine. Even that worries me slightly because this engine-agnostic test now depends on the history WBO format.

>+    Engines.register(HistoryEngine);
>+    Weave.Service._registerEngines();

If you do the second, do you still need the first?
Attachment #496742 - Flags: review?(philipp) → review+
(In reply to comment #16)

> Though I'm wondering whether that extra bit of indirection is really worth it.
> We could just define a handleHMACMismatch method on the SyncEngine.prototype
> and have that call Service.handleHMACMismatch (get round circular imports by
> importing main.js and accessing Services via Weave.Service).

Done.

> Nit: Cuddle the else

I stylistically reworked this entire section. Much better now. Return early, return often :)

> Seems a bit excessive to have all engines enabled for this test... History
> would do just fine. Even that worries me slightly because this engine-agnostic
> test now depends on the history WBO format.

On the other hand, it's nice that we now have tests that break if the format changes, or isn't preserved by some component...

But yes, I understand your concern.

> >+    Engines.register(HistoryEngine);
> >+    Weave.Service._registerEngines();
> 
> If you do the second, do you still need the first?

Sadly, yes; removing either causes tests to fail. The only thing I was able to get rid of was the third line, enabling the history engine.
Landed:

http://hg.mozilla.org/services/fx-sync/rev/5f3dde9c368e

Please give that a quick scan, Philipp.
Status: ASSIGNED → RESOLVED
Closed: 9 years ago
Resolution: --- → FIXED
Component: Firefox Sync: Backend → Sync
Product: Cloud Services → Firefox
You need to log in before you can comment on or make changes to this bug.