Closed Bug 1350196 (CVE-2017-5457) Opened 3 years ago Closed 3 years ago

Sync clients uploading records across all collections with weak IV.

Categories

(Firefox for Android :: Android Sync, defect)

defect
Not set

Tracking

()

RESOLVED FIXED
Firefox 55
Tracking Status
firefox53 --- fixed
firefox54 --- fixed
firefox55 --- fixed

People

(Reporter: bobm, Assigned: Grisha)

References

Details

(Keywords: sec-high, Whiteboard: [adv-main53+])

Attachments

(3 files, 4 obsolete files)

Some sync clients are uploading records across collections, including crypto, with an effective IV of zero (AAAAAAAAAAAAAAAAAAAAAA==).  The user agent string for these users is largely, but not entirely, Android.  Some 13k users across the production service have forms records in this state, and at least 9k have /crypto/keys records in the same.
Short-term, we worked on a patch to the server to reject any BSOs that have this all-zero IV:

  https://github.com/mozilla-services/server-syncstorage-private/pull/1

I'm pretty sure there's no good that can come of letting those records be stored successfully, and this should help us get easier/better metrics on what clients are doing this and why.  That code is available for deploy in a private 1.6.5 release:

  https://github.com/mozilla-services/server-syncstorage-private/releases/tag/1.6.5
(In reply to Ryan Kelly [:rfkelly] from comment #1)
This seems like a good strategy to me.  We can test and deploy this fix, and then migrate all users with bad IVs and avoid having them re-upload these records.  I'll get a stage deploy of this code out now.
(In reply to Ryan Kelly [:rfkelly] from comment #1)
 
> https://github.com/mozilla-services/server-syncstorage-private/releases/tag/
> 1.6.5

This has been deployed to stage nodes 2, 3, and 4. (https://sync-[234]-us-east-1.stage.mozaws.net)
Sanity-check: there's no reason we should ever see an all-zero "IV" field in a payload right?  I can't think of one.
Flags: needinfo?(rnewman)
(In reply to Bob Micheletto [:bobm] from comment #3)

> This has been deployed to stage nodes 2, 3, and 4.
> (https://sync-[234]-us-east-1.stage.mozaws.net)

These pass the updated 1.6.5 API verification tests.
(In reply to Bob Micheletto [:bobm] from comment #5)
> (In reply to Bob Micheletto [:bobm] from comment #3)
Deployed to the following nodes initially as a test:
syn-490-prod
syn-435-prod
syn-469-prod
syn-476-prod
syn-463-prod
syn-445-prod

If all goes well over the next hour or thereabouts this will be deployed to the remaining production servers for all due expediency.
> The user agent string for these users is largely, but not entirely, Android.

I'd be very interested to know if we can catch any desktop clients in the act of uploading these apparently-bogus-crypto records.
(In reply to Ryan Kelly [:rfkelly] from comment #4)
> Sanity-check: there's no reason we should ever see an all-zero "IV" field in
> a payload right?  I can't think of one.

I don't think so.

As far as I can tell, we don't specify IV ourselves - but we absolutely can do that - and instead let system's cipher implementation generate one. I think this is where that happens in some older Android version:
https://android.googlesource.com/platform/libcore/+/android-4.4.2_r1/crypto/src/main/java/org/conscrypt/OpenSSLCipher.java#272

Which uses SecureRandom to generate the vector. Looking around briefly, it seems that particular class has a history of issues :/
BTW, predictable IVs are bad, but not catastrophic for our system IIUC.  They just happen to be an easily observable symptom of potential randomness issues.  The thing I'm really worried about is if these same clients had issues with generating randomness for their key bundles here:

  https://dxr.mozilla.org/mozilla-central/source/mobile/android/services/src/main/java/org/mozilla/gecko/sync/crypto/KeyBundle.java?#98

The documentation for the KeyGenerator class used there says "If this key generator requires any random bytes, it will get them using the SecureRandom implementation of the highest-priority installed provider as the source of randomness. (If none of the installed providers supply an implementation of SecureRandom, a system-provided source of randomness will be used.)".  This is the same verbiage used to describe how Cipher objects generate an IV.

So issues in IV generate do seem like they may potentially indicate issue in key generation as well :-(
> Which uses SecureRandom to generate the vector. Looking around briefly, it seems that particular class has a history of issues

Indeed; articles like this do not leave me with a good feeling:

  https://android-developers.googleblog.com/2013/08/some-securerandom-thoughts.html
Interestingly, we appear to ship the PRNGFixes mentioned in that above article:

  https://dxr.mozilla.org/mozilla-central/source/mobile/android/services/src/main/java/org/mozilla/gecko/util/PRNGFixes.java

But I cannot establish whether they're applied correctly.  The only file that seems to reference them is this one that's related to browserid:

  https://dxr.mozilla.org/mozilla-central/source/mobile/android/services/src/main/java/org/mozilla/gecko/browserid/DSACryptoImplementation.java

Which also contains this delightful error message:

  https://dxr.mozilla.org/mozilla-central/source/mobile/android/services/src/main/java/org/mozilla/gecko/browserid/DSACryptoImplementation.java#141

  "Cryptographic data produced on this device may be weak.  Ignoring."
Bug 1173229 is where PRMGFixes were added (also see Bug 1227342, which we should probably get to at some point soon as well :/). However, it seems like they had a significant enough startup penalty, and so were only applied within DSA key generator.
(In reply to Bob Micheletto [:bobm] from comment #6)
 
> If all goes well over the next hour or thereabouts this will be deployed to
> the remaining production servers for all due expediency.

This version has been pushed to the Sync production fleet.
(In reply to Ryan Kelly [:rfkelly] from comment #9)
> BTW, predictable IVs are bad, but not catastrophic for our system IIUC. 
> They just happen to be an easily observable symptom of potential randomness
> issues.  The thing I'm really worried about is if these same clients had
> issues with generating randomness for their key bundles here:

They specifically call out JCA's KeyGenerator in that android blogpost:

"Developers who use JCA for key generation, signing or random number generation should update their applications to explicitly initialize the PRNG with entropy from /dev/urandom or /dev/random. Also, developers should evaluate whether to regenerate cryptographic keys or other random values previously generated using JCA APIs such as SecureRandom, KeyGenerator, KeyPairGenerator, KeyAgreement, and Signature."

We use KeyGenerator to get our crypto keys.

So... am I wrong in thinking that this is rather bad? I think Android versions<=4.3 (play store should give us install numbers) might be affected (although I'm not sure, version is hard to pinpoint somehow). If we determine that the keys are indeed quite possibly weak if generated on those devices, what are the actions? Shipping PRNGFixes more broadly ASAP (or at least so that crypto keys are covered), and eventually node-reassigning users..?
> am I wrong in thinking that this is rather bad?

I'm getting a rather bad feeling here too.  We can block users with *incredibly bad* randomness that produces an all-zero IV, but if there's a bunch of android devices out there generating random-looking-but-weak keys then this is a much broader issue that's harder to resolve.

It would be great to get Nick and/or Richard's input on the issue here, given previous experience with the PRNGFixes in Bug 1173229.

> Shipping PRNGFixes more broadly ASAP (or at least so that crypto keys are covered), and eventually node-reassigning users..?

This sounds reasonable.  But after node migration, is there a way for us to stop affected users from just starting again with weak keys?  I'm not sure if there's any way for us to identify the Android version a user is running based on the data that's available to sync, because the UA string doesn't appear to include platform info :-(

We also haven't yet ruled out similar issues affecting Desktop, IIUC we'll need to get Bob's analysis of user-agents in order to dig in further on that front.
Flags: needinfo?(nalexander)
And I'm don't think we can confidently determine upper boundary for # of affected users. If anything, install numbers of affected android version will give us a lower boundary. If user doesn't change their password and we don't node-reassign them (although, we just did that for everyone! yay?), their weak keys will persist beyond the device that generated them.
(Without wanting to balloon the CC list here, adding more FxA+Sync product folks for visibility)
> (although, we just did that for everyone! yay?)

At the very least, this means we can confidently identify the subset of users who created /crypto/keys from an Android device.
We already addressed SecureRandom on Android: Bug 903907.
(In reply to Richard Newman [:rnewman] from comment #19)
> We already addressed SecureRandom on Android: Bug 903907.

Further to Comment 8 and Comment 14, see specifically Bug 903907 Comment 5, where I wrote (four years ago!):

> Note that there's one more place that system entropy presumably
> enters the system: when we encrypt a record we allow
> javax.crypto.Cipher to provide a random IV. We don't generate our own.

and mfinkle's note from Google.
Flags: needinfo?(rnewman)
Since 1.6.5 has been running we've rejected bad BSO IVs from the following UAs by count:
   4947 Firefox AndroidSync 1.52.0.1.0 (Firefox)
    516 Firefox AndroidSync 1.51.0.3.0 (Firefox)
    418 Firefox AndroidSync 1.52.0.0 (Firefox)
    278 Firefox AndroidSync 1.50.1.0.0 (Firefox)
    201 Firefox AndroidSync 1.49.0.2.0 (Firefox)
    189 Firefox AndroidSync 1.53.0.0 (Firefox)
    139 Firefox AndroidSync 1.48.0.0 (Firefox)
    106 Firefox AndroidSync 1.47.0.0 (Firefox)
     92 Firefox AndroidSync 1.38.0.5.0 (Firefox)
     92 Firefox AndroidSync 1.34.0.1.0 (Firefox)
     86 Firefox AndroidSync 1.50.0.0 (Firefox)
     75 Firefox AndroidSync 1.45.0.1.0 (Firefox)
     69 Firefox AndroidSync 1.42.0.2.0 (Firefox)
     60 Firefox AndroidSync 1.49.0.0 (Firefox)
     58 Firefox AndroidSync 1.46.0.1.0 (Firefox)
     57 Firefox AndroidSync 1.44.0.2.0 (Firefox)
     54 Firefox AndroidSync 1.41.0.2.0 (Firefox)
     54 Firefox AndroidSync 1.40.0.3.0 (Firefox)
     46 Firefox AndroidSync 1.51.0.0 (Firefox)
     46 Firefox AndroidSync 1.50.0.2.0 (Firefox)
     41 Firefox AndroidSync 1.43.0.0 (Firefox)
     33 Firefox AndroidSync 1.54.0a2.0 (Firefox)
     31 Firefox AndroidSync 1.44.0.0 (Firefox)
     26 Firefox AndroidSync 1.51.0.4.0 (Firefox)
     25 Firefox AndroidSync 1.46.0.0 (Firefox)
     25 Firefox AndroidSync 1.36.0.2.0 (Firefox)
     21 Firefox AndroidSync 1.42.0.0 (Firefox)
     20 Firefox AndroidSync 1.45.0.0 (Firefox)
     17 Firefox AndroidSync 1.51.0.2.0 (Firefox)
     14 Firefox AndroidSync 1.37.0.2.0 (Firefox)
     13 Firefox AndroidSync 1.42.0.1.0 (Firefox)
     12 Firefox AndroidSync 1.40.0.0 (Firefox)
     10 Firefox AndroidSync 1.39.0.0 (Firefox)
      8 Firefox AndroidSync 1.41.0.0 (Firefox)
      6 Firefox AndroidSync 1.29.0.1.0 (Firefox)
      6 Firefox AndroidSync 1.29.0.0 (Firefox)
      6 Firefox AndroidSync 1.1.0.0.0 (Adblock Browser)
      4 Firefox AndroidSync 1.34.0.0 (Firefox)
      4 Firefox AndroidSync 1.1.1.1.0 (Adblock Browser)
      3 Firefox AndroidSync 1.45.0.2.0 (Firefox)
      3 Firefox AndroidSync 1.38.0.0 (Firefox Beta)
      3 Firefox AndroidSync 1.36.0.4.0 (Firefox)
      3 Firefox AndroidSync 1.35.0.1.0 (Firefox)
      3 Firefox AndroidSync 1.35.0.0 (Firefox)
      3 Firefox AndroidSync 1.33.0.0 (Firefox)
      2 Firefox AndroidSync 1.45.5.1.0 (Firefox)
      2 Firefox AndroidSync 1.32.0.1.0 (Firefox)
      1 Firefox AndroidSync 1.55.0a1.0 (Firefox)
      1 Firefox AndroidSync 1.34.0.0 (FirePhoenix)
(In reply to Richard Newman [:rnewman] from comment #19)
> We already addressed SecureRandom on Android: Bug 903907.

Correct me if I'm wrong, but it seems to me that neither Bug 903907 nor Bug 1173229 help address underlying issue for older, unpatched devices when it comes to generating our encryption & hmac keys for the crypto/keys collection?
On the gosync node 399 users and ~197K BSOs have an IV:A+.
(In reply to :Grisha Kruglov from comment #22)
> (In reply to Richard Newman [:rnewman] from comment #19)
> > We already addressed SecureRandom on Android: Bug 903907.
> 
> Correct me if I'm wrong, but it seems to me that neither Bug 903907 nor Bug
> 1173229 help address underlying issue for older, unpatched devices when it
> comes to generating our encryption & hmac keys for the crypto/keys
> collection?

This is correct.  This was my error; I audited our use of SecureRandom in Bug 903907 and addressed the narrow use of SecureRandom flagged by our linter in Bug 1173229.  The fix applied in Bug 1173229 is very slow to apply at runtime, so we did not do it as a "blanket"; we inserted it into the one code path that I identified as being vulnerable.

Unfortunately, I didn't catch our use of _other_ parts of the Java crypto stack that are weak [1], and since we didn't apply the fix as a blanket, codepaths using these other parts of the Java crypto stack remain vulnerable.  In particular, we don't rotate certain FxA-related keys [2], so we don't get even partial coverage from the fix after the initial FxA handshake has completed.

What this means is that the following flow is the expected flow on weak Android versions:

At time t0:
- login to FxA, entering Engaged state
- install PRNG fix before generating FxA-related key securely

Eventually reboot the device.

At time t1:
- auth to token server using secure FxA-related key
- (potentially) generate crypto/keys insecurely
- (almost guaranteed) generate IVs for encrypting individual records insecurely

[1] To be clear: Android versions 16, 17, and 18 have weak SecureRandom PRNGs.

[2] Tracked by https://bugzilla.mozilla.org/show_bug.cgi?id=1227342.
Flags: needinfo?(nalexander)
Mitigation:

* To prevent this problem for future new users, we should force the PRNG fix when we enter the FxAccountSyncAdapter, swallowing the performance hit that may be visible in the main App interface.

* We expect almost all Android users to have generated weak crypto/keys.  We'll need to node re-assign them any Sync constellation that includes an Android device.

* We expect almost all Android users to have chosen poorly distributed GUIDs.  I haven't a clue what happens if GUIDs are not distinct, but it might explain some small proportion of our mysterious bookmark failures.  I think that changing all existing GUIDs is likely to expose significant issues: any one device can rewrite it's GUID table, but then we have to trust _every other device_ to correctly unify two records with different GUIDs.  I expect that would shake out some bugs in our implementations.

* There's no particular added urgency to rotating the FxA-related key (Bug 1227342): we don't think those keys are weak (since the PRNG fix should have been in place before they were generated) and even had we been rotating them, the fix would not have been in place frequently enough to have prevented this issue.
Some effects of the fix on comment #1: 

Before fix

[2017-03-23T21:28:50+00:00] "POST /1.5/63962511/storage/history HTTP/1.1" 200 "Firefox AndroidSync 1.44.0.2.0 (Firefox)"
[2017-03-23T21:32:43+00:00] "POST /1.5/63962511/storage/history HTTP/1.1" 200 "Firefox AndroidSync 1.44.0.2.0 (Firefox)"
... (omit successful requests)

< 400 IV=A+ fix > ...

[2017-03-24T11:34:02+00:00] "POST /1.5/63962511/storage/history HTTP/1.1" 400 "Firefox AndroidSync 1.44.0.2.0 (Firefox)"
[2017-03-24T12:04:03+00:00] "POST /1.5/63962511/storage/history HTTP/1.1" 400 "Firefox AndroidSync 1.44.0.2.0 (Firefox)"
... (omit 400 responses)
[2017-03-24T15:19:18+00:00] "POST /1.5/63962511/storage/history HTTP/1.1" 400 "Firefox AndroidSync 1.44.0.2.0 (Firefox)"
[2017-03-24T15:27:27+00:00] "POST /1.5/63962511/storage/history HTTP/1.1" 400 "Firefox AndroidSync 1.44.0.2.0 (Firefox)"

once you have a bad IV, it won't recover.

Other collections are broken as well: 

[2017-03-24T17:48:53+00:00] "POST /1.5/63962511/storage/history HTTP/1.1" 400 "Firefox AndroidSync 1.44.0.2.0 (Firefox)"
[2017-03-24T17:48:53+00:00] "POST /1.5/63962511/storage/forms HTTP/1.1"   400 "Firefox AndroidSync 1.44.0.2.0 (Firefox)"
[2017-03-24T17:56:52+00:00] "POST /1.5/63962511/storage/tabs HTTP/1.1"    400 "Firefox AndroidSync 1.44.0.2.0 (Firefox)"
[2017-03-24T17:56:55+00:00] "POST /1.5/63962511/storage/history HTTP/1.1" 400 "Firefox AndroidSync 1.44.0.2.0 (Firefox)"
[2017-03-24T17:56:55+00:00] "POST /1.5/63962511/storage/forms HTTP/1.1"   400 "Firefox AndroidSync 1.44.0.2.0 (Firefox)"
[2017-03-24T17:59:52+00:00] "POST /1.5/63962511/storage/tabs HTTP/1.1"    400 "Firefox AndroidSync 1.44.0.2.0 (Firefox)"

It seems that if a user is in this state, the fix from comment #1 will essentially disable their sync.
(In reply to Nick Alexander :nalexander from comment #25)

> * To prevent this problem for future new users, we should force the PRNG fix
> when we enter the FxAccountSyncAdapter, swallowing the performance hit that
> may be visible in the main App interface.

And before generating GUIDs (which uses SecureRandom).
> * We expect almost all Android users to have generated weak crypto/keys.  We'll need to node re-assign them any Sync constellation that includes an Android device.

According to bobm's analysis in comment 0, this doesn't seem to be *all* android users, but only a subset.

> Some 13k users across the production service have forms records in this state, and at least 9k have /crypto/keys records in the same.

Possibly the ones that synced from android first, either after enrollment or following a node migration. Those users may have an encryption key set to "AAAAAAAAAAAAAAAAAAAAAA==", same as their IV. In which case, their sync records would be trivially decryptable. I'm working on verifying that assumption.
AAAAAAAAAAAAAAAAAAAAAA== converts to a 16 bytes of zeros. In case anybody wonders why it's all A's.
> In which case, their sync records would be trivially decryptable. I'm working on verifying that assumption.

Impacted users DO NOT have weak encryption keys, so it would appear that only the IVs are impacted. We verified this in two ways:

First, Grisha manually enrolled a sync client on android, which clearly shows the IVs being set to 16 zero bytes, but the encryption and hmac keys being random.

> I/FxAccounts(18268): fennec_grisha :: GlobalSession :: Uploaded new meta/global with sync ID uLh8lspiDmdb.
> I/IVLOG-CRYPTOKEYS(18268): encryption key: bigPUxaJtEeV1FONIUxxMy8iboNOz2Wes/rRmNjGlZg=
> I/IVLOG-CRYPTOKEYS(18268): hmac key: 9nAeY5J40HJSnHnCWDiKrn0LtZyBfM931mSz5VegNKA=
> I/FxAccounts(18268): fennec_grisha :: GlobalSession :: Uploading new crypto/keys.
> I/IVLOG   (18268): AAAAAAAAAAAAAAAAAAAAAA==
> I/FxAccounts(18268): fennec_grisha :: GRIGRI :: PUT https://sync-495-us-west-2.sync.services.mozilla.com/1.5/73189323/storage/crypto/keys HTTP/1.1 : HTTP/1.1 400 Bad Request

Second, I tried decrypting various collections from an impacted user using a 32 bytes zero key and a 16 bytes zero IV in AES-256-CBC, which failed, thus indicating the records were encrypted with a different key.
(In reply to Julien Vehent [:ulfr] from comment #30)
> Impacted users DO NOT have weak encryption keys, so it would appear that only the IVs are impacted.

I'm not sure we can say with confidence that randomly looking encryption/hmac keys are actually as strong as we'd like in this case.
We ran an entropy test [1] on encryption and hmac keys generated before and after the PRNG fix, and both have good entropy, very close to 8 bits per bytes, and similar to what /dev/urandom produces.

I'm not a cryptographer, but it looks like these keys are good enough.

[1] http://www.fourmilab.ch/random/
What I take from Comment 26 is that if a user generates all-zero IVs, they do it for every sync record.  Such clients will currently be unable to upload any data to sync.  I'm going to iterate the fix to *only* block /crypto/keys records with a bad IV, rather than all records.  This will help guard against clients uploading potentially weak keys, while allowing them to sync properly using keys generated by another device.  My understanding is that using known IVs, or re-using IVs, is not terribly harmful to the cryptosystem here because it's using CBC mode, so allowing them to sync with all-zero IVs is acceptable.
> We ran an entropy test [1] on encryption and hmac keys generated before and after the PRNG fix

For completeness: on what version of Android, and is it possible that we would have observed a different outcome on a sufficiently older device?
Flags: needinfo?(gkruglov)
Attached patch prngfixes.patch (obsolete) — Splinter Review
First stab at the patch for this. It does two things:
- generates our own IVs
- moves PRNGFixes call to be as early as possible so that it covers GUID generation, FxA, Sync, etc, eating the startup penalty on old devices
Assignee: nobody → gkruglov
Attachment #8851136 - Flags: review?(rnewman)
Attachment #8851136 - Flags: review?(nalexander)
Comment on attachment 8851136 [details] [diff] [review]
prngfixes.patch

Review of attachment 8851136 [details] [diff] [review]:
-----------------------------------------------------------------

Technically, I think this achieves its aims.  We should discuss the performance implications more widely before landing, IMO.

::: mobile/android/services/src/main/java/org/mozilla/gecko/sync/crypto/CryptoInfo.java
@@ +156,5 @@
>      try {
>        byte[] encryptionKey = getKeys().getEncryptionKey();
>        SecretKeySpec spec = new SecretKeySpec(encryptionKey, KEY_ALGORITHM_SPEC);
>  
> +      // Use a shared SecureRandom to generate our IV.

We shouldn't ignore an IV if it's given -- or, if we never do that, we should remove this functionality throughout.  I'd prefer to ```final byte[iv] = getIV() == null || ... ? random() : getIV()``` to minimize this change.

::: mobile/android/base/java/org/mozilla/gecko/GeckoApplication.java
@@ +144,5 @@
>  
>      @Override
>      public void onCreate() {
>          Log.i(LOG_TAG, "zerdatime " + SystemClock.uptimeMillis() + " - Fennec application start");
> +

This should get some comment from sebastian and/or snorp.  Slows down landing, but there's nothing here that can't wait a few days.

nit: s/early enough/before data providers, FxA account creation, and background syncs occur/.

nit: expand the acronyms for PRNG and JCA (which I thought was JC_E_) for the benefit of other users.
Attachment #8851136 - Flags: review?(nalexander) → review+
I've generated those keys on Galaxy S4 running Andrid 4.3. Different outcome is possible, because underlying crypto stuff changed a number of times across versions.
Could you take a look at patch in Comment 35 and comment on the placement of PRNGFixes.apply() call? For some background on its performance impact, see Bug 1173229.

Thanks!
Flags: needinfo?(snorp)
Flags: needinfo?(s.kaspari)
Comment on attachment 8851136 [details] [diff] [review]
prngfixes.patch

Review of attachment 8851136 [details] [diff] [review]:
-----------------------------------------------------------------

Same comment as Nick.

Note that you need to create a version of this for uplift, and it's probably worth eyeballing that, too.
Attachment #8851136 - Flags: review?(rnewman) → review+
To summarize on the risk assessment: as far as we can tell, no user data was stored in cleartext (or poor crypto equivalent) as a result of this issue.

We have 3 variables that matter: the IV, the key and the GUID. Only the IV is known to be zeroes. The key appears to be random. We're not sure about the GUID yet, but assuming it is, no two user will ever have identical records, which greatly reduces the risk of a crypto vulnerability.
> The key appears to be random

IIUC, we're not entirely sure *how* random..?  If we were then we wouldn't be bothering with this PRNGFixes stuff.
(In reply to Julien Vehent [:ulfr] from comment #40)
> To summarize on the risk assessment: as far as we can tell, no user data was
> stored in cleartext (or poor crypto equivalent) as a result of this issue.
> 
> We have 3 variables that matter: the IV, the key and the GUID. Only the IV
> is known to be zeroes. The key appears to be random. We're not sure about
> the GUID yet, but assuming it is, no two user will ever have identical
> records, which greatly reduces the risk of a crypto vulnerability.

No, we expect Android to produce weak crypto/keys, exactly as described in https://android-developers.googleblog.com/2013/08/some-securerandom-thoughts.html.

I just missed this when auditing our use of SecureRandom in https://bugzilla.mozilla.org/show_bug.cgi?id=903907.
Across the production Sync server fleet there are 14024 users with /crypto/keys records that have an A+= IV field.
I've prepared a v1.6.6 release of the server, which blocks writes to the "crypto" collection with all-zero IVs, but allows writes in other requests:

  https://github.com/mozilla-services/server-syncstorage-private/releases/tag/1.6.6
+:seanmonstar, who is my python review buddy for these server changes.
(In reply to Ryan Kelly [:rfkelly] from comment #44)

This has been deployed to stage nodes 2, 3, and 4 (https://sync-[234]-us-east-1.stage.mozaws.net) and passes the updated 1.6.5 API verification tests.
(In reply to Bob Micheletto [:bobm] from comment #46)
> (In reply to Ryan Kelly [:rfkelly] from comment #44)
> 
> This has been deployed to stage nodes 2, 3, and 4
> (https://sync-[234]-us-east-1.stage.mozaws.net) and passes the updated 1.6.5
> API verification tests.

Correction 1.6.6 verification tests.
(In reply to Bob Micheletto [:bobm] from comment #47)
Deployed to production.  
(In reply to Ryan Kelly [:rfkelly] from comment #44)
> I've prepared a v1.6.6 release of the server, which blocks writes to the
> "crypto" collection with all-zero IVs, but allows writes in other requests:
> 
>  
> https://github.com/mozilla-services/server-syncstorage-private/releases/tag/
> 1.6.6

When do you think is the right time to merge this to the public repo?
I'll update the gosync server to block crypto/keys with bad IV's as well.
(In reply to Nick Alexander :nalexander from comment #42)
> No, we expect Android to produce weak crypto/keys, exactly as described in
> https://android-developers.googleblog.com/2013/08/some-securerandom-thoughts.html

To clarify my previous statement: while I agree we should suspect android users generate weak crypto keys, we haven't been able to reproduce it yet. The records we see in database are encrypted with keys we cannot guess, and testing shows keys generated from an impacted device have zero-IVs but random keys. That said, we only tested key generation from a single device, and only tried to decrypt a handful of records, so we should definitely be suspicious, but there's no evidence of an issue.

In comparison, we have reproduced the generation of zero-initialized IVs, which clearly is a security issue, but does not lead to immediate decryption of sync records.

We should absolutely force impacted users to regenerate their /crypto/keys with a solid PRNG, then node-migrate them, but it's not obvious to me that user data is immediately at risk of disclosure.
> it's not obvious to me that user data is immediately at risk of disclosure.

I agree, our worst-case fears did not come to pass here, and we don't have any trivially-decryptable data from these users hanging around on our servers.  There's no evidence of clients using e.g. all-zero keys or similar.  \o/

> while I agree we should suspect android users generate weak crypto keys, we haven't been able to reproduce it yet.
> [...]
> we should definitely be suspicious, but there's no evidence of an issue.

OTOH, I'm not sure what evidence of a low-entropy PRNG would look like in this scenario, or how to assess the risk.  I feel like we should assume they *are* doing low-entropy key generation unless we can find evidence to suggest otherwise.

As an absurd example (and because I was curious) I generated 192k of "random" data using this awesome new PRNG I just invented:


    _prev = None

    def get_random_bytes():
        global _prev
        b = hashlib.md5("%.20f" % time.time()).digest()
        # Careful now, we don't want any duplicates!
        while b == _prev:
            b = hashlib.md5("%.20f" % time.time()).digest()
        _prev = b
        return b


I ran the entropy test [1] on this data and it has good entropy, very close to 8 bits per byte.  But if clients were actually using this then it would be fairly straightforward to guess their encryption keys using information available to the server.

Obviously I don't expect clients to be behaving *that* badly ;-)

But in the summary table at the end of [2], it suggests that on some Android platforms SecureRandom can be seeded with as little as 31 bits of entropy.  If one of our storage node DBs got leaked tomorrow, would keys generated from such a low-entropy source put their data at risk of being cracked in O(months) or O(years) rather than O(never)?  Would they change our risk-assessment for any data disclosed due to Bug 1345742?

[1] http://www.fourmilab.ch/random/
[2] https://www.scribd.com/doc/131955288/Randomly-Failed-The-State-of-Randomness-in-Current-Java-Implementations

> We should absolutely force impacted users to regenerate their /crypto/keys with a solid PRNG

+1.  Relatedly, we should decide whether to leave any ongoing mitigations in place once that migration is complete:

* Do we want to continue blocking /crypto/keys records with all-zero IV?
* Do we want to block /crypto/keys uploads from any Android Firefox that predates the fix, even if they generated an apparently-OK IV?

And if we *don't* want to do those things, should we just disable the /crypto/keys blocking logic right now and let those users go back to syncing correctly?
(In reply to :Grisha Kruglov from comment #38)
> Could you take a look at patch in Comment 35 and comment on the placement of
> PRNGFixes.apply() call? For some background on its performance impact, see
> Bug 1173229.
> 
> Thanks!

This is not great for perf, but I'd rather be secure than fast. Also, API < 18 is only like 10% of our users.
Flags: needinfo?(snorp)
(In reply to James Willcox (:snorp) (jwillcox@mozilla.com) from comment #52)
> (In reply to :Grisha Kruglov from comment #38)
> > Could you take a look at patch in Comment 35 and comment on the placement of
> > PRNGFixes.apply() call? For some background on its performance impact, see
> > Bug 1173229.
> > 
> > Thanks!
> 
> This is not great for perf, but I'd rather be secure than fast. Also, API <
> 18 is only like 10% of our users.

grisha: OK, let's pull the trigger on this.  If sebastian has serious concerns we can back out.
Attached patch prngfixes-updated.patch (obsolete) — Splinter Review
Updated patch for m-c.
Attachment #8851136 - Attachment is obsolete: true
Flags: needinfo?(gkruglov)
Attachment #8851845 - Flags: review?(rnewman)
Attachment #8851845 - Flags: review?(nalexander)
Attached patch prngfixes-updated-aurora.patch (obsolete) — Splinter Review
Rebased patch for aurora. Will request uplift once the m-c one is r+'d.
Attached patch prngfixes-updated-beta.patch (obsolete) — Splinter Review
Rebased patch for beta. Will request uplift once the m-c one is r+'d.
Ah, I didn't see the r+ from earlier. Requesting check-in.
Component: Firefox Sync: Crypto → Android Sync
Keywords: checkin-needed
Product: Cloud Services → Android Background Services
(In reply to Nick Alexander :nalexander from comment #53)
> (In reply to James Willcox (:snorp) (jwillcox@mozilla.com) from comment #52)
> > (In reply to :Grisha Kruglov from comment #38)
> > > Could you take a look at patch in Comment 35 and comment on the placement of
> > > PRNGFixes.apply() call? For some background on its performance impact, see
> > > Bug 1173229.
> > > 
> > > Thanks!
> > 
> > This is not great for perf, but I'd rather be secure than fast. Also, API <
> > 18 is only like 10% of our users.
> 
> grisha: OK, let's pull the trigger on this.  If sebastian has serious
> concerns we can back out.

No serious concerns. Let's land.
Flags: needinfo?(s.kaspari)
[Security approval request comment]
How easily could an exploit be constructed based on the patch?
- Not easily. Patch indicates a possible cryptographic weakness, but there isn't an obvious path to decrypting user data. It's likely that in case of a leak of encrypted user data, decrypting affected data sets becomes easier than we'd like. This issue is made much worse by Bug 1345742, which intermingled user data during upload :-(

Do comments in the patch, the check-in comment, or tests included in the patch paint a bulls-eye on the security problem?
- This patch might indicate to an observant party that there is a cryptographic issue with Sync and/or FxA for a certain range of affected Android APIs, but it does so in no specific terms. This issue was very widely circulated a few years ago, so I expect people to look out for these kinds of "run cryptography fixes on startup" patches.

Which older supported branches are affected by this flaw?
- no regression point, it was always like this: bug in the OS.

If not all supported branches, which bug introduced the flaw?
- N/A

Do you have backports for the affected branches? If not, how different, hard to create, and risky will they be?
- Yes, created patches for aurora/beta. We should be able to easily uplift this to whatever branch necessary.

How likely is this patch to cause regressions; how much testing does it need?
- Quite unlikely. Patch itself is very simple, primarily moving a bit of code we used to run in a narrow case to run on application startup for devices on API16-18, and it's wrapped in a broad try/catch. The second part, explicitly generating IVs, does not introduce a new code path, and essentially moves around a call to a pseudorandom generator, which is low risk.
Attachment #8851845 - Attachment is obsolete: true
Attachment #8851845 - Flags: review?(rnewman)
Attachment #8851845 - Flags: review?(nalexander)
Flags: needinfo?(abillings)
Attachment #8852135 - Flags: sec-approval?
> This issue is made much worse by Bug 1345742, which intermingled user data during upload

For the record, we have no reason to believe any users encrypted sync records remain exposed due to that bug.  It caused some users encrypted sync records to be written under the wrong user account, but we've since reset the server state to clear them, and I'm not aware of any behaviour that would cause Firefox to cache the records locally.
[Feature/Bug causing the regression]:
OS bug, it was always like this for certain API versions

[User impact if declined]:
For users who are using Sync, and using an old device (API16-18, inclusive range), we might generate weak encryption keys and might use weak - or zero - IVs (initialization vectors) while encrypting records. We also might generate clashing GUIDs for records, which affects all Android users in the API range, but could cause issues in context of Sync.

[Is this code covered by automated tests?]:
Not directly, but affected code paths are involved in tests.

[Has the fix been verified in Nightly?]:
No, just local builds.

[Needs manual test from QE? If yes, steps to reproduce]: 
Low entropy keys are hard (impossible?) to spot on individual basis, as they appear random. However, IVs generated on affected devices are easy to spot: when encoded, they look like "AAAAAAAAAAAAAAAAAAAAAA==". This is easy to spot with additional, manually added logging.

[List of other uplifts needed for the feature/fix]:
None

[Is the change risky?]:
No

[Why is the change risky/not risky?]:
Patch itself is very simple, primarily moving a bit of code we used to run in a narrow case to run on application startup for devices on API16-18, and it's wrapped in a broad try/catch. The second part, explicitly generating IVs, does not introduce a new code path, and essentially moves around a call to a pseudorandom generator, which is low risk.

[String changes made/needed]:
None
Attachment #8851847 - Attachment is obsolete: true
Attachment #8852177 - Flags: approval-mozilla-aurora?
Same request as for aurora, but patch is rebased on top of beta.

[Feature/Bug causing the regression]:
OS bug, it was always like this for certain API versions

[User impact if declined]:
For users who are using Sync, and using an old device (API16-18, inclusive range), we might generate weak encryption keys and might use weak - or zero - IVs (initialization vectors) while encrypting records. We also might generate clashing GUIDs for records, which affects all Android users in the API range, but could cause issues in context of Sync.

[Is this code covered by automated tests?]:
Not directly, but affected code paths are involved in tests.

[Has the fix been verified in Nightly?]:
No, just local builds.

[Needs manual test from QE? If yes, steps to reproduce]: 
Low entropy keys are hard (impossible?) to spot on individual basis, as they appear random. However, IVs generated on affected devices are easy to spot: when encoded, they look like "AAAAAAAAAAAAAAAAAAAAAA==". This is easy to spot with additional, manually added logging.

[List of other uplifts needed for the feature/fix]:
None

[Is the change risky?]:
No

[Why is the change risky/not risky?]:
Patch itself is very simple, primarily moving a bit of code we used to run in a narrow case to run on application startup for devices on API16-18, and it's wrapped in a broad try/catch. The second part, explicitly generating IVs, does not introduce a new code path, and essentially moves around a call to a pseudorandom generator, which is low risk.

[String changes made/needed]:
None
Attachment #8851848 - Attachment is obsolete: true
Attachment #8852178 - Flags: approval-mozilla-beta?
:grisha, can you confirm how the presence of "AAAAAAAAAAAAAAAAAAAAAA==" in the IV correlates with the presence of the PRNG weakness?

* Do all versions of android on which we need the PRNGFixes, send this all-zero IV?
* Do all versions of android that send this all-zero IV, also need the PRNGFixes?

Conversations in IRC suggested it's a "yes" to both of those, but I'd like to be sure.  If it's "yes" then we can do a really effective job of locking out affected Android clients from upload keys.
Flags: needinfo?(gkruglov)
(In reply to Ryan Kelly [:rfkelly] from comment #63)
> :grisha, can you confirm how the presence of "AAAAAAAAAAAAAAAAAAAAAA==" in
> the IV correlates with the presence of the PRNG weakness?

I think from what we've seen, it's safe to assume that presence of a zero IV indicates PRNG weakness. I don't think think we've seen anything which contradicts that correlation.

However, I don't believe we have evidence to state that the the opposite of this is always true. That is, I don't think that PRNG weakness indicates that an IV will be zero (as opposed to "random-looking").

> If it's "yes" then we can do a really effective job of locking
> out affected Android clients from upload keys.

I think we have enough evidence to prevent clients from uploading keys IF they're also uploading records with zero IVs. I don't think we can be certain that this will cover _all_ affected clients though.
Flags: needinfo?(gkruglov)
Calling this a sec-high on discussions with Dan Veditz. It *may* really be a sec-moderate.

Sec-approval+ for trunk.
Flags: needinfo?(abillings)
Keywords: sec-high
Attachment #8852135 - Flags: sec-approval? → sec-approval+
Why is this bug in the "Mozilla Employee Confidential" group?
(In reply to Al Billings [:abillings] from comment #66)
> Why is this bug in the "Mozilla Employee Confidential" group?

I put that on when I filed it.  In retrospect, perhaps clicking more group restrictions check boxes on a bug doesn't mean lower circulation.
https://hg.mozilla.org/mozilla-central/rev/ac0ce28e3337
https://hg.mozilla.org/mozilla-central/rev/f03ff454812a
Status: NEW → RESOLVED
Closed: 3 years ago
Resolution: --- → FIXED
Target Milestone: --- → Firefox 55
Comment on attachment 8852178 [details] [diff] [review]
prngfixes-updated-beta.patch

Fix for sec-high issue, let's take it on aurora and beta.
Attachment #8852178 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
Attachment #8852177 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
(In reply to Bob Micheletto [:bobm] from comment #68)
> (In reply to Al Billings [:abillings] from comment #66)
> > Why is this bug in the "Mozilla Employee Confidential" group?
> 
> I put that on when I filed it.  In retrospect, perhaps clicking more group
> restrictions check boxes on a bug doesn't mean lower circulation.

It restricts access to the union of all selected bugzilla groups.
Whiteboard: [adv-main53+]
> > I've prepared a v1.6.6 release of the server, which blocks writes to the
> > "crypto" collection with all-zero IVs, but allows writes in other requests:
>
> When do you think is the right time to merge this to the public repo?

FYI I've now merged this back to the public repo, since the fix is live on the servers.
Alias: CVE-2017-5457
Should we run a new round of migrations for Android users now that 53 is out?
Flags: needinfo?(gkruglov)
53 DAU curve seems flat now, let's migrate.
Flags: needinfo?(gkruglov)
(In reply to :Grisha Kruglov from comment #75)
> 53 DAU curve seems flat now, let's migrate.

Done.
This is RESOLVED/FIXED, should we clear the security flag here?
Flags: needinfo?(gguthe)
Group: cloud-services-security, mozilla-employee-confidential
Flags: needinfo?(gguthe)
Product: Android Background Services → Firefox for Android
You need to log in before you can comment on or make changes to this bug.