Closed Bug 892040 Opened 12 years ago Closed 12 years ago

Key stretching benchmarking for PICL auth

Categories

(Android Background Services Graveyard :: Crypto, defect)

All
Android
defect
Not set
normal

Tracking

(Not tracked)

VERIFIED FIXED

People

(Reporter: rnewman, Assigned: rnewman)

References

Details

(Whiteboard: [qa-])

Our initial key-stretching plan[1] is to do 20k rounds of PBKDF2-HMAC-SHA256, followed by scrypt at N/r/p=64k/8/1, then a second 20k of PBKDF2. scrypt at that setting takes about 60MB of RAM. On a 1.8GHz Atom, I estimate that should take 130ms+1.3s+130ms= 1.6s . On a Raspberry Pi (which might be comparable to a mid-end phone) it should take 7.5s, which is at the far end of my usability target. This only needs to be done once per device. The bulk of the stretching value comes from the scrypt. The PBKDF is just there to protect clients that outsource the scrypt phase to a helper (which we haven't defined or written yet). Security is obviously better when we don't use the helper, so I'd like the client to measure itself at runtime and decide whether it's capable of doing the scrypt itself. That's a question of speed (the client can run a few milliseconds of the underlying Salsa20 operation and then estimate how long the whole thing would take, and bail if it's more than like 10 seconds). It's also a question of memory, for which maybe the client can just try to allocate the whole thing and then fall back to the helper if the malloc fails. Do you know if memory-allocation failures on android are survivable? On unix, it'd be pretty painful (you're more likely to go into swap thrash than to get a failure from malloc). So maybe a better criteria would be to ask the OS about the total memory size, and use the helper unless it's greater than like 512MB. Anyways, the speed-testing app I have in mind would launch, run 20k rounds of PBKDF2-HMAC-SHA256 and report the time, run a small scrypt loop (maybe N/r/p = 1k/8/1, which should only use 1MB) and report the time, then exit. Maybe have it POST the results to a server somewhere (along with device-identifying values and whatever /proc/cpuinfo we can get). Then we run it on all the devices we care about, collect the data, build a spreadsheet, and choose some settings that will work well-enough everywhere.
Component: Android Sync → Crypto
Input to this: * Android enforces a 24MB limit for each process (16MB on some devices, 32MB on others, 48MB if you're really lucky). If scrypt requires 60MB, we can't do it in Java. * You can get around that by using the NDK to allocate memory in C. Even on devices for which scrypt isn't too expensive, we'd need to drop down to C to do it.
Additionally: I misremembered that we did PBKDF2 already in Android Sync. We don't; that was used for the earlier key-stretching version of Sync, which we abandoned before the Android rewrite. We *do* use RFC 5869 HKDF, so not a terrifying leap.
Working PBKDF2 code (SHA-1). https://github.com/mozilla-services/android-sync/tree/rnewman/pbkdf2 From informal testing, 20,000 iterations of PBKDF2-with-SHA1 takes 2.7 seconds on my top-of-the-line dual-core HTC One. More tomorrow.
A direct implementation of PBKDF2-SHA256 -- a la http://stackoverflow.com/questions/9147463/java-pbkdf2-with-hmacsha256-as-the-prf -- takes a ridiculous 166 seconds to perform 40,000 iterations. It's likely that this can be optimized further (I haven't yet), and it doesn't take as much advantage of the pre-bundled Java crypto code as the SHA-1 version, which is basically a single API call. Regardless, this might be an indication that this all needs to happen in native code (e.g., by linking to libnss). More tomorrow.
The hashing itself should be pretty fast. I suspect that the slowdown is resulting from a whole bunch of context switching, if the individual SHA256() calls are going through some kind of overblown crypto-manager API that pretends to be a smartcard or all happens in kernelspace. I bet it'd be a lot faster if SHA256() were just a library call (keeping everything in userspace) or if the whole PBKDF2-SHA256 operation were a single API call (keeping everything in that other space).
spongycastle 1.47 and up seems to bundle PBKDF2+SHA256, so that's my next avenue.
spongycastle's implementation takes 98 seconds for 40,000 iterations. The exact same spongycastle implementation, but using SHA-1, is only very slightly slower than my previous implementation of PBKDF2HMACSHA1 using javax.crypto, so there's no obvious misuse of the SC API. SHA-256 just seems to be very expensive in this environment. Given that SHA-1's weaknesses don't really impact its use in this situation, one wonders whether either SHA-256 is unnecessary here, or whether we need to do that many rounds of SHA-256. But I'll continue with my benchmarking…
Some reading suggests that the SHA-1 version probably uses OpenSSL under the hood, which would explain the speed disparity.
With OpenSSL, same hardware finishes 80,000 iterations of PBKDF2-SHA-256 in 1.4 seconds, which is hilarious. I need to write more tests to make sure this is correct.
Target Milestone: --- → 1.0
I've requisitioned an ARMv6 device so I can do low-end benchmarking. These two devices alone should give us a good solution space.
The OpenSSL variant passes the ridiculous 16,777,216-iteration test in 152 seconds.
Complete and working code is here: https://github.com/mozilla-services/android-sync/pull/330 Will verify on ARMv6 device when it arrives.
Whiteboard: [qa-]
No longer blocks: 892025
Relevant: https://wiki.mozilla.org/Identity/AttachedServices/Key_Stretching_Performance_Tests Note that on my HTC One we can do the full key stretch -- 20k PBKDF2, 65k scrypt, 20k PBKDF2 -- in about 7.3 seconds.
Blocks: 915312
Benchmarking done. Resolving.
Status: ASSIGNED → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
Fine by me.
Status: RESOLVED → VERIFIED
You need to log in before you can comment on or make changes to this bug.