Closed
Bug 618068
Opened 14 years ago
Closed 14 years ago
Cipher finalize failed on startup
Categories
(Firefox :: Sync, defect)
Tracking
()
VERIFIED
FIXED
People
(Reporter: rnewman, Assigned: philikon)
References
Details
Attachments
(2 files)
See also Bug 617650.
Log attached.
Reporter | ||
Comment 1•14 years ago
|
||
Info:
* M-C build:
parent: 59027:799ba3224669 tip
Bug 617877: Web console code completion suggestion sticks around when it shouldn't, r+a=gavin
branch: default
commit: (clean)
update: 1 new changesets, 2 branch heads (merge)
mq: 1 unapplied
* fx-sync addon build:
parent: 3177:e4b76e1f74cc tip
Bug 617868 follow-up: fix test
branch: default
commit: 1 modified, 1 unknown
update: (current)
I.e., up-to-date as of about 1pm today.
M-C includes:
changeset: 58968:0f28e112a6b6
user: Kai Engert <kaie@kuix.de>
date: Thu Dec 09 12:04:11 2010 +0100
summary: Bug 599324, land NSPR_4_8_7_BETA1 and NSS_3_12_9_BETA2, a=blocking-2.0-beta8+
`make xpcshell-tests` passes all 89 tests in fx-sync using that M-C build.
Assignee | ||
Comment 2•14 years ago
|
||
To sum up Richard's findings:
* unit tests are working fine, syncing (or logging in) does not.
* it's ok with yesterday's nightlies which didn't have the NSS landing in them
* it breaks with a build from today or today's nightlies
Likely culprit: NSS
Assignee | ||
Comment 4•14 years ago
|
||
(In reply to comment #0)
> See also Bug 617650.
Richard, can you clarify?
Reporter | ||
Comment 5•14 years ago
|
||
(In reply to comment #4)
> Richard, can you clarify?
Sorry, missed this append.
Clarified on IRC: that bug captured my observation of a similar issue... but it didn't occur reproducibly on startup, and didn't occur with the changes I was making at the time. This seems much more pernicious.
Updated•14 years ago
|
tracking-fennec: --- → 2.0b3+
Assignee | ||
Comment 6•14 years ago
|
||
This is notoriously hard to reproduce, but I've just managed it:
Using a regular nightly (with edited WeaveCrypto.js to produce logging output):
1. Change My Sync Key
2. Run into bug 618179 which is only consolable by restarting the browser
3. Trying to outsmart Sync by just Disconnecting and then Connecting again.
Debugging output revealed this:
PK11_AlgtagToMechanism called... PORT_GetError() return value: -5987
PK11_GetPadMechanism called... PORT_GetError() return value: -5987
PK11_ParamFromIV called... PORT_GetError() return value: -5987
PK11_GetInternalKeySlot called... PORT_GetError() return value: -5987
PK11_ImportSymKey called... PORT_GetError() return value: -5987
PK11_CreateContextBySymKey called... PORT_GetError() return value: -5987
PK11_CipherOp return value: 0 PORT_GetError() return value: -5987
PK11_DigestFinal return value: -1 PORT_GetError() return value: -8023
WeaveCrypto: _commonCrypt: failed: [Exception... "cipher finalize failed" nsresult: "0x80004005 (NS_ERROR_FAILURE)" location: "JS frame :: resource://services-crypto/WeaveCrypto.js :: <TOP_LEVEL> :: line 580" data: no]
The -5987 error code occurs a lot during successful (sic!) crypto operations, so it might be a red herring. -8023 is the actual error.
Assignee | ||
Comment 7•14 years ago
|
||
(In reply to comment #6)
> -8023 is the actual error.
According to secerr.h:
#define SEC_ERROR_BASE (-0x2000)
...
SEC_ERROR_PKCS11_DEVICE_ERROR = (SEC_ERROR_BASE + 169),
-0x2000 + 169 = -8023
I honestly have no idea what that error means.
Comment 8•14 years ago
|
||
Found one way to reproduce this.
Repro:
1) install minefield nightly: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.6; rv:2.0b8pre) Gecko/20101209 Firefox/4.0b8pre
2) sync your account successfully.
3) open prefs, and disconnect your account. then connect again. Check sync log that everything is ok.
4) Open prefs > Generate and Change sync key
5) Check sync log and verify error from bug 618179. skip this and notice the notification pops up saying sync error, with a Sync Now button
6) Click that sync button:
2010-12-09 17:39:29 Service.Main INFO In sync().
2010-12-09 17:39:29 Net.Resource DEBUG GET success 200 https://phx-sync468.services.mozilla.com/1.0/msvdqjdrj2rglmx26it7rmuoqg3wzjzv/info/collections
2010-12-09 17:39:29 Service.Main DEBUG Fetching global metadata record
2010-12-09 17:39:29 Service.Main DEBUG Weave Version: @weave_version@ Local Storage: 4 Remote Storage: 4
2010-12-09 17:39:29 Service.Main INFO Sync key is up-to-date: no need to upgrade.
2010-12-09 17:39:29 Service.Main DEBUG Fetching and verifying -- or generating -- symmetric keys.
2010-12-09 17:39:29 Service.Main INFO Testing info/collections: {"meta":1291945161.03,"crypto":1291945161.98,"clients":1291945108.33,"bookmarks":1291945108.44,"forms":1291944939.98,"history":1291944940.08,"prefs":1291945108.86,"tabs":1291944940.35}
2010-12-09 17:39:29 CollectionKeys INFO Testing for updateNeeded. Last modified: 1291945161.76
2010-12-09 17:39:29 Service.Main INFO CollectionKeys reports that a key update is needed.
2010-12-09 17:39:30 Net.Resource DEBUG GET success 200 https://phx-sync468.services.mozilla.com/1.0/msvdqjdrj2rglmx26it7rmuoqg3wzjzv/storage/crypto/keys
2010-12-09 17:39:30 CollectionKeys INFO Updating collection keys...
2010-12-09 17:39:30 CollectionKeys WARN Got exception "[Exception... "cipher finalize failed" nsresult: "0x80004005 (NS_ERROR_FAILURE)" location: "JS frame :: resource://services-crypto/WeaveCrypto.js :: <TOP_LEVEL> :: line 557" data: no]" decrypting storage keys with sync key.
2010-12-09 17:39:30 CollectionKeys INFO Aborting updateContents. Rethrowing.
2010-12-09 17:39:30 Service.Main WARN Got exception "[Exception... "cipher finalize failed" nsresult: "0x80004005 (NS_ERROR_FAILURE)" location: "JS frame :: resource://services-crypto/WeaveCrypto.js :: <TOP_LEVEL> :: line 557" data: no]" fetching cryptoKeys.
2010-12-09 17:39:30 Service.Main WARN Failed to fetch symmetric keys. Failing remote setup.
2010-12-09 17:39:30 Service.Main CONFIG Starting backoff, next sync at:Thu Dec 09 2010 19:05:51 GMT-0800 (PST)
2010-12-09 17:39:30 Service.Main DEBUG Exception: aborting sync, remote setup failed No traceback available
Assignee | ||
Comment 9•14 years ago
|
||
[17:42] bsmith: In pk11err.c, there is PK11_MapError
[17:42] bsmith: and we can see it maps CKR_DEVICE_ERROR to SEC_ERROR_PKCS11_DEVICE_ERROR
[17:43] bsmith: So, in softoken I will look for why it can return CKR_DEVICE_ERROR from NSC_DecyptFinal
[17:43] bsmith: This is decryption, right?
[17:43] philiKON: yes
[17:44] philiKON: note that it looks like Sync might be trigger this after all
[17:44] philiKON: our STR include hitting a bug and it's not entirely clear to me whether Sync then behaves sanely after hitting that bug
[17:45] bsmith: OK.
[17:45] philiKON: rnewman is looking into it right now
[17:45] bsmith: Unforunately, CKR_DEVICE_ERROR is the default error in softoken
[17:45] philiKON: gah
[17:46] bsmith: in pkcs11c.c sftk_MapCryptError maps error codes from freebl to CKR_* codes.
[17:46] bsmith: and the default is CKR_DEVICE_ERROR for "backward compatibility" ?!?
Comment 10•14 years ago
|
||
There's something wacky going on in sync (bug 618179) which is the real blocker here. We should still dig deeper here, but this might be bad input from sync. Still should bulletproof this, but not a blocker for b8 (bug 618179 replaces this one)
blocking2.0: beta8+ → ---
tracking-fennec: 2.0b3+ → ---
Reporter | ||
Comment 11•14 years ago
|
||
Philipp, do you still observe this with my last two fixes? (Useful info to have in this bug.)
Assignee | ||
Comment 12•14 years ago
|
||
(In reply to comment #11)
> Philipp, do you still observe this with my last two fixes? (Useful info to have
> in this bug.)
I can no longer reproduce this as described in comment 6, since bug 618179 was fixed. I do believe there's an underlying issue here, but we definitely have seemed to remove the biggest cause.
Comment 13•14 years ago
|
||
What's the state for users of nightly builds? I see this error using the 20101210 nightly, which should actually include the fix for bug 618179...
Assignee | ||
Comment 14•14 years ago
|
||
I too just ran into this again with a fresh m-c build this morning
1. Created a new account, sync
2. Stop Using This Account
3. Connect to the account again, sync
Comment 15•14 years ago
|
||
I hit that this morning also on 12/10 trunk. but this time i did NOT connect/disconnect nor change Sync key like comment 8. I was simply trying to add a device, and allowing my desktop sync to auto-refresh.
log attached.
Comment 16•14 years ago
|
||
Comment 18•14 years ago
|
||
My suspicion is that one of the inputs to the failing function is bad (e.g. null pointer or buffer too small). I am going to work on getting more helpful diagnostics in NSS and in WeaveCrypto to help pinpoint what is wrong (if anything). Also, I am investigating what might have changed (if anything) in these functions between NSS 3.12.8 and 3.12.9.
Comment 19•14 years ago
|
||
Got this bug on Seamonkey trunk, win XP 32bit.
Immediately after update sync addon from 1.6.b2 to 1.6.b3 got:
2010-12-10 20:12:53 Service.Main INFO In sync().
2010-12-10 20:12:53 Net.Resource DEBUG GET success 200 https://myhost/user/info/collections
2010-12-10 20:12:53 Service.Main DEBUG Fetching global metadata record
2010-12-10 20:12:53 Service.Main DEBUG Weave Version: 1.6b3 Local Storage: 5 Remote Storage: 5
2010-12-10 20:12:53 Service.Main INFO Sync key is up-to-date: no need to upgrade.
2010-12-10 20:12:53 Service.Main DEBUG Fetching and verifying -- or generating -- symmetric keys.
2010-12-10 20:12:53 Service.Main INFO Testing info/collections: {"crypto":1292002340.01,"history":1292002353.31,"meta":1292002338.83,"tabs":1292002353.49}
2010-12-10 20:12:53 CollectionKeys INFO Testing for updateNeeded. Last modified: 1292002339.61
2010-12-10 20:12:53 Service.Main INFO CollectionKeys reports that a key update is needed.
2010-12-10 20:12:53 Net.Resource DEBUG GET success 200 https://myhost/user/storage/crypto/keys
2010-12-10 20:12:53 CollectionKeys INFO Updating collection keys...
2010-12-10 20:12:53 CollectionKeys WARN Got exception "[Exception... "cipher finalize failed" nsresult: "0x80004005 (NS_ERROR_FAILURE)" location: "JS frame :: resource://services-crypto/WeaveCrypto.js :: <TOP_LEVEL> :: line 557" data: no]" decrypting storage keys with sync key.
2010-12-10 20:12:53 CollectionKeys INFO Aborting updateContents. Rethrowing.
2010-12-10 20:12:53 Service.Main WARN Got exception "[Exception... "cipher finalize failed" nsresult: "0x80004005 (NS_ERROR_FAILURE)" location: "JS frame :: resource://services-crypto/WeaveCrypto.js :: <TOP_LEVEL> :: line 557" data: no]" fetching cryptoKeys.
2010-12-10 20:12:53 Service.Main WARN Failed to fetch symmetric keys. Failing remote setup.
2010-12-10 20:12:53 Service.Main DEBUG Exception: aborting sync, remote setup failed No traceback available
OS: Mac OS X → All
Comment 20•14 years ago
|
||
I reviewed the changes security/nss in the patch that upgraded NSS to 3.12.9. Literally nothing change in the encryption/decryption code and the other changes seem extremely unlikely to cause changes to AES encryption/decryption. If that landing caused any problems it would most likely be something in NSPR but that seems also very unlikely.
OS: All → Mac OS X
Comment 21•14 years ago
|
||
Wait, I take back comment 20. I do see something that changed in the AES code. I will investigate it further.
Updated•14 years ago
|
tracking-fennec: --- → 2.0b3+
Comment 23•14 years ago
|
||
It looks like the AES changes in NSS 3.12.9 only affect Linux x64 and Solaris x64.
Comment 24•14 years ago
|
||
We are using PKCS#7 padding. That means the plaintext must end in one of these values:
0x01
0x02 0x02
0x03 0x03 0x03
...
0x0F 0x0F .... 0x0F 0x0F
If the data does not end in one of these byte strings then the padding is invalid. The padding could be invalid because (a) we are using a different key than was used to encrypt the data, or (b) the padding is actually incorrect. When the padding is invalid then Softoken *should* return something like CKR_ENCRYPTED_DATA_INVALID. However, the error reporting logic in NSC_DecryptFinal is broken for the specific case of invalid padding and it could return an unpredictable value but most likely the default, CKR_DEVICE_ERROR. I wrote a patch to fix the error reporting from NSS and I suspect that NSS will start returning CKR_ENCRYPTED_DATA_INVALID when we hit this bug with that patch. If so, that will verify that my thinking is correct and that the bug is likely somewhere else. This NSS patch will be useful for diagnosing the problem but it won't be necessary to include this patch in mozilla-central's NSS.
Updated•14 years ago
|
tracking-fennec: 2.0b3+ → ---
Updated•14 years ago
|
tracking-fennec: ? → 2.0b3+
Reporter | ||
Comment 26•14 years ago
|
||
I tried to replicate this using the test_crypto_crypt script, but only got this far:
http://rnewman.pastebin.mozilla.org/883941
:(
Comment 27•14 years ago
|
||
Also, we can verify whether changes in NSS are to blame by simply building trunk and then replacing the NSS DLLs (softokn3, freebl3, nss3, nssutil3) with the DLLs from an earlier nightly. J-PAKE won't work in this configuration but everything else should work exactly the same.
philiKON said that we can reproduce the problem reliably by backing out https://bug618179.bugzilla.mozilla.org/attachment.cgi?id=496715 but I don't know the STR after reversing that patch. Please post the STR to this bug.
Reporter | ||
Comment 28•14 years ago
|
||
I reproduced this by decrypting in a tight loop.
15:36:31 <@rnewman> TEST-UNEXPECTED-FAIL | (xpcshell/head.js) | [Exception... "cipher finalize failed" nsresult: "0x80004005 (NS_ERROR_FAILURE)" location: "JS frame :: resource://services-crypto/WeaveCrypto.js :: <TOP_LEVEL> :: line 557" data: no]
15:36:57 <@rnewman> http://rnewman.pastebin.mozilla.org/884026
15:37:03 <@rnewman> then run test_crypto_crypt
15:37:30 <@rnewman> that is, simply decrypting the same thing in a tight loop will cause this failure
15:38:00 <@rnewman> in this case, after 1058 iterations succeeded
Comment 29•14 years ago
|
||
Richard, were you able to reproduce this before or only after the memoization patch?
Reporter | ||
Comment 30•14 years ago
|
||
(In reply to comment #29)
> Richard, were you able to reproduce this before or only after the memoization
> patch?
Before. Comment 28 is simply:
* Disable the memoization I added
* Run a decrypt 6,000 times.
It dies reliably for me between 1,000 and 5,000 iterations.
The decryption is just:
function multiple_decrypts(iterations) {
let iv = cryptoSvc.generateRandomIV();
let key = cryptoSvc.generateRandomKey();
let cipherText = cryptoSvc.encrypt("Hello, world.", key, iv);
for (let i = 0; i < iterations; ++i) {
let clearText = cryptoSvc.decrypt(cipherText, key, iv);
do_check_eq(clearText + " " + i, "Hello, world. " + i);
}
}
Note that the inputs to decrypt are not NSS objects, so this could happen in the real world. I don't know why it doesn't.
The memoization I added affects the internals of decrypt (_commonCrypt), and results in this running for 10,000 iterations without the error.
Assignee | ||
Comment 31•14 years ago
|
||
Brian and I just spent a good while reproducing this. We got yesterday's nightly which doesn't have the bugfix for bug 618179 and added debugging output to WeaveCrypto encrypt() and decrypt(). Thanks to bug 618179 we were then able to reliably reproduce the error using the STR as originally described in comment 6:
1. Go to My Sync Key, generate a new one and confirm.
2. Run into bug 618179 which is only consolable by restarting the browser
3. Trying to outsmart Sync by just Disconnecting and then Connecting again.
4. Sync. It fails. Observe cipher finalize failed error.
Changing the Sync Key called WeaveCrypto.encrypt() on the new keys WBO. The Disconnect + Connect step would then call WeaveCrypto.decrypt() on that same data. We verified that whatever WeaveCrypto.encrypt() returned on the first step was passed in to decrypt(). The problem must have therefore been inside WeaveCrypto.
"cipher finalize failed" is thrown when the padding is off. This can happen when the key used is wrong. To look at what the padding would be, we changed WeaveCrypto.decrypt() to retry decryption with the unpadded mechanism if "cipher finalized failed" was encountered. This revealed indeed that the padding was not right. So despite being passed in the right values as parameters, decrypt() ended up using the wrong key.
This was a strong indicator that Richard's SECItem memoization patch from bug 610914 may have been responsible. We disabled the memoization and redid the above steps four times, but couldn't reproduce the error anymore. Given the following evidence, it's pretty clear to me that we need to back out the memoization:
* the parameter values to decrypt() were correct but the cipher operation was using the wrong key
* the random occurrences of the bug outside the reliable STR
* the fact it was reliably reproducible my changing the Sync Key (lack of cache invalidation?)
* but most importantly: the fact that the error could no longer be reproduced after memoization was disabled.
tl;dr: I'm going to back out bug 610914.
Reporter | ||
Comment 32•14 years ago
|
||
(In reply to comment #31)
> tl;dr: I'm going to back out bug 610914.
Good sleuthing!
Assignee | ||
Comment 33•14 years ago
|
||
Status: NEW → RESOLVED
Closed: 14 years ago
Resolution: --- → FIXED
Comment 34•14 years ago
|
||
(In reply to comment #30)
> (In reply to comment #29)
> * Disable the memoization I added
> * Run a decrypt 6,000 times.
>
> It dies reliably for me between 1,000 and 5,000 iterations.
>
> The memoization I added affects the internals of decrypt (_commonCrypt), and
> results in this running for 10,000 iterations without the error.
Unfortunately, a failure of Pl11_DigestFinal just means that something bad happened that caused its inputs to be bad. I filed a bug to improve the error reporting in the case where the padding is bad (bug 618418).
A possible explanation of this tight-loop behavior is that, without memoization, we run out of memory somewhere and cause a bad (NULL) input to C_DecryptFinal or we create a bad key object because we run out of memory. I will investigate this separately. This would explain why the memoization allowed it to go for more iterations--the memoized version used less memory and there was probably no GC happening during the execution of the loop which means memory never got freed and thus addresses never got reused during the loop. This again points to the problem being the memoization's use memory addresses as part of the memoization key.
Status: RESOLVED → UNCONFIRMED
Ever confirmed: false
Resolution: FIXED → ---
Comment 35•14 years ago
|
||
Sorry, I didn't intend to change the resolution.
Status: UNCONFIRMED → RESOLVED
Closed: 14 years ago → 14 years ago
Resolution: --- → FIXED
Comment 36•14 years ago
|
||
This doesnt happen anymore on the nightlies that i see. Marking verified on Mozilla/5.0 (Macintosh; Intel Mac OS X 10.6; rv:2.0b13pre) Gecko/20110317 Firefox/4.0b13pre
Status: RESOLVED → VERIFIED
Updated•6 years ago
|
Component: Firefox Sync: Crypto → Sync
Product: Cloud Services → Firefox
You need to log in
before you can comment on or make changes to this bug.
Description
•