Closed
Bug 564011
Opened 15 years ago
Closed 15 years ago
SQLITE3 backend has object ID race on keygen (even in single thread).
Categories
(NSS :: Libraries, defect)
Tracking
(Not tracked)
VERIFIED
FIXED
3.12.7
People
(Reporter: sysadmin, Assigned: rrelyea)
References
(Blocks 1 open bug)
Details
(Whiteboard: FIPS)
Attachments
(5 files, 3 obsolete files)
4.65 KB,
text/plain
|
Details | |
5.59 KB,
text/plain
|
Details | |
5.49 KB,
text/plain
|
Details | |
3.58 KB,
patch
|
rrelyea
:
review+
|
Details | Diff | Splinter Review |
2.22 KB,
patch
|
Details | Diff | Splinter Review |
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 6.1; it; rv:1.9.1.9) Gecko/20100317 SeaMonkey/2.0.4
Build Identifier: 3.12.6
Calling C_GenerateKeyPair multiple times (and also with different key types!) returns the *same* object handle! Half of the generated keys are lost/overwritten. Calling the function simultaneously by multiple threads creates a real disaster.
All works perfectly with the old BDB backend.
Reproducible: Always
Steps to Reproduce:
Run the attached test program, that generates 30 RSA keys and 30 EC keys.
Actual Results:
With SQLITE3 backend:
%./a.out /usr/local/lib/nss/libsoftokn3.so configdir='sql:.' 12345678
RSA -1795042673 -90903455
RSA -1164645278 -90903455
RSA -1164645278 -90903453
RSA -1164645276 -90903453
RSA -1164645276 -90903451
RSA -1164645274 -90903451
RSA -1164645274 -90903449
RSA -1164645272 -90903449
RSA -1164645272 -90903447
ECC -1164645270 -90903447
RSA -1164645268 -90903445
RSA -1164645268 -90903443
RSA -1164645266 -90903443
RSA -1164645266 -90903441
RSA -1164645264 -90903441
RSA -1164645264 -90903439
RSA -1164645262 -90903439
ECC C_GenerateKeyPair 48
ECC -1164645262 -90903437
ECC -1164645260 -90903437
ECC -1164645260 -90903435
ECC -1164645258 -90903435
RSA -1164645258 -90903433
RSA -1164645256 -90903433
RSA -1164645256 -90903431
RSA -1164645254 -90903431
RSA -1164645254 -90903429
ECC -1164645252 -90903429
RSA -1164645252 -90903427
ECC -1164645250 -90903427
ECC -1164645250 -90903425
ECC -1164645248 -90903425
ECC -1164645248 -90903423
ECC -1164645246 -90903423
ECC -1164645246 -90903421
ECC -1164645244 -90903421
ECC -1164645244 -90903419
ECC -1164645242 -90903419
ECC -1164645242 -90903417
ECC -1164645240 -90903417
ECC -1164645240 -90903415
ECC -1164645238 -90903415
ECC -1164645238 -90903413
ECC -1164645236 -90903413
ECC -1164645236 -90903411
ECC -1164645234 -90903411
ECC -1164645234 -90903409
ECC -1164645232 -90903409
ECC -1164645232 -90903407
ECC -1164645230 -90903407
ECC -1164645230 -90903405
ECC -1164645228 -90903405
RSA -1164645228 -90903403
RSA -1164645226 -90903403
RSA -1164645226 -90903401
RSA -1164645224 -90903401
RSA -1164645224 -90903399
RSA -1164645222 -90903399
RSA -1164645222 -90903397
RSA -1164645220 -90903397
Expected Results:
With BDB backend:
%./a.out /usr/local/lib/nss/libsoftokn3.so configdir='.' 12345678
RSA -1766937837 -827413741
RSA -1832945932 -893421836
RSA -1862048249 -922524153
RSA -1852166566 -912642470
RSA -1771687628 -832163532
RSA -1859749385 -920225289
RSA -1797889382 -858365286
ECC -1816936244 -877412148
ECC -1875615269 -936091173
RSA -1836008136 -896484040
ECC -1778997149 -839473053
ECC -1771380607 -831856511
RSA -1874015336 -934491240
ECC -1852978087 -913453991
ECC -1814593665 -875069569
ECC -1840635597 -901111501
RSA -1763881890 -824357794
ECC -1841696958 -902172862
ECC -1818447760 -878923664
RSA -1780874406 -841350310
ECC -1811901401 -872377305
RSA -1814322973 -874798877
ECC -1779343239 -839819143
ECC -1867467544 -927943448
RSA -1759277805 -819753709
ECC -1877920776 -938396680
ECC -1744845065 -805320969
ECC -1867408352 -927884256
RSA -1847740256 -908216160
ECC -1876261781 -936737685
ECC -1760721411 -821197315
ECC -1821192545 -881668449
ECC -1854008737 -914484641
ECC -1872495743 -932971647
RSA -1862994671 -923470575
ECC -1846173488 -906649392
ECC -1842191382 -902667286
RSA -1745104177 -805580081
ECC -1804836613 -865312517
ECC -1773628712 -834104616
ECC -1769863224 -830339128
RSA -1838952310 -899428214
ECC -1768950977 -829426881
ECC -1775645016 -836120920
RSA -1808191676 -868667580
ECC -1840679408 -901155312
ECC -1750369936 -810845840
ECC -1874893127 -935369031
RSA -1760258976 -820734880
RSA -1795309529 -855785433
RSA -1826079071 -886554975
RSA -1847844862 -908320766
RSA -1761326732 -821802636
RSA -1792705593 -853181497
RSA -1810165621 -870641525
RSA -1748114513 -808590417
RSA -1760181230 -820657134
RSA -1757935365 -818411269
RSA -1786141411 -846617315
RSA -1821550964 -882026868
Reporter | ||
Comment 1•15 years ago
|
||
Here is the testcase. One thread creates 30 RSA keypairs and another one 30 EC keypairs.
Updated•15 years ago
|
Updated•15 years ago
|
Attachment #443695 -
Attachment mime type: application/octet-stream → text/plain
Assignee | ||
Comment 2•15 years ago
|
||
updating subject to be more descriptive.
Summary: SQLITE3 backend completely broken! → SQLITE3 backend has object ID race on keygen.
Assignee | ||
Comment 3•15 years ago
|
||
To be clear, the problem Alex has found is not that the SQLITE DB is creating objects sequentially, but that the SQLITE DB is creating objects with the same object ID.
I also see that when this happens an object ID is skipped.
bob
Reporter | ||
Comment 4•15 years ago
|
||
To be more precise: the issue is present even in single-thread. Using multiple threads is worse, because the function may also fails with strange errors or returns inconsistent data.
Summary: SQLITE3 backend has object ID race on keygen. → SQLITE3 backend has object ID race on keygen (even in single thread).
Comment 5•15 years ago
|
||
If it happens with a single thread, it's not a race.
Reporter | ||
Comment 6•15 years ago
|
||
You called it a race ;-) You can launch only one of the two threads and the behaviour is the same. I don't know if fixing the single-thread object ID reuse will fix also the rest (function occasionally failing in multi-thread and bad parallelism).
Assignee | ||
Comment 7•15 years ago
|
||
OK there is clearly a race here.
sdb_getObjectID() is supposed to return a unique objectID that is not yet used. It cycles through a list of object ID's it has, making sure it's not already created. There are logs around getting the object id and creating the object, but those locks are only active if SQLITE_UNSAFE_THREADS is set. I don't believe it's set for modern implementations of SQLITE. Turning that one (you sent set SQLITE_UNSAFE_THREADS in your environment and rebuilt to get that).
The problem, however is not this race. In sql databases each object has a unique lower set of bits id. The upper 2 bits are set to indicate if the object is a TOKEN object (always "1" in this case) and if the object should be in the key or cert DB ("0" = key "1"= cert). Public keys are stored in the "cert" db (which is really the public objects db).
running alex's test with some changes which output the object ID as a hex value rather than a decimal I see:
RSA i=0 0xa925c6ac 0xe925c6ab
RSA i=1 0xa925c6ac 0xe925c6ab
RSA i=2 0xa925c6ae 0xe925c6ad
RSA i=3 0xa925c6ae 0xe925c6ad
RSA i=4 0xa925c6b0 0xe925c6af
RSA i=5 0xa925c6b0 0xe925c6af
RSA i=6 0xa925c6b2 0xe925c6b1
RSA i=7 0xa925c6b2 0xe925c6b1
RSA i=8 0xa925c6b4 0xe925c6b3
RSA i=9 0xa925c6b4 0xe925c6b3
RSA i=10 0xa925c6b6 0xe925c6b5
RSA i=11 0xa925c6b6 0xe925c6b5
Clearly we are getting unique values for the public and private keys. I've turned threads off and put a 30 second sleep between keygens without having an effect on the outcome, so clearly the above race is not the issue. I suspect it's probably our code that is trying to detect import of an identical object into the token and turning a create into an update that is causing us grief here.
bob
Assignee | ||
Comment 8•15 years ago
|
||
OK, it is a problem of the code meant to handle common updates. The CKA_ID for these keys are set to NULL. This means every keygen will match any other keygen. The old DB does not have this issue because the oldDB automatically sets CKA_ID (because it has to, it can't store a different value for CKA_ID).
NSS does not show this problem because the NSS keygen function sets the CKA_ID values immediately after the key gen. NOTE: NSS would be susceptible to a race here, where 2 different keygen's are running at the same time, the first generates the key, task switches, and the second goes to generate a key. Both keygen will return identical handles to identical keys in this case.
patch included.
Status: UNCONFIRMED → ASSIGNED
Ever confirmed: true
Assignee | ||
Comment 9•15 years ago
|
||
Assignee: nobody → rrelyea
Reporter | ||
Comment 10•15 years ago
|
||
I tried the patch and modified the testcase to spawn additional threads (I'm testing it on a 8 cores machine).
These are the new result (200 EC P-256 keys and 50 RSA 1024 keys with 4 threads):
/usr/bin/time ./a.out /usr/local/lib/nss/libsoftokn3.so configdir='.' 12345678
...
8.18 real 14.27 user 0.18 sys
%/usr/bin/time ./a.out /usr/local/lib/nss/libsoftokn3.so configdir='sql:.' 12345678
...
RSA (1) C_GenerateKeyPair 5
...
RSA (3) C_GenerateKeyPair 5
...
RSA (1) C_GenerateKeyPair 5
...
RSA (3) C_GenerateKeyPair 5
...
50.25 real 14.03 user 0.64 sys
In other words:
1) the IDs are not reused anymore (OK)
2) occasionally the function returns CKR_GENERAL_ERROR (BAD): before this happens, all the threads are blocked. I think (supposition) the sqlite db is locked and no threads can successfully create new keys. After about 10 seconds (the sleep/retry cycle), one keypairgen returns CKR_GENERAL_ERROR and all threads restart to work until the condition happens again
So, the user time is quite the same, but the real time is much higher (about 40 seconds for the 4 keypairgen that failed).
Reporter | ||
Comment 11•15 years ago
|
||
The updated testcase, that spawns 4 threads and prints also the thread number.
Attachment #443695 -
Attachment is obsolete: true
Reporter | ||
Comment 12•15 years ago
|
||
This is the backtrace of the four blocked threads. As I supposed, they are all waiting on db mutex. I hope this will help you finding the bug.
Assignee | ||
Comment 13•15 years ago
|
||
Alex, let's make a separate bug for your deadlock issue. It's clearly different from your previous issue.
It looks very much like a deadlock deep inside sqlite itself, so we probably should push the issue upstream to the sqlite maintainer.
Some questions, however: Which version of sqlite are you using? Mozilla's, NSS's or one from the OS? If the middle, it might just be a matter of updating to the latest sqlite.
bob
Assignee | ||
Comment 14•15 years ago
|
||
Also, was this built with SQLITE_UNSAFE_THREADS set?
bob
Reporter | ||
Comment 15•15 years ago
|
||
External, latest release, 3.6.23.1, built thread-safe.
(SQLITE_UNSAFE_THREADS doesn't exist anymore)
Assignee | ||
Comment 16•15 years ago
|
||
It does in NSS...
If you build NSS with that define, it will do thread locking before calling sqlite.
bob
Reporter | ||
Comment 17•15 years ago
|
||
Ops, misunderstood. Rebuilt NSS with that define. The result is worse, much more (and sequentially) CKR_GENERAL_ERROR failures.
286.84 real 22.29 user 1.06 sys
Reporter | ||
Comment 18•15 years ago
|
||
New backtrace with SQLITE_UNSAFE_THREAD enabled.
Comment 19•15 years ago
|
||
Robert Relyea asked in comment 13:
> Which version of sqlite are you using? Mozilla's, NSS's or one from the OS?
In Comment 15, Alex Dupre answered:
> External, latest release, 3.6.23.1, built thread-safe.
> (SQLITE_UNSAFE_THREADS doesn't exist anymore)
I believe that makes this bug invalid. NSS tests, qualifies and supports
its releases against certain specific versions of SQLITE3. If it fails with
THOSE versions, that's a bug for which the NSS team will take responsiblity.
If it fails with some other version, that's a failure of an unsupported
combination. You know the old story: "Doctor, it hurts when I do this.
Well, then, don't do that!"
Reporter | ||
Comment 20•15 years ago
|
||
That doesn't make the bug invalid, because the behaviour is *identical* with the bundled sqlite.
Reporter | ||
Comment 21•15 years ago
|
||
The race condition seems so:
- one thread has a lock on the sqlite db and is trying to acquire the monitor
- a second thread has acquired the monitor and is trying to write to the sqlite db
This is a stall condition. After the second thread timeouts and fails, the first one continue to do its job.
Comment 22•15 years ago
|
||
(I'm an intern at Google working on Chrome.)
We ran into these two problems doing some stress-testing of keygen's thread-safety. The CKA_ID patch in comment #9 does fix the first issue for us as well. For the deadlock, it appears the problem is in sftk_signTemplate. Specifically, the block
if (handle->type != SFTK_KEYDB_TYPE) {
keyHandle = handle->peerDB;
}
when we switch over to the peerDB, the resulting writes are not done with a transaction open. This breaks what I can see of sdb.c's locking assumptions. (The net result is that we write to the sqlReadDB handle and act quite oddly.)
The attached patch opens (and cleans up) a transaction if necessary in sftk_signTemplate. I have not been able to reproduce the stall with this enabled.
Since it does mess around with transactions, the code is somewhat frail in places. In particular, I'm worried about future callers to sftk_signTemplate that may already have a transaction on the other database open. Also, should a similar dance be necessary in future, one which takes the transactions in the opposite order could introduce another stall.
More importantly, if the certDB transaction later fails, the changes to the signature will not be rolled back. I'm not sure that can be fixed easily, seeing as they live in different databases.
Updated•15 years ago
|
Attachment #452156 -
Flags: review?(rrelyea)
Comment 23•15 years ago
|
||
Bob:
David Benjamin works with me this summer. He ran into this bug
independently and spent two days tracking this down. Please
review his patch while he remembers the details clearly. Thank
you!
Comment 24•15 years ago
|
||
Comment on attachment 452156 [details] [diff] [review]
When signing attributes to a peer db, remember to create a transaction
r=wtc.
Please change "Peer" to "PeerDB" in "usingPeer" and "inPeerTransaction".
Attachment #452156 -
Flags: review+
Comment 25•15 years ago
|
||
I renamed the variables as I suggested in comment 24, and checked davidben's patch in
on the NSS trunk (Softoken 3.12.7) and the SOFTOKEN_3_13_BRANCH (Softoken 3.13).
Checking in sftkdb.c;
/cvsroot/mozilla/security/nss/lib/softoken/sftkdb.c,v <-- sftkdb.c
new revision: 1.25; previous revision: 1.24
done
Checking in sftkdb.c;
/cvsroot/mozilla/security/nss/lib/softoken/sftkdb.c,v <-- sftkdb.c
new revision: 1.24.4.2; previous revision: 1.24.4.1
done
Attachment #452156 -
Attachment is obsolete: true
Attachment #453578 -
Flags: review?(rrelyea)
Attachment #452156 -
Flags: review?(rrelyea)
Comment 26•15 years ago
|
||
r=wtc. I edited the comment in sftkdb_lookupObject to note that it is
the caller rather than that function that will create the object.
I think CKR_OBJECT_HANDLE_INVALID is not an appropriate return code
for sftkdb_getFindTemplate because that function doesn't deal with object
handles. CKR_ATTRIBUTE_VALUE_INVALID seems better, because the
CKA_ID attribute in ptemplate is empty. Bob, what do you think?
Since this return code is only used by sftkdb_getFindTemplate to report
this special condition to its caller, sftkdb_lookupObject, as long as this
return code is not used under any other condition, it doesn't matter what
the return code is. Right now, sftkdb_getFindTemplate returns CKR_OK
or CKR_TEMPLATE_INCOMPLETE under other conditions.
I checked in this patch on the NSS trunk (Softoken 3.12.7) and
SOFTOKEN_3_13_BRANCH (Softoken 3.13).
Checking in sftkdb.c;
/cvsroot/mozilla/security/nss/lib/softoken/sftkdb.c,v <-- sftkdb.c
new revision: 1.26; previous revision: 1.25
done
Checking in sftkdb.c;
/cvsroot/mozilla/security/nss/lib/softoken/sftkdb.c,v <-- sftkdb.c
new revision: 1.24.4.3; previous revision: 1.24.4.2
done
Attachment #443752 -
Attachment is obsolete: true
Comment 27•15 years ago
|
||
Alex, it would be nice if you could verify the two patches in this bug
fixed the bug for you. Thanks.
Status: ASSIGNED → RESOLVED
Closed: 15 years ago
Resolution: --- → FIXED
Target Milestone: 3.13 → 3.12.7
Reporter | ||
Comment 28•15 years ago
|
||
I'll verify it soon, in the next days.
Reporter | ||
Comment 29•15 years ago
|
||
The testcase now passes successfully. The bug is fixed. I currently cannot say if the patch introduces other deadlocks elsewhere or not.
Comment 30•15 years ago
|
||
Alex, thanks a lot for verifying the fix.
David Benjamin has been reviewing the NSS softoken's locking
of sqlite and sqlite's internal locking in the past couple of
days. I encouraged him to write up and publish his findings
as an NSS technical note.
Status: RESOLVED → VERIFIED
Assignee | ||
Comment 31•15 years ago
|
||
Comment on attachment 453578 [details] [diff] [review]
When signing attributes to a peer db, remember to create a transaction, by David Benjamin (checked in)
r+ rrelyea
Good catch!
Attachment #453578 -
Flags: review?(rrelyea) → review+
You need to log in
before you can comment on or make changes to this bug.
Description
•