Closed
Bug 124446
Opened 23 years ago
Closed 23 years ago
SSL server stress test ends up in infinite loop on Solaris
Categories
(NSS :: Libraries, defect, P1)
Tracking
(Not tracked)
RESOLVED
FIXED
3.4
People
(Reporter: julien.pierre, Assigned: wtc)
Details
Attachments
(2 files)
266.33 KB, text/plain
1.23 KB, patch
On Solaris 2.8, running NES with client auth required, the server ended up in a
deadlock after 1 hour 30 minutes and 73695 full SSL handshakes.
Reporter
Updated•23 years ago
Priority: -- → P1
Target Milestone: --- → 3.4
Reporter
Comment 1•23 years ago
Assignee
Updated•23 years ago
Attachment #68589 - Attachment mime type: application/octet-stream → text/plain
Comment 2•23 years ago
The lock that most of the threads are waiting on protects the list of active
tokens in the trust domain. The list is short, and the lock would be pounded
heavily in a stressed environment.
In NSS 3.3, the call would have gone straight to the temp db, followed by the
perm db. This is a new level of locking, more akin to the PK11SlotList from NSS
3.3. That lock was R/W; perhaps this one should be as well?
Assignee
Comment 3•23 years ago
Ian,
Replacing a normal lock by a reader-writer lock will only
change the performance characteristics and will not solve
the looping or deadlock problems we are seeing.
Assignee
Comment 4•23 years ago
All the other threads are blocked in calls such as poll(),
pthread_mutex_lock(), and pthread_cond_wait(). The only thread that is running is:
----------------- lwp# 93 / thread# 168 --------------------
feb827e8 PL_HashTableRawLookup (214c90, 551d1300, c2ccc0, 1, 0, c2cd59) + 98
feb832e4 PL_HashTableLookup (214c90, c2ccc0, 80, 0, 0, 0) + 5c
fe608d40 SECOID_FindOID (c2ccc0, ddad48, 22e, 0, 0, f65250) + 90
fe608f5c SECOID_KnownCertExtenOID (c2ccc0, ddabb8, fe658c9c, ddad48, 22e, ddad44) + 2c
fe5e67f8 cert_HasUnknownCriticalExten (c2cd38, ddabb8, fe658c9c, ddabf4, 84, ddad48) + 98
fe5e0898 CERT_DecodeDERCertificate (fb360bc4, 1, 0, 0, 0, db9950) + 190
fe61afe0 nssDecodedPKIXCertificate_Create (0, 873d18, 0, 4, 6b, 874078) + 78
fe617e3c nssDecodedCert_Create (0, 873d18, 1, 8, fffffff8, 843526) + 4c
fe60e650 nssCertificate_GetDecoding (873cf0, fb360d20, 8, 8432f8, 9c3f8, 9c428) + 50
fe61f8e4 get_token_cert (9c3c0, 9c3f8, f4262204, 0, 9c3f8, 9c428) + 304
fe6200cc retrieve_cert (9c3c0, 9c3f8, f4262204, fb361058, 0, 8d9ce9) + 1d4
fe61f3bc traverse_objects_by_template (9c3c0, 0, fb360fc0, 3, fe61fef8, fb361058) + 354
fe620700 nssToken_TraverseCertificatesBySubject (9c3c0, 0, 531f40, fb361058, 0, 0) + 258
fe614440 NSSTrustDomain_FindBestCertificateBySubject (9c240, 531f40, ec7730, fb3611c0, 0, 9aa0c8) + 128
fe60ec68 NSSCertificate_BuildChain (531f10, ec7730, fb3611c0, 0, fb3611b4, 2) + 2b8
fe59d228 CERT_FindCertIssuer (846a88, 39975, ad54463e, 0, 0, 5f4881) + e8
fe59dc64 CERT_VerifyCertChain (9c240, 846a88, 1, 0, 39975, ad54463e) + 52c
fe59f070 CERT_VerifyCert (39975, ad54463e, 1, 0, 39975, ad54463e) + 610
fe59f290 CERT_VerifyCertNow (39975, ad54463e, 1, 0, 0, 73cdd0) + 78
fe73231c SSL_AuthCertificate (9c240, 599eb8, 1, 1, 1, 93b968) + 104
fe72c8d0 ssl3_HandleCertificate (ddee48, 102730c, 26a, a91a95, 0, fe819f58) + 728
fe72e854 ssl3_HandleHandshakeMessage (ddee48, 102730c, 26a, 0, 8a, 1027308) + 63c
fe72ef28 ssl3_HandleHandshake (ddee48, a5971c, 0, ffffffff, fb3616b4, 1027308) + 2e0
fe72faa4 ssl3_HandleRecord (1027308, 37a, fb3616c4, fffffff8, 28, 821408) + 874
fe7315d4 ssl3_GatherCompleteHandshake (ddee48, 0, 46, ffffffff, fffffff8, 8bd6b8) + 10c
fe735f68 ssl_GatherRecord1stHandshake (ddee48, a8, 20, ffffffff, fffffff8, 8bd6b8) + 100
fe741c00 ssl_Do1stHandshake (ddee48, a8, e2aa00, 4, 0, ddef40) + 340
fe744570 ssl_SecureRecv (ddee48, d84760, 1fff, 0, 0, 745ed1) + 230
fe74cbc4 ssl_Recv (599eb8, d84760, 1fff, 0, 1e8480, 19) + 12c
fea9f118 PR_Recv (599eb8, d84760, 1fff, 0, 1e8480, 0) + 68
ff1c1bd8 __0fNDaemonSessionNGetConnectionv (c9eb78, 0, 1, 1, fe83c7a0, 0) + 4a0
ff1c1e48 __0fNDaemonSessionDrunv (c9eb78, 99, 98, 98, 0, 0) + f8
ff045c10 __0fGThreadErun_v (c9eb78, a8, 24f10, fe8cfb60, 10, 697d09) + 60
ff045b8c ThreadMain (c9eb78, 4, fe8ce000, 4, c9f070, 0) + 34
feadcb34 _pt_root (c9f070, fc633d18, 0, 5, 1, fe401000) + 1a4
fe8bbc08 _thread_start (c9f070, 0, 0, 0, 0, 0) + 40
Comment 5•23 years ago
I can reproduce this every time I run the test. One time it happened after 5
minutes; another time it took 45 minutes.
I'm using a build with NSPR from /s/b/c. Since I don't have the source, it's
hard to see what is going on there. But the running thread is definitely stuck
in PL_HashTableRawLookup. It bounces back and forth between the same two lines
of code.
http://lxr.mozilla.org/mozilla/source/nsprpub/lib/ds/plhash.c#185
What I notice is that he == *hep == he->next, so the entry points to itself
and the while loop never terminates. It seems that the OID hashtable is
corrupt, but I don't know how or why yet.
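To illustrate the symptom, here is a minimal sketch of a bucket-chain walk, not the NSPR code itself. The Entry type and bounded_chain_lookup are hypothetical names invented for this example. An unbounded walk over a chain whose entry points to itself would spin forever, which matches what the running thread is doing; the step budget below is only there so the sketch can report the cycle instead of hanging.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified stand-in for one entry in a bucket chain. */
typedef struct Entry {
    struct Entry *next;
    const char *key;
} Entry;

/*
 * Walk a single bucket chain looking for key, visiting at most max_steps
 * entries. Returns 1 if the key is found, 0 if the chain ends without a
 * match, and -1 if the step budget runs out (the chain is probably cyclic).
 */
int bounded_chain_lookup(Entry *head, const char *key, size_t max_steps)
{
    Entry *he = head;
    size_t steps = 0;

    assert(key != NULL);
    while (he != NULL) {
        if (steps++ >= max_steps) {
            return -1; /* a plain while loop would never leave here */
        }
        if (strcmp(he->key, key) == 0) {
            return 1;
        }
        he = he->next;
    }
    return 0;
}
```

With a healthy chain a miss returns 0. Once some entry has he->next == he, a lookup for a key that is not on the chain before that entry can never reach NULL, so only the step budget stops it.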
Comment 6•23 years ago
After debugging with Wan-Teh, we decided that the OID hashtable needs to use
PL_HashTableLookupConst instead of PL_HashTableLookup. This is because
PL_HashTableLookup may change the contents of the hashtable (on a hit it can
move the found entry to the front of its chain), and the OID hashtable is not
threadsafe.
The reason this occurs in 3.4 is that prior versions of NSS used DBM for the OID
hash.
I found an additional bug by reviewing secoid.c. Patch coming.
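The difference between the two kinds of lookup can be sketched as follows. This is a simplified illustration, not the plhash.c source: Node, lookup_move_to_front and lookup_const are hypothetical names. The first function relinks the chain on a hit, so it is a write even though callers think of it as a read; the second only follows pointers.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified chain node. */
typedef struct Node {
    struct Node *next;
    const char *key;
    int value;
} Node;

/*
 * Lookup that reorders the chain: on a hit, the found node is unlinked and
 * reinserted at the head. Two threads doing this at once on an unlocked
 * chain can interleave the three pointer writes and corrupt the links.
 */
Node *lookup_move_to_front(Node **head, const char *key)
{
    Node **hep = head;
    Node *he;

    assert(head != NULL && key != NULL);
    while ((he = *hep) != NULL) {
        if (strcmp(he->key, key) == 0) {
            if (hep != head) {
                *hep = he->next;  /* unlink from current position */
                he->next = *head; /* point at the old head */
                *head = he;       /* become the new head */
            }
            return he;
        }
        hep = &he->next;
    }
    return NULL;
}

/* Lookup that never writes to the chain. */
Node *lookup_const(Node *head, const char *key)
{
    Node *he;

    assert(key != NULL);
    for (he = head; he != NULL; he = he->next) {
        if (strcmp(he->key, key) == 0) {
            return he;
        }
    }
    return NULL;
}
```

A read-only lookup is safe to run from many threads only as long as nothing else modifies the table at the same time; any code that adds entries still needs its own protection.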
Comment 7•23 years ago
Comment 8•23 years ago
The patch was checked in as rev 1.11 of secoid.c. Bug 124923 was opened about
providing a lock for the dynamic OID hash.
I have run selfserv on Solaris under the same conditions that caused a hang
three out of three times, in about 20 minutes, 5 minutes, and 45 minutes. It
has now been running for three hours without any problem.
Marking fixed. Julien, you may wish to verify with NES.
Status: NEW → RESOLVED
Closed: 23 years ago
Resolution: --- → FIXED