Closed Bug 389568 Opened 17 years ago Closed 17 years ago

SSL stress tests fail for shared DB on Solaris.

Categories

(NSS :: Libraries, defect, P2)

3.12
x86
SunOS
defect

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: rrelyea, Assigned: rrelyea)

Attachments

(1 file)

Turning on SSL stress with shared DBs on Solaris causes the client auth SSL stress tests to fail.

Historical notes: Other platforms (like Linux) failed the stress tests until a thread-safe version of sqlite was compiled. Currently Solaris is compiling sqlite with thread safety turned on.
Blocks: 217538
OK, this turns out not to be a race in sqlite. It's a 'run out of file descriptors' problem.

sqlite cannot share database handles between threads. To handle this, sdb.c opens a database as needed on each operation. If enough threads find themselves in an sdb operation at once, it's possible to run out of file descriptors (Solaris has max file descriptors set to 256; Linux seems to top out at 1000-some).

Possible solutions:
1. reduce the threads on stress test.
2. increase the file descriptors count on solaris.
3. detect the "out of descriptor" conditions and wait for the file descriptors to free up in sdb.c.
4. change the way db descriptors are used.
5. fix sqlite3.
6. I'm open to suggestions because 1-5 all sound non-optimal to me.

At this point 3 looks like the best short term solution.

bob
Bob, good find.  Within NSS, how does this error manifest itself? 
What NSS operations fail, and what NSS error codes do they return?

(embarrassing admission) I don't know how to increase the per-process limit on
file descriptors in Solaris, but if it's easy, I'd suggest that as the short-term
fix.
Nelson,

Re: comment 2, how to do that depends on your shell. Typically, it is something like unlimit or ulimit. You can usually check the current values with limit. There is also a per-user maximum that can be configured separately. Root is typically unlimited (well, 4gb file descs) on Solaris.
Nelson,

It manifests itself in a couple of ways, typically a failure to sign (err is an invalid key for operation). The error PKCS #11 is returning is CKR_NETSCAPE_BAD_KEY_DB. The primary error is lost to us immediately in SQLITE (which returns simply SQLITE_CANTOPEN).

Julien,

I'm running Solaris 10 x86 (under VMware, but the latter shouldn't matter for this). ulimit -a shows a file descriptor limit of 256; setting ulimit -n 512 doesn't seem to do anything. Same with a root shell. ;(

bob
In reply to comment 4: Bob, I meant programmatically.  
We don't want browser users to need to change shell scripts.
Bob,

That's odd. I installed s10 x86 myself and this is what I got. The box is using NIS, but I don't think that matters for the root user. I am using tcsh as the shell for both myself and root.

If you want to raise the per-user limits, I believe you need to use the Solaris Management Console (SMC). Run as root and type smc. Make sure your xhost and DISPLAY are set properly. Yes, it is a GUI program. Java even. Sigh.

[jp96085@monstre]/home/jp96085 1 % limit
cputime         unlimited
filesize        unlimited
datasize        unlimited
stacksize       10240 kbytes
coredumpsize    unlimited
vmemoryuse      unlimited
descriptors     65536
[jp96085@monstre]/home/jp96085 2 % unlimit
[jp96085@monstre]/home/jp96085 3 % limit
cputime         unlimited
filesize        unlimited
datasize        unlimited
stacksize       unlimited
coredumpsize    unlimited
vmemoryuse      unlimited
descriptors     65536
[jp96085@monstre]/home/jp96085 4 % su -
Password:
Sun Microsystems Inc.   SunOS 5.10      Generic January 2005
# limit
cputime         unlimited
filesize        unlimited
datasize        unlimited
stacksize       unlimited
coredumpsize    unlimited
vmemoryuse      unlimited
descriptors     65536
# unlimit
# limit
cputime         unlimited
filesize        unlimited
datasize        unlimited
stacksize       unlimited
coredumpsize    unlimited
vmemoryuse      unlimited
descriptors     unlimited
#

There is a programmatic way to do what the shell does. However, to raise the per-user limit I believe you have to be root, which many people will not be. If there is a significant increase in file descriptor requirements, that will be a problem for many customers, especially server users. It sounds like it would affect client auth servers the most. We should find a way to get around this problem.
Aha! It was my shell.
I was trying to run ulimit in my tcsh!
 ulimit -a
time(seconds)        unlimited
file(blocks)         unlimited
data(kbytes)         unlimited
stack(kbytes)        10240
coredump(blocks)     unlimited
nofiles(descriptors) 256
vmemory(kbytes)      unlimited
solaris1(220) limit
cputime         unlimited
filesize        unlimited
datasize        unlimited
stacksize       10240 kbytes
coredumpsize    unlimited
vmemoryuse      unlimited
descriptors     256
solaris1(221) unlimit
solaris1(222) limit
cputime         unlimited
filesize        unlimited
datasize        unlimited
stacksize       unlimited
coredumpsize    unlimited
vmemoryuse      unlimited
descriptors     65536
solaris1(223) ulimit -a
time(seconds)        unlimited
file(blocks)         unlimited
data(kbytes)         unlimited
stack(kbytes)        unlimited
coredump(blocks)     unlimited
nofiles(descriptors) 65536
vmemory(kbytes)      unlimited
solaris1(224)
So it doesn't look like it's selfserv that's running into the limits; it's strsclnt. Changing my shell ulimit caused ssl.sh to complete successfully.
So, is this bug invalid, then?
Well, the bug is that the tests fail on tinderbox, which means we need to either bump the limits on tinderbox or have the script bump the limits.
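A minimal sketch of how a test script might bump the limit itself, assuming a POSIX-style shell whose ulimit builtin accepts -S, -H and -n. The helper name raise_fd_limit and the target of 1024 are illustrative only and are not taken from the attached patch.

```shell
# Hypothetical helper: raise the soft descriptor limit toward a target,
# never asking for more than the hard limit allows.
raise_fd_limit() {
    want=$1
    soft=$(ulimit -S -n)
    hard=$(ulimit -H -n)

    # Already unlimited or high enough: nothing to do.
    if [ "$soft" = "unlimited" ]; then
        return 0
    fi
    if [ "$soft" -ge "$want" ]; then
        return 0
    fi

    # An unprivileged user cannot exceed the hard limit, so cap the request.
    if [ "$hard" != "unlimited" ] && [ "$hard" -lt "$want" ]; then
        want=$hard
    fi

    # Change only the soft limit; the hard limit is left untouched.
    ulimit -S -n "$want"
}

# Assumed target value; pick whatever the stress test actually needs.
raise_fd_limit 1024
```

Because only the soft limit changes, this needs no root access, and child processes such as strsclnt inherit the raised value from the script.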
This patch contains 2 fixes:

1) In all.sh, 2&>1 was fixed to be 2>&1. The former was parsed as 2 & (launch in background) followed by > 1 (redirect to a file named '1') instead of the intended redirect of FD 2 into FD 1. This would sometimes cause the key removal to run before key generation, leaving an extra key in the FIPS db, which caused the FIPS test to fail later.

2) In ssl.sh, increase the FD limit.

bob
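For reference, a small sketch of what the corrected 2>&1 form does. emit_both is a made-up stand-in for a test command, not something from all.sh. How a shell parses the mistyped 2&>1 varies between shells, so the sketch exercises only the correct form.

```shell
# Hypothetical stand-in for a command that writes to both streams.
emit_both() {
    echo "to-stdout"
    echo "to-stderr" >&2
}

# 2>&1 duplicates FD 2 onto FD 1, so the capture sees both lines.
both=$(emit_both 2>&1)

# Without it, only stdout is captured (stderr is discarded to keep the demo quiet).
only_out=$(emit_both 2>/dev/null)

printf '%s\n' "$both"
printf '%s\n' "$only_out"
```

The command also stays in the foreground, so later steps in the script cannot start before it finishes.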
Attachment #273987 - Flags: review?
Comment on attachment 273987
patch to make solaris x86 successfully run shared db tests.

good catch, Bob
Attachment #273987 - Flags: review? → review+
Status: NEW → RESOLVED
Closed: 17 years ago
Resolution: --- → FIXED
Priority: -- → P2
Target Milestone: --- → 3.12