Bug 508259 - Pk11mode crashed on Linux2.4
Status: RESOLVED FIXED
FIPS
Product: NSS
Classification: Components
Component: Tools
Version: trunk
Platform: x86 Linux
Importance: P1 critical
Target Milestone: 3.12.4
Assigned To: Christophe Ravel
Reported: 2009-08-04 07:17 PDT by Slavomir Katuscak
Modified: 2009-08-13 14:18 PDT


Attachments
Defines NO_FORK_CHECK and NO_CHECK_FOR when NSS_NO_FORK_CHECK (checked in) (617 bytes, patch)
2009-08-05 11:58 PDT, Christophe Ravel
julien.pierre: review+
Automatically disable fork check on Linux 2.4 kernels (472 bytes, patch)
2009-08-07 20:30 PDT, Julien Pierre
christophe.ravel.bugs: review+

Description Slavomir Katuscak 2009-08-04 07:17:14 PDT
Build: securitytip/20090802.1
Platform: Linux 2.4/32bit 

Log:
fips.sh: Run PK11MODE in FIPSMODE  -----------------
pk11mode -d ../fips -p fips- -f ../tests.fipspw
Loaded FC_GetFunctionList for FIPS MODE; slotID 0 
Loaded FC_GetFunctionList for FIPS MODE; slotID 0 
Loaded FC_GetFunctionList for FIPS MODE; slotID 0 
Loaded FC_GetFunctionList for FIPS MODE; slotID 0 
Loaded FC_GetFunctionList for FIPS MODE; slotID 0 

FIPS MODE PKM_Error: Child misbehaved.
Loaded FC_GetFunctionList for FIPS MODE; slotID 0 
Child return status : 255.
**** Total number of TESTS ran in FIPS MODE is 94. ****
fips.sh: #7144: Run PK11MODE in FIPS mode (pk11mode) . - Core file is detected - FAILED
fips.sh: Run PK11MODE in Non FIPSMODE  -----------------
pk11mode -d ../fips -p nonfips- -f ../tests.fipspw -n
loaded C_GetFunctionList for NON FIPS MODE; slotID 1 
loaded C_GetFunctionList for NON FIPS MODE; slotID 1 
loaded C_GetFunctionList for NON FIPS MODE; slotID 1 
loaded C_GetFunctionList for NON FIPS MODE; slotID 1 
loaded C_GetFunctionList for NON FIPS MODE; slotID 1 
NON FIPS MODE PKM_Error: Child misbehaved.
loaded C_GetFunctionList for NON FIPS MODE; slotID 1 
Child return status : 255.
**** Total number of TESTS ran in NON FIPS MODE is 92. ****
fips.sh: #7145: Run PK11MODE in Non FIPS mode (pk11mode -n) . - Core file is detected - FAILED

There were also some more crashes like this, all of them in pk11mode.

Core file analysis:
(gdb) where
#0  0x403c7114 in ?? ()
#1  0x40324cc2 in fork () from /lib/tls/libc.so.6
#2  0x402724a4 in fork () from /lib/tls/libpthread.so.0
#3  0x08055327 in PKM_ForkCheck (expected=123, fList=0x0, forkAssert=1, initArgs=0x0) at pk11mode.c:5285
#4  0x0804a051 in main (argc=7, argv=0xffffb044) at pk11mode.c:796

The crash appears to happen inside fork(). I don't know the exact circumstances (for example, whether too many threads already existed), but the failures were found on 2 nightly QA machines (nssamdrhel3, workout) running tests on Linux with a 2.4 kernel. On both machines the failures occurred only with the 32-bit build (both DBG and OPT); the 64-bit build was OK, and machine nssamdrhel4, which uses a 2.6 kernel, was also OK.

More details: the failures occurred on nightly build 20090802. We don't have results from build 20090801 (that day we were running tests on the 3.11 branch, securityjes5); results from build 20090731 are OK. I inspected all CVS changes in the time frame 20090731 - 20090802, and I don't see any change that looks like it could cause this. There were some changes in pk11mode done later, but they only concern the version string.

As Christophe switched nightly QA to the 3.12.3.2 branch, we will not have results from the 20090803 build, but we need to find out why those tests failed before 3.12.4 is released.
Comment 1 Slavomir Katuscak 2009-08-04 07:19:44 PDT
Those failures were NOT seen on any Tinderbox machine, so it can be machine specific, or it can be something wrong just with this one build.
Comment 2 Slavomir Katuscak 2009-08-04 10:21:58 PDT
More details about OS:

Build machine:
redhat21as: Red Hat Linux Advanced Server release 2.1AS (Pensacola) 

Test machines (using build from redhat21as):
nssamdrhel3: Red Hat Enterprise Linux AS release 3 (Taroon Update 6)
workout: Red Hat Linux Advanced Server release 2.1AS (Pensacola)
Comment 3 Christophe Ravel 2009-08-04 15:48:02 PDT
I looked at the results from the build 20090731.1 and found some errors related to pk11mode too.
Here is the output from results.html:

mace[svbld]:/share/builds/mccrel3/security/securitytip/builds/20090731.1/wozzeck_Solaris8/mozilla/tests_results/security/workout.1> grep -i fail results.html 
<TR><TD>#788: Run PK11MODE in FIPS mode (pk11mode) . </TD><TD bgcolor=red>Failed Core</TD><TR>
<TR><TD>#789: Run PK11MODE in Non FIPS mode (pk11mode -n) . </TD><TD bgcolor=red>Failed Core</TD><TR>
<TR><TD>#3137: OCSP: Verifying certificate(s) OCSPEE15.cert OCSPCA1.cert with flags -g leaf -m ocsp -s failIfNoInfo -d OCSPRootDB -t OCSPRoot </TD><TD bgcolor=lightGreen>Passed</TD><TR>
<TR><TD>#3875: Run PK11MODE in FIPS mode (pk11mode) . </TD><TD bgcolor=red>Failed Core</TD><TR>
<TR><TD>#3876: Run PK11MODE in Non FIPS mode (pk11mode -n) . </TD><TD bgcolor=red>Failed Core</TD><TR>
<TR><TD>#5448: OCSP: Verifying certificate(s) OCSPEE15.cert OCSPCA1.cert with flags -g leaf -m ocsp -s failIfNoInfo -d OCSPRootDB -t OCSPRoot </TD><TD bgcolor=lightGreen>Passed</TD><TR>
<TR><TD>#5450: Upgrading alicedir </TD><TD bgcolor=red>Failed Core</TD><TR>
<TR><TD>#5662: Run PK11MODE in FIPS mode (pk11mode) . </TD><TD bgcolor=red>Failed Core</TD><TR>
<TR><TD>#5663: Run PK11MODE in Non FIPS mode (pk11mode -n) . </TD><TD bgcolor=red>Failed Core</TD><TR>
<TR><TD>#7144: Run PK11MODE in FIPS mode (pk11mode) . </TD><TD bgcolor=red>Failed Core</TD><TR>
<TR><TD>#7145: Run PK11MODE in Non FIPS mode (pk11mode -n) . </TD><TD bgcolor=red>Failed Core</TD><TR>

From output.log, you can also see that there was a crash:
fips.sh: #788: Run PK11MODE in FIPS mode (pk11mode) . - Core file is detected - FAILED

I found the same crashes for builds 20090730.1, 20090729.1, 20090728.1, and 20090727.1 (which was the first build of NSS 3.12.x on Linux 2.4).

So this crash did not happen only on 20090802 but also on all previous builds since we started to build and test on Linux2.4.
Comment 4 Julien Pierre 2009-08-04 15:57:12 PDT
Slavo, Christophe,

Am I correctly understanding that the crashes are only being seen on Linux RHEL 2.1 ?
When I wrote the fork checking code for 3.12 a year ago, I only tested it on the platforms we supported for 3.12 . That did not include RHEL 2.1 .

There are 2 ways that I implemented fork check on Unix platforms - one is using pthread_atfork, and another using PID checks.

On Linux, I decided to use pthread_atfork only, because it was working on RHEL3/RHEL4 which we were supporting for 3.12 .
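
A minimal sketch of the pthread_atfork approach mentioned above, assuming a child-side handler that simply sets a flag. The names (forked_since_init, atfork_check_init, atfork_check_forked, atfork_child_view) are hypothetical and are not taken from the softoken sources.

```c
#include <pthread.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Set to 1 in the child by the atfork handler. Hypothetical name. */
static volatile int forked_since_init = 0;

/* Runs in the child process right after fork() returns there. */
static void mark_forked_child(void)
{
    forked_since_init = 1;
}

/* Register the child-side handler at library initialization.
 * Returns 0 on success, like pthread_atfork itself. */
int atfork_check_init(void)
{
    forked_since_init = 0;
    return pthread_atfork(NULL, NULL, mark_forked_child);
}

/* Nonzero when the calling process is a fork()ed child of the initializer. */
int atfork_check_forked(void)
{
    return forked_since_init;
}

/* Demo: fork once and report what the child observed
 * (1 = fork detected, 0 = not detected, -1 = error). */
int atfork_child_view(void)
{
    int status = 0;
    pid_t child = fork();

    if (child == -1)
        return -1;
    if (child == 0)
        _exit(atfork_check_forked() ? 1 : 0);
    if (waitpid(child, &status, 0) != child || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```

With a working pthread_atfork, calling atfork_check_init() and then atfork_child_view() should show the flag set in the child while the parent's flag stays clear. The sketch relies entirely on the C library invoking the handler correctly during fork().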

It looks like pthread_atfork is not working correctly on RHEL2.1 . Normally, I would then use the other method - PID checks. Unfortunately, that method is not likely to work on RHEL 2.1 either, because on some versions of Linux, threads are actually processes and have different PIDs. So the PID check can't be used.
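
For comparison, a sketch of the PID-based check, assuming the library records getpid() at initialization and compares it on later calls. All names are hypothetical. On a threading implementation where every thread has its own PID, pid_check_forked() would also return nonzero when called from a second thread of the same program, which is the false positive described above.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* PID of the process that initialized the library. Hypothetical name. */
static pid_t init_pid = (pid_t)-1;

/* Remember which process performed initialization. */
void pid_check_init(void)
{
    init_pid = getpid();
}

/* Nonzero if the caller's PID differs from the initializer's PID. */
int pid_check_forked(void)
{
    return init_pid != (pid_t)-1 && getpid() != init_pid;
}

/* Demo: fork once and report what the child observed
 * (1 = fork detected, 0 = not detected, -1 = error). */
int pid_check_child_view(void)
{
    int status = 0;
    pid_t child = fork();

    if (child == -1)
        return -1;
    if (child == 0)
        _exit(pid_check_forked() ? 1 : 0);
    if (waitpid(child, &status, 0) != child || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```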

I have questions about the builds you are doing. Did you add a build on RHEL2.1 ? Or did you switch the 32-bit build away from RHEL3 and RHEL4 ?

If it is the former, I can some macro to disable the fork check altogether in the RHEL 2.1 .

If it is the latter, I will need to add a runtime check to figure out if we are on a version of Linux on which pthread_atfork is reliable.
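
One possible shape for such a runtime check, as a sketch only: glibc reports its threading implementation through confstr() with the _CS_GNU_LIBPTHREAD_VERSION name, a glibc extension that returns a string such as "NPTL 2.x" or "linuxthreads-...". The function name threads_are_nptl is hypothetical, and nothing in this bug says NSS adopted a check like this.

```c
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Returns 1 if the C library reports NPTL, 0 if it reports another
 * threading implementation, and -1 if the answer is unavailable. */
int threads_are_nptl(void)
{
#ifdef _CS_GNU_LIBPTHREAD_VERSION
    char buf[64];
    size_t n = confstr(_CS_GNU_LIBPTHREAD_VERSION, buf, sizeof buf);

    if (n == 0)
        return -1;  /* name not supported by this C library */
    /* confstr always NUL-terminates, so a prefix comparison is safe
     * even if the full string was truncated. */
    return strncmp(buf, "NPTL", 4) == 0 ? 1 : 0;
#else
    return -1;      /* not a glibc-style environment */
#endif
}
```

A caller could enable the pthread_atfork-based check only when this returns 1 and skip fork detection otherwise.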
Comment 5 Christophe Ravel 2009-08-04 16:21:06 PDT
(In reply to comment #4)
> Slavo, Christophe,
> 
> Am I correctly understanding that the crashes are only being seen on Linux RHEL
> 2.1 ?

We see this crash on both RHEL 2.1 and 3.0

> When I wrote the fork checking code for 3.12 a year ago, I only tested it on
> the platforms we supported for 3.12 . That did not include RHEL 2.1 .
> 
> There are 2 ways that I implemented fork check on Unix platforms - one is using
> pthread_atfork, and another using PID checks.
> 
> On Linux, I decided to use pthread_atfork only, because it was working on
> RHEL3/RHEL4 which we were supporting for 3.12 .
> 
> It looks like pthread_atfork is not working correctly on RHEL2.1 . Normally, I
> would then use the other method - PID checks. Unfortunately, that method is not
> likely to work on RHEL 2.1 either, because on some versions of Linux, threads
> are actually processes and have different PIDs. So the PID check can't be used.
> 
> I have questions about the builds you are doing. Did you add a build on RHEL2.1
> ? Or did you switch the 32-bit build away from RHEL3 and RHEL4 ?

We added a build on RHEL 2.1 to support both RHEL 2.1 and 3.0.

We have our regular build on RHEL 4.0 to support both RHEL 4.0 and 5.0.

> 
> If it is the former, I can some macro to disable the fork check altogether in
> the RHEL 2.1 .

What is the word missing between "I can" and "some macro" ?
If you meant "I can add some macro" that would disable the fork at build time that would be good enough for this old platform (Linux2.4).

> 
> If it is the latter, I will need to add a runtime check to figure out if we are
> on a version of Linux on which pthread_atfork is reliable.
Comment 6 Julien Pierre 2009-08-04 16:32:42 PDT
Yes, the missing word was "add". Were we not supporting RHEL 3.0 for 3.12 at all before ? Perhaps I'm misremembering which platforms I tested when I wrote the code.

The macro to define is NO_CHECK_FORK . This will disable the fork check in softoken. You can do this in the build right now. Ideally, I should add conditionals to figure out that the code is being built on RHEL2.1/3.0 and have it disabled automatically, but for now you can set the macro in your build.
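
To illustrate what a build-time switch of this kind looks like, here is a sketch in which defining NO_CHECK_FORK compiles the check out entirely. Only the macro name NO_CHECK_FORK comes from this bug; the helper macros and functions (FORK_CHECK_RECORD, FORK_CHECK_FAILED, token_init, token_operation) are hypothetical and do not reproduce the softoken code.

```c
#include <sys/types.h>
#include <unistd.h>

#ifndef NO_CHECK_FORK
/* Fork check compiled in: remember the owning process and compare later. */
static pid_t owner_pid;
#define FORK_CHECK_RECORD()  (owner_pid = getpid())
#define FORK_CHECK_FAILED()  (getpid() != owner_pid)
#define FORK_CHECK_ENABLED   1
#else
/* Built with -DNO_CHECK_FORK: the check becomes a no-op. */
#define FORK_CHECK_RECORD()  ((void)0)
#define FORK_CHECK_FAILED()  0
#define FORK_CHECK_ENABLED   0
#endif

/* Hypothetical module entry point: records the owner when checks are on. */
int token_init(void)
{
    FORK_CHECK_RECORD();
    return 0;
}

/* Hypothetical operation: refuses to run in a forked child (returns -1). */
int token_operation(void)
{
    if (FORK_CHECK_FAILED())
        return -1;
    return 0;
}

/* Lets a test program ask whether the check was compiled in. */
int token_fork_check_enabled(void)
{
    return FORK_CHECK_ENABLED;
}
```

Compiling the same file with and without -DNO_CHECK_FORK changes only token_fork_check_enabled() in the parent process; the difference in token_operation() shows up only in a forked child.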

The second step will be to modify the tests in pk11mode .
Right now, the tests assume that there is always a fork check on unix platforms, either PID-based or pthread_atfork based. RHEL 2.1/3.0 would be the first platform to not have one. There are two options to modify the test :
1) skip the fork tests in pk11mode . This is done by passing the -n argument to pk11mode
2) run the fork tests anyway, and only look for core files - ignore the results, since forks will not be properly detected.

I think we should do 2) for RHEL2.1/3.0. But again we need some way of detecting that platform.
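
Option 2 can be sketched as follows, assuming the test forks a child, waits for it, and treats only termination by a signal as a failure when a crash_only flag is set. The function run_fork_test and its parameters are hypothetical; the real pk11mode test body is replaced by a plain _exit().

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Fork a child that exits with child_code.
 * crash_only == 0: pass only if the child exits with `expected`.
 * crash_only != 0: pass unless the child was killed by a signal.
 * Returns 0 for pass, -1 for failure. */
int run_fork_test(int child_code, int expected, int crash_only)
{
    int status = 0;
    pid_t child = fork();

    if (child == -1)
        return -1;
    if (child == 0)
        _exit(child_code);      /* stands in for the real test body */
    if (waitpid(child, &status, 0) != child)
        return -1;
    if (WIFSIGNALED(status))
        return -1;              /* a crash is a failure in both modes */
    if (crash_only)
        return 0;               /* ignore the exit status */
    return (WIFEXITED(status) && WEXITSTATUS(status) == expected) ? 0 : -1;
}
```

For example, a child returning 255 when 123 was expected fails in strict mode but passes in crash-only mode, which matches the intent of running the fork tests purely to look for core files.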
Comment 7 Nelson Bolyard (seldom reads bugmail) 2009-08-04 16:34:41 PDT
Remember, changes to softoken invoke the evil 4-letter F word!  :)
Comment 8 Julien Pierre 2009-08-04 16:41:48 PDT
Nelson,
Yes, I remember that. But the macros are already there so the fork check can already be disabled by means of build macro. Did we validate FIPS on RHEL 2.1 or 3.0 ? I doubt it, if our tests never passed.
Comment 9 Julien Pierre 2009-08-04 16:52:39 PDT
BTW, we can also disable the fork checks in pk11mode.c by another macro - NO_FORK_CHECK . So, no code change is required there either. How strict is the lab about making changes to this program ?

Setting this macro in pk11mode should be good enough for this build. It looks like I wasn't very consistent with the symbols between the libs and the test program. I think they were changed in response to review feedback and I forgot to fix them all. For consistency, they should be made the same the next time the tree is open for FIPS changes.
Comment 10 Slavomir Katuscak 2009-08-05 02:16:43 PDT
(In reply to comment #3)
> I found the same crashed for build 20090730.1, 20090729.1, 20090728.1,
> 20090727.1 (which was the first build of NSS 3.12.x on Linux 2.4).
> 
> So this crash did not happen only on 20090802 but also on all previous builds
> since we started to build and test on Linux2.4.

I see now, results from RHEL 2.1 and 3.0 were not reported in nightly QA before, so those failures were not detected.
Comment 11 Christophe Ravel 2009-08-05 11:58:36 PDT
Created attachment 392768
Defines NO_FORK_CHECK and NO_CHECK_FOR when NSS_NO_FORK_CHECK (checked in)
Comment 12 Christophe Ravel 2009-08-05 14:31:54 PDT
Comment on attachment 392768
Defines NO_FORK_CHECK and NO_CHECK_FOR when NSS_NO_FORK_CHECK (checked in)

Checking in config.mk;
/cvsroot/mozilla/security/coreconf/config.mk,v  <--  config.mk
new revision: 1.29; previous revision: 1.28
done
Comment 13 Christophe Ravel 2009-08-05 14:36:37 PDT
I tested NSS_NO_FORK_CHECK on Linux2.4 and the pk11mode tests are now passing.

Closing the bug as fixed.
Comment 14 Nelson Bolyard (seldom reads bugmail) 2009-08-05 15:15:24 PDT
Christophe, you committed this change on the trunk. 
Do you also need to commit it on the not-so-mini 3.12.3 branch?
Comment 15 Christophe Ravel 2009-08-05 16:43:50 PDT
Yes I do. I have submitted a patch for that in bug 508108.
Comment 16 Julien Pierre 2009-08-07 20:30:25 PDT
Created attachment 393331
Automatically disable fork check on Linux 2.4 kernels

This will allow you to use the same build script for Linux 2.4, without having to pass in your new macro.
Comment 17 Julien Pierre 2009-08-07 20:31:05 PDT
The change suggested by Alexei in https://bugzilla.mozilla.org/show_bug.cgi?id=508108#14 should also go to the trunk
Comment 18 Christophe Ravel 2009-08-10 10:40:27 PDT
Comment on attachment 393331
Automatically disable fork check on Linux 2.4 kernels

r=christophe
Comment 19 Julien Pierre 2009-08-10 15:09:03 PDT
Thanks, Christophe.

Checking in Linux2.4.mk;
/cvsroot/mozilla/security/coreconf/Linux2.4.mk,v  <--  Linux2.4.mk
new revision: 1.8; previous revision: 1.7
done
Comment 20 Wan-Teh Chang 2009-08-13 14:18:43 PDT
Please add a comment to Linux2.4.mk to explain why we set
NSS_NO_FORK_CHECK to 1.  For example, "pthread_atfork
doesn't work" or "Under LinuxThreads, threads are process
clones.  Our fork checks require NPTL."
