Closed Bug 508259 Opened 15 years ago Closed 15 years ago

Pk11mode crashed on Linux2.4

Categories

(NSS :: Tools, defect, P1)

x86
Linux
defect

Tracking

(Not tracked)

RESOLVED FIXED
3.12.4

People

(Reporter: slavomir.katuscak+mozilla, Assigned: christophe.ravel.bugs)

Details

(Whiteboard: FIPS)

Attachments

(2 files)

Build: securitytip/20090802.1
Platform: Linux 2.4/32bit 

Log:
fips.sh: Run PK11MODE in FIPSMODE  -----------------
pk11mode -d ../fips -p fips- -f ../tests.fipspw
Loaded FC_GetFunctionList for FIPS MODE; slotID 0 
Loaded FC_GetFunctionList for FIPS MODE; slotID 0 
Loaded FC_GetFunctionList for FIPS MODE; slotID 0 
Loaded FC_GetFunctionList for FIPS MODE; slotID 0 
Loaded FC_GetFunctionList for FIPS MODE; slotID 0 

FIPS MODE PKM_Error: Child misbehaved.
Loaded FC_GetFunctionList for FIPS MODE; slotID 0 
Child return status : 255.
**** Total number of TESTS ran in FIPS MODE is 94. ****
fips.sh: #7144: Run PK11MODE in FIPS mode (pk11mode) . - Core file is detected - FAILED
fips.sh: Run PK11MODE in Non FIPSMODE  -----------------
pk11mode -d ../fips -p nonfips- -f ../tests.fipspw -n
loaded C_GetFunctionList for NON FIPS MODE; slotID 1 
loaded C_GetFunctionList for NON FIPS MODE; slotID 1 
loaded C_GetFunctionList for NON FIPS MODE; slotID 1 
loaded C_GetFunctionList for NON FIPS MODE; slotID 1 
loaded C_GetFunctionList for NON FIPS MODE; slotID 1 
NON FIPS MODE PKM_Error: Child misbehaved.
loaded C_GetFunctionList for NON FIPS MODE; slotID 1 
Child return status : 255.
**** Total number of TESTS ran in NON FIPS MODE is 92. ****
fips.sh: #7145: Run PK11MODE in Non FIPS mode (pk11mode -n) . - Core file is detected - FAILED

There were also some more crashes like this, all of them in pk11mode.

Core file analysis:
(gdb) where
#0  0x403c7114 in ?? ()
#1  0x40324cc2 in fork () from /lib/tls/libc.so.6
#2  0x402724a4 in fork () from /lib/tls/libpthread.so.0
#3  0x08055327 in PKM_ForkCheck (expected=123, fList=0x0, forkAssert=1, initArgs=0x0) at pk11mode.c:5285
#4  0x0804a051 in main (argc=7, argv=0xffffb044) at pk11mode.c:796

It seems the fork caused the crash. I don't know the details of the circumstances (for example, whether there were too many existing threads), but these failures were found on two nightly QA machines (nssamdrhel3, workout) running tests on Linux with a 2.4 kernel. On both machines, the failures occurred only with the 32-bit build (both DBG and OPT); the 64-bit build was OK, and machine nssamdrhel4, which uses a 2.6 kernel, was also OK.

More details: the failures occurred on nightly build 20090802. We don't have results from build 20090801 (that day we were running tests on the 3.11 branch, securityjes5), and results from build 20090731 are OK. I inspected all CVS changes in the time frame 20090731 - 20090802, and I don't see any change that looks like it could cause this. There were some later changes in pk11mode, but they only concern the version string.

Because Christophe switched nightly QA to the 3.12.3.2 branch, we will not have results from the 20090803 build, but we need to find out why these tests failed before 3.12.4 is released.
These failures were NOT seen on any Tinderbox machine, so the problem may be machine-specific, or something may be wrong with just this one build.
More details about the OS:

Build machine:
redhat21as: Red Hat Linux Advanced Server release 2.1AS (Pensacola) 

Test machines (using build from redhat21as):
nssamdrhel3: Red Hat Enterprise Linux AS release 3 (Taroon Update 6)
workout: Red Hat Linux Advanced Server release 2.1AS (Pensacola)
Assignee: nelson → julien.pierre.boogz
I looked at the results from build 20090731.1 and found some errors related to pk11mode too.
Here is the output from results.html:

mace[svbld]:/share/builds/mccrel3/security/securitytip/builds/20090731.1/wozzeck_Solaris8/mozilla/tests_results/security/workout.1> grep -i fail results.html 
<TR><TD>#788: Run PK11MODE in FIPS mode (pk11mode) . </TD><TD bgcolor=red>Failed Core</TD><TR>
<TR><TD>#789: Run PK11MODE in Non FIPS mode (pk11mode -n) . </TD><TD bgcolor=red>Failed Core</TD><TR>
<TR><TD>#3137: OCSP: Verifying certificate(s) OCSPEE15.cert OCSPCA1.cert with flags -g leaf -m ocsp -s failIfNoInfo -d OCSPRootDB -t OCSPRoot </TD><TD bgcolor=lightGreen>Passed</TD><TR>
<TR><TD>#3875: Run PK11MODE in FIPS mode (pk11mode) . </TD><TD bgcolor=red>Failed Core</TD><TR>
<TR><TD>#3876: Run PK11MODE in Non FIPS mode (pk11mode -n) . </TD><TD bgcolor=red>Failed Core</TD><TR>
<TR><TD>#5448: OCSP: Verifying certificate(s) OCSPEE15.cert OCSPCA1.cert with flags -g leaf -m ocsp -s failIfNoInfo -d OCSPRootDB -t OCSPRoot </TD><TD bgcolor=lightGreen>Passed</TD><TR>
<TR><TD>#5450: Upgrading alicedir </TD><TD bgcolor=red>Failed Core</TD><TR>
<TR><TD>#5662: Run PK11MODE in FIPS mode (pk11mode) . </TD><TD bgcolor=red>Failed Core</TD><TR>
<TR><TD>#5663: Run PK11MODE in Non FIPS mode (pk11mode -n) . </TD><TD bgcolor=red>Failed Core</TD><TR>
<TR><TD>#7144: Run PK11MODE in FIPS mode (pk11mode) . </TD><TD bgcolor=red>Failed Core</TD><TR>
<TR><TD>#7145: Run PK11MODE in Non FIPS mode (pk11mode -n) . </TD><TD bgcolor=red>Failed Core</TD><TR>

From output.log, you can also see that there was a crash:
fips.sh: #788: Run PK11MODE in FIPS mode (pk11mode) . - Core file is detected - FAILED

I found the same crash in builds 20090730.1, 20090729.1, 20090728.1, and 20090727.1 (which was the first build of NSS 3.12.x on Linux 2.4).

So this crash did not happen only on 20090802; it happened on every build since we started building and testing on Linux 2.4.
Slavo, Christophe,

Do I understand correctly that the crashes are only being seen on Linux RHEL 2.1?
When I wrote the fork checking code for 3.12 a year ago, I only tested it on the platforms we supported for 3.12. That did not include RHEL 2.1.

I implemented the fork check on Unix platforms in two ways: one uses pthread_atfork, and the other uses PID checks.

On Linux, I decided to use only pthread_atfork, because it was working on RHEL3/RHEL4, which we were supporting for 3.12.
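The pthread_atfork approach can be sketched roughly as follows. This is a minimal illustration assuming POSIX threads, not the softoken source; sketch_forked, sketch_child_handler and the other sketch_* names are hypothetical. A child handler registered with pthread_atfork() runs in the child after fork() and sets a flag, which later calls can consult to refuse to run in the forked child.

```c
#include <pthread.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical flag: becomes 1 only in a child created by fork(). */
static int sketch_forked = 0;

/* Child handler: runs in the child process right after fork() returns. */
static void sketch_child_handler(void)
{
    sketch_forked = 1;
}

/* Register the handler once (e.g. at initialization). Returns 0 on success. */
int sketch_register_fork_check(void)
{
    return pthread_atfork(NULL, NULL, sketch_child_handler);
}

/* Returns 1 if this process is a fork() child of the registering process. */
int sketch_fork_detected(void)
{
    return sketch_forked;
}

/* Forks once and reports whether the child saw the flag (1), did not see
   it (0), or could not be checked (-1). */
int sketch_child_sees_fork(void)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0)
        _exit(sketch_fork_detected() ? 0 : 1);

    int status;
    if (waitpid(pid, &status, 0) != pid || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status) == 0;
}
```

The parent's flag stays 0; only the child sees 1. The check is only as reliable as the thread library's handling of atfork handlers, which is the dependency described above.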

It looks like pthread_atfork is not working correctly on RHEL2.1. Normally, I would then use the other method, PID checks. Unfortunately, that method is not likely to work on RHEL 2.1 either, because on some versions of Linux, threads are actually processes and have different PIDs. So the PID check can't be used.
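The PID-based alternative, and the reason it breaks when each thread is a separate process (as under LinuxThreads, the threading library of older glibc systems such as RHEL 2.1), can be sketched like this. The sketch_* names are hypothetical. The PID is recorded at initialization and any later difference is treated as a fork; sketch_thread_sees_pid_change() asks the same question from a second thread.

```c
#include <pthread.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical: PID recorded when the library is initialized. */
static pid_t sketch_init_pid = 0;

void sketch_record_pid(void)
{
    sketch_init_pid = getpid();
}

/* PID-based check: any PID change is treated as "we are in a forked child". */
int sketch_pid_changed(void)
{
    return getpid() != sketch_init_pid;
}

/* Thread body: stores what the PID check says when run from this thread. */
static void *sketch_thread_check(void *out)
{
    *(int *)out = sketch_pid_changed();
    return NULL;
}

/* Returns 1 if a second thread would see a PID change (a false positive,
   since no fork happened), 0 if not, -1 on error. */
int sketch_thread_sees_pid_change(void)
{
    pthread_t tid;
    int result = -1;
    if (pthread_create(&tid, NULL, sketch_thread_check, &result) != 0)
        return -1;
    if (pthread_join(tid, NULL) != 0)
        return -1;
    return result;
}
```

With NPTL, all threads of a process share one PID, so the second thread sees no change. Under LinuxThreads, getpid() in another thread returns that thread's own PID, so the same check would report a fork that never happened.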

I have questions about the builds you are doing. Did you add a build on RHEL2.1? Or did you switch the 32-bit build away from RHEL3 and RHEL4?

If it is the former, I can some macro to disable the fork check altogether in the RHEL 2.1 build.

If it is the latter, I will need to add a runtime check to figure out whether we are on a version of Linux on which pthread_atfork is reliable.
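For a runtime check of this kind, one possible approach on glibc is to ask which thread library is in use through confstr(_CS_GNU_LIBPTHREAD_VERSION), a glibc extension that reports strings such as "NPTL 2.3.2" or "linuxthreads-0.10". This is an assumption about how such a check could be written, not what NSS did (the fix in this bug was a build-time setting); sketch_thread_lib_is_nptl is a hypothetical name.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: 1 if the running thread library looks like NPTL,
   0 if it looks like something else (e.g. LinuxThreads), -1 if unknown. */
int sketch_thread_lib_is_nptl(void)
{
#ifdef _CS_GNU_LIBPTHREAD_VERSION
    char buf[64];
    size_t n = confstr(_CS_GNU_LIBPTHREAD_VERSION, buf, sizeof buf);
    if (n == 0)
        return -1;                      /* libc gave no answer */
    /* confstr() always NUL-terminates when buf is non-empty. */
    return strncmp(buf, "NPTL", 4) == 0 ? 1 : 0;
#else
    return -1;                          /* not glibc: no way to ask */
#endif
}
```

A caller could enable pthread_atfork-based checks only when this returns 1 and treat the other results conservatively.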
(In reply to comment #4)
> Slavo, Christophe,
> 
> Am I correctly understanding that the crashes are only being seen on Linux RHEL
> 2.1 ?

We see this crash on both RHEL 2.1 and 3.0.

> When I wrote the fork checking code for 3.12 a year ago, I only tested it on
> the platforms we supported for 3.12 . That did not include RHEL 2.1 .
> 
> There are 2 ways that I implemented fork check on Unix platforms - one is using
> pthread_atfork, and another using PID checks.
> 
> On Linux, I decided to use pthread_atfork only, because it was working on
> RHEL3/RHEL4 which we were supporting for 3.12 .
> 
> It looks like pthread_atfork is not working correctly on RHEL2.1 . Normally, I
> would then use the other method - PID checks. Unfortunately, that method is not
> likely to work on RHEL 2.1 either, because on some versions of Linux, threads
> are actually processes and have different PIDs. So the PID check can't be used.
> 
> I have questions about the builds you are doing. Did you add a build on RHEL2.1
> ? Or did you switch the 32-bit build away from RHEL3 and RHEL4 ?

We added a build on RHEL 2.1 to support both RHEL 2.1 and 3.0.

We have our regular build on RHEL 4.0 to support both RHEL 4.0 and 5.0.

> 
> If it is the former, I can some macro to disable the fork check altogether in
> the RHEL 2.1 .

What is the word missing between "I can" and "some macro"?
If you meant "I can add some macro" that would disable the fork check at build time, that would be good enough for this old platform (Linux 2.4).

> 
> If it is the latter, I will need to add a runtime check to figure out if we are
> on a version of Linux on which pthread_atfork is reliable.
Yes, the missing word was "add". Were we not supporting RHEL 3.0 for 3.12 at all before? Perhaps I'm misremembering which platforms I tested when I wrote the code.

The macro to define is NO_CHECK_FORK. This disables the fork check in softoken. You can do this in your build right now. Ideally, I should add conditionals that detect when the code is being built on RHEL2.1/3.0 and disable the check automatically, but for now you can set the macro in your build.
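The effect of a build-time switch like NO_CHECK_FORK can be illustrated with the preprocessor. This is a hypothetical sketch, not softoken's code: SKETCH_CHECK_FORK, sketch_token_operation and the SKETCH_* status codes are made up for illustration. Built normally, an entry point refuses to run once a fork has been detected; built with -DNO_CHECK_FORK, the check compiles to nothing.

```c
/* Hypothetical status codes (not the PKCS #11 CKR_* values). */
#define SKETCH_OK          0
#define SKETCH_ERR_FORKED  1

/* Set by whatever fork-detection mechanism is in use. */
static int sketch_forked = 0;

void sketch_set_forked(int forked)
{
    sketch_forked = forked;
}

#ifdef NO_CHECK_FORK
/* Fork check disabled at build time: never refuse the call. */
#define SKETCH_CHECK_FORK() ((void)0)
#else
/* Fork check enabled: bail out of the calling function after a fork. */
#define SKETCH_CHECK_FORK()               \
    do {                                  \
        if (sketch_forked)                \
            return SKETCH_ERR_FORKED;     \
    } while (0)
#endif

/* Stand-in for any token entry point that must not be used after fork(). */
int sketch_token_operation(void)
{
    SKETCH_CHECK_FORK();
    return SKETCH_OK;
}
```

Because the switch is resolved at compile time, a library built with it cannot turn the check back on at run time, which is why a build-time setting only suits platforms where the check is never wanted.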

The second step will be to modify the tests in pk11mode.
Right now, the tests assume that there is always a fork check on Unix platforms, either PID-based or pthread_atfork-based. RHEL 2.1/3.0 would be the first platform without one. There are two options for modifying the test:
1) Skip the fork tests in pk11mode. This is done by passing the -n argument to pk11mode.
2) Run the fork tests anyway, and only look for core files; ignore the results, since forks will not be properly detected.

I think we should do 2) for RHEL2.1/3.0. But again, we need some way of detecting that platform.
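Option 2) amounts to running the fork test but judging it only on whether the child crashed. A minimal sketch, assuming POSIX fork()/waitpid(); sketch_fork_and_check_crash and the two demo children are hypothetical and not taken from pk11mode.c. The child's exit code is ignored, and only termination by a signal counts as a failure.

```c
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* A child body: stands in for whatever the test calls after fork(). */
typedef int (*sketch_child_fn)(void);

/* Runs child_fn in a forked child and ignores its exit code.
   Returns 1 if the child was killed by a signal (the "crash" case),
   0 if it exited normally with any status, -1 on fork/wait error. */
int sketch_fork_and_check_crash(sketch_child_fn child_fn)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0)
        _exit(child_fn() & 0xff);   /* result deliberately not interpreted */

    int status;
    if (waitpid(pid, &status, 0) != pid)
        return -1;
    return WIFSIGNALED(status) ? 1 : 0;
}

/* Demo child that reports an error code: not a crash. */
int sketch_child_returns_error(void)
{
    return 123;
}

/* Demo child that dies from a signal (SIGKILL, so no core file is left). */
int sketch_child_crashes(void)
{
    raise(SIGKILL);
    return 0;                       /* not reached */
}
```

A real harness would also look for core files, as fips.sh does; checking WIFSIGNALED covers the same crash cases without depending on the core-dump size limit.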
Remember, changes to softoken invoke the evil 4-letter F word!  :)
Whiteboard: FIPS
Nelson,
Yes, I remember that. But the macros are already there, so the fork check can already be disabled with a build macro. Did we validate FIPS on RHEL 2.1 or 3.0? I doubt it, given that our tests never passed there.
BTW, we can also disable the fork checks in pk11mode.c with another macro, NO_FORK_CHECK. So no code change is required there either. How strict is the lab about changes to this program?

Setting this macro in pk11mode should be good enough for this build. It looks like I wasn't very consistent with the symbol names between the libraries and the test program; I think they were changed in response to review feedback, and I forgot to update them all. For consistency, they should be made the same the next time the tree is open for FIPS changes.
(In reply to comment #3)
> I found the same crashed for build 20090730.1, 20090729.1, 20090728.1,
> 20090727.1 (which was the first build of NSS 3.12.x on Linux 2.4).
> 
> So this crash did not happen only on 20090802 but also on all previous builds
> since we started to build and test on Linux2.4.

I see now: results from RHEL 2.1 and 3.0 were not previously reported in nightly QA, so these failures went undetected.
Assignee: julien.pierre.boogz → christophe.ravel.bugs
Status: NEW → ASSIGNED
Attachment #392768 - Flags: review?(julien.pierre.boogz)
Attachment #392768 - Flags: review?(julien.pierre.boogz) → review+
Summary: Pk11mode crashed on Linux. → Pk11mode crashed on Linux2.4
Comment on attachment 392768 [details] [diff] [review]
Defines NO_FORK_CHECK and NO_CHECK_FOR when NSS_NO_FORK_CHECK (checked in)

Checking in config.mk;
/cvsroot/mozilla/security/coreconf/config.mk,v  <--  config.mk
new revision: 1.29; previous revision: 1.28
done
Attachment #392768 - Attachment description: Defines NO_FORK_CHECK and NO_CHECK_FOR when NSS_NO_FORK_CHECK → Defines NO_FORK_CHECK and NO_CHECK_FOR when NSS_NO_FORK_CHECK (checked in)
I tested NSS_NO_FORK_CHECK on Linux2.4 and the pk11mode tests are now passing.

Closing the bug as fixed.
Status: ASSIGNED → RESOLVED
Closed: 15 years ago
Resolution: --- → FIXED
Christophe, you committed this change on the trunk. 
Do you also need to commit it on the not-so-mini 3.12.3 branch?
Yes, I do. I have submitted a patch for that in bug 508108.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
This will allow you to use the same build script for Linux 2.4, without having to pass in your new macro.
Attachment #393331 - Flags: review?(christophe.ravel.bugs)
The change suggested by Alexei in https://bugzilla.mozilla.org/show_bug.cgi?id=508108#14 should also go to the trunk.
Comment on attachment 393331 [details] [diff] [review]
Automatically disable fork check on Linux 2.4 kernels

r=christophe
Attachment #393331 - Flags: review?(christophe.ravel.bugs) → review+
Thanks, Christophe.

Checking in Linux2.4.mk;
/cvsroot/mozilla/security/coreconf/Linux2.4.mk,v  <--  Linux2.4.mk
new revision: 1.8; previous revision: 1.7
done
Status: REOPENED → RESOLVED
Closed: 15 years ago
Resolution: --- → FIXED
Please add a comment to Linux2.4.mk to explain why we set
NSS_NO_FORK_CHECK to 1.  For example, "pthread_atfork
doesn't work" or "Under LinuxThreads, threads are process
clones.  Our fork checks require NPTL."