Closed Bug 443357 Opened 16 years ago Closed 2 years ago

Assertion failure in create_sproc while running shlibsign during firefox build on Irix

Categories

(NSPR :: NSPR, defect)

SGI
IRIX
defect
Not set
major

Tracking

(Not tracked)

RESOLVED WONTFIX

People

(Reporter: jimis, Unassigned)

Details

Attachments

(3 files)

User-Agent:       Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9) Gecko/2008061712 Fedora/3.0-1.fc9 Firefox/3.0
Build Identifier: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9) Gecko/2008061712 Fedora/3.0-1.fc9 Firefox/3.0

This happens every time I try to build Firefox 3.0rc1 on SGI IRIX with the --enable-debug option (--without-pthreads is also set; I don't know whether it's relevant). The build is interrupted by the following assertion failure, triggered from the shlibsign utility:

gmake[6]: Leaving directory `/tmp/firefox-3.0rc1/mozilla/security/nss/cmd/shlibsign/mangle'
cd /tmp/firefox-3.0rc1/obj-mips-sgi-irix6.5/nss ; sh /tmp/firefox-3.0rc1/mozilla/security/nss/cmd/shlibsign/./sign.sh /tmp/firefox-3.0rc1/obj-mips-sgi-irix6.5/dist \
/tmp/firefox-3.0rc1/obj-mips-sgi-irix6.5/nss IRIX \
/tmp/firefox-3.0rc1/obj-mips-sgi-irix6.5/dist/lib /tmp/firefox-3.0rc1/obj-mips-sgi-irix6.5/dist/lib/libsoftokn3.so
/tmp/firefox-3.0rc1/obj-mips-sgi-irix6.5/nss/shlibsign -v -i /tmp/firefox-3.0rc1/obj-mips-sgi-irix6.5/dist/lib/libsoftokn3.so
Assertion failure: rv == 1, at /tmp/firefox-3.0rc1/mozilla/nsprpub/pr/src/md/unix/irix.c:687
/tmp/firefox-3.0rc1/mozilla/security/nss/cmd/shlibsign/./sign.sh[56]: 374282 Abort(coredump)
gmake[5]: *** [/tmp/firefox-3.0rc1/obj-mips-sgi-irix6.5/dist/lib/libsoftokn3.chk] Error 134
gmake[5]: Leaving directory `/tmp/firefox-3.0rc1/mozilla/security/nss/cmd/shlibsign'
gmake[4]: *** [libs] Error 2
gmake[4]: Leaving directory `/tmp/firefox-3.0rc1/obj-mips-sgi-irix6.5/security/manager'
gmake[3]: *** [libs_tier_toolkit] Error 2
gmake[3]: Leaving directory `/tmp/firefox-3.0rc1/obj-mips-sgi-irix6.5'
gmake[2]: *** [tier_toolkit] Error 2
gmake[2]: Leaving directory `/tmp/firefox-3.0rc1/obj-mips-sgi-irix6.5'
gmake[1]: *** [default] Error 2
gmake[1]: Leaving directory `/tmp/firefox-3.0rc1/obj-mips-sgi-irix6.5'
gmake: *** [build] Error 2

I am marking this bug as critical because it completely prevents a debug build from finishing. I need the debug build to track down other errors on this platform.

Reproducible: Always
OS: Other → IRIX
Hardware: Other → SGI
Version: unspecified → 3.0 Branch
Here is an attempt to catch the backtrace with gdb. Is it normal that threads are involved even though the build is --without-pthreads?


(gdb) run
Starting program: /tmp/firefox-3.0rc1/obj-mips-sgi-irix6.5/nss/shlibsign
Assertion failure: rv == 1, at /tmp/firefox-3.0rc1/mozilla/nsprpub/pr/src/md/unix/irix.c:687

Program received signal SIGABRT, Aborted.
0x0fa4ac28 in _prctl () at /xlv41/6.5.30m/work/irix/lib/libc/libc_n32_M4/proc/prctl.s:15
15      /xlv41/6.5.30m/work/irix/lib/libc/libc_n32_M4/proc/prctl.s: No such file or directory.
        in /xlv41/6.5.30m/work/irix/lib/libc/libc_n32_M4/proc/prctl.s
Current language:  auto; currently asm
(gdb) bt
#0  0x0fa4ac28 in _prctl () at /xlv41/6.5.30m/work/irix/lib/libc/libc_n32_M4/proc/prctl.s:15
#1  0x0c0800cc in pthread_kill () at sig.c:149
#2  0x0c0810e0 in _SGIPT_libc_raise () at sig.c:660
#3  0x0fad1f44 in _raise () at raise.c:26
#4  0x0fa6f6b8 in abort () at abort.c:52
#5  0x04328a80 in PR_Assert (s=0x438c3c0 "rv == 1",
    file=0x438c190 "/tmp/firefox-3.0rc1/mozilla/nsprpub/pr/src/md/unix/irix.c", ln=687)
    at /tmp/firefox-3.0rc1/mozilla/nsprpub/pr/src/io/prlog.c:577
#6  0x04328a80 in PR_Assert (s=0x438c3c0 "rv == 1",
    file=0x438c190 "/tmp/firefox-3.0rc1/mozilla/nsprpub/pr/src/md/unix/irix.c", ln=687)
    at /tmp/firefox-3.0rc1/mozilla/nsprpub/pr/src/io/prlog.c:577
Previous frame identical to this frame (corrupt stack?)
And here is the backtrace from dbx, the native IRIX debugger. It manages to catch a little more, but the full backtrace is still lost...


>  0 _prctl(0x15, 0x4, 0xffff, 0x0, 0x0, 0x14548, 0x2, 0x0) ["/xlv41/6.5.30m/work/irix/lib/libc/libc_n32_M4/proc/prctl.s":15, 0xfa4ac28]
   1 pthread_kill(0x0, 0x6, 0x8000, 0x0, 0x0, 0x14548, 0x2, 0x0) ["/xlv41/6.5.30m/work/eoe/lib/libpthread/libpthread_n32_M3/sig.c":150, 0xc0800c4]
   2 _SGIPT_libc_raise(0x0, 0x4, 0xffff, 0x0, 0x0, 0x14548, 0x2, 0x0) ["/xlv41/6.5.30m/work/eoe/lib/libpthread/libpthread_n32_M3/sig.c":660, 0xc0810d8]
   3 _raise(0x15, 0x4, 0xffff, 0x0, 0x0, 0x14548, 0x2, 0x0) ["/xlv41/6.5.30m/work/irix/lib/libc/libc_n32_M4/signal/raise.c":26, 0xfad1f3c]
   4 abort(0x15, 0x4, 0xffff, 0x0, 0x0, 0x14548, 0x2, 0x0) ["/xlv41/6.5.30m/work/irix/lib/libc/libc_n32_M4/gen/abort.c":52, 0xfa6f6b0]
   5 PR_Assert(s = 0x438c3c0 = "rv == 1", file = 0x438c190 = "/tmp/firefox-3.0rc1/mozilla/nsprpub/pr/src/md/unix/irix.c", ln = 687) ["/tmp/firefox-3.0rc1/mozilla/nsprpub/pr/src/io/prlog.c":577, 0x4328a78]
   6 pthread_kill(0x0, 0x7fff2f54, 0x7f, 0x0, 0x0, 0x14548, 0x2, 0x0) ["/xlv41/6.5.30m/work/eoe/lib/libpthread/libpthread_n32_M3/sig.c":150, 0xc0800c4]
   7 _MD_CreateThread(thread = 0x438c3c0, start = 0x438c190, priority = 687, scope = PR_LOCAL_THREAD=0, state = PR_JOINABLE_THREAD=0, stackSize = 70949832) ["/tmp/firefox-3.0rc1/mozilla/nsprpub/pr/src/md/unix/irix.c":761, 0x4378d04]
   8 pthread_kill(0x10000, 0x0, 0x43a1bc8, 0x0, 0x0, 0x14548, 0x2, 0x0) ["/xlv41/6.5.30m/work/eoe/lib/libpthread/libpthread_n32_M3/sig.c":150, 0xc0800c4]
   9 <Unknown>() [< unknown >, 0x7fff2f5c]
Assignee: nobody → nobody
Component: General → Build
Product: Firefox → NSS
QA Contact: general → build
Version: 3.0 Branch → unspecified
Looks like the stack is pretty trashed.  
I'd guess that a trashed stack leads to a wild jump into NSPR code.

At a minimum, we need info about the version of IRIX and the system on 
which the build was being done.  But I think no NSPR developers have 
access to any IRIX systems any more, so we'll need someone to debug this 
who has an IRIX system.  It should be pretty easy to rerun the failing 
shlibsign command in the debugger.  

With respect to the issue of building FF on IRIX, you should be able to  
get around this by replacing the shlibsign executable with a copy of 
/bin/true and continuing the build.  I think that makes this bug not 
critical.
Assignee: nobody → wtc
Component: Build → NSPR
Product: NSS → NSPR
QA Contact: build → nspr
Summary: Assertion failure for shlibsign while building firefox on Irix → Assertion failure in create_sproc while running shlibsign during firefox build on Irix
Version: unspecified → 4.7
Hi and thanks for the help.

IRIX is version 6.5.30, as can be seen in the backtraces. The system is an SGI O2 (IP32), and the source has been lightly modified in various spots to work around other problems. For example, various Makefiles are patched to force a -mips4 build instead of -mips3. However, nothing relevant to this failure has been changed.

I can help you debug, but I don't know what more to do since these backtraces are from within the debuggers. The binary crashes immediately after running, even with no option, so to get the backtraces I just did:

LD_LIBRARY_PATH=/tmp/firefox-3.0rc1/obj-mips-sgi-irix6.5/dist/lib:/usr/nekoware/lib $DEBUGGER /tmp/firefox-3.0rc1/obj-mips-sgi-irix6.5/nss/shlibsign

The examination of the core file was completely useless with gdb, but dbx (the Irix debugger) gives the same backtrace. 

What does this utility do, and why is it so easy to just replace it with /bin/true? I think I will just comment out the specific assertion and continue the debug build.

Finally, why do I get a pthread-related backtrace even though I configured with --without-pthreads? Is it certain that on other UNIXes the combination of --without-pthreads and --enable-debug builds fine?

Thanks again for the help.
Severity: critical → major
By commenting out the assertion I got a "Memory fault" coredump, and by reproducing the crash in gdb, I fortunately get something more sane than before. The crash message is: 

/tmp/firefox-3.0rc1/mozilla/security/nss/cmd/shlibsign/./sign.sh[56]: 382250 Memory fault(coredump)

And the gdb backtrace:

#0  0x0436b840 in _MD_InitCPUS () at /tmp/firefox-3.0rc1/mozilla/nsprpub/pr/src/md/unix/unix.c:2022
#1  0x04359c98 in _PR_InitCPUs () at /tmp/firefox-3.0rc1/mozilla/nsprpub/pr/src/threads/combined/prucpu.c:120
#2  0x0433cacc in _PR_InitStuff () at /tmp/firefox-3.0rc1/mozilla/nsprpub/pr/src/misc/prinit.c:222
#3  0x0433cbb8 in _PR_ImplicitInitialization () at /tmp/firefox-3.0rc1/mozilla/nsprpub/pr/src/misc/prinit.c:251
#4  0x04353614 in PR_GetSpecialFD (osfd=PR_StandardError)
    at /tmp/firefox-3.0rc1/mozilla/nsprpub/pr/src/io/prio.c:107
#5  0x10005944 in usage (program_name=0x7fff3043 "shlibsign") at shlibsign.c:71
#6  0x10005f04 in main (argc=1, argv=0x7fff2f44) at shlibsign.c:209
shlibsign, the SHared LIBrary SIGNer, generates a file containing the digital 
signature of a shared library.  The generated digital signature file is 
required if and when the cryptographic shared libraries in FF3 are going to 
be used in "FIPS mode", that is, in the manner prescribed by Federal 
Information Processing Standard (FIPS) 140-2.  If you use IRIX's requickstart
command (does IRIX still have that?), which modifies the shared libraries,
you have to rerun shlibsign to generate new signatures on the rewritten 
shared libs, or else FF3 won't run in "FIPS mode".

But if you're not using FF in "FIPS mode", then the signature file generated 
by shlibsign is not used at all.  That's why you can get away with skipping this step.  

I'll add, though, that if shlibsign fails, then odds are good that the 
browser (when finally built and run) will fail in a similar way.  

I think it's likely that no-one has built FF --without-pthreads in a LONG
time.  The code in which the assertion failure is occurring appears to be 
code that is only generated when built --without-pthreads.  I would expect
that if you remove the assertion, you would experience a hang or crash.  
Comment on attachment 328037
I attach the full backtrace together with some useful code listing and variable values.

The assertion reported in this attachment is not the same one previously reported in this bug.
It refers to the crash that happened after I commented out the assertion. Please read comment #5 for more info on that. Unfortunately I couldn't get any more info from the initially reported crash, the stack is garbled... However I think the last backtrace is more useful.
I am also pasting here some warnings that the compiler emits while building irix.c. Perhaps all this information will lead us somewhere...

/tmp/firefox-3.0rc1/mozilla/nsprpub/pr/src/md/unix/irix.c: In function `_MD_GetSP':
/tmp/firefox-3.0rc1/mozilla/nsprpub/pr/src/md/unix/irix.c:468: warning: cast to pointer from integer of different size
/tmp/firefox-3.0rc1/mozilla/nsprpub/pr/src/md/unix/irix.c: In function `_MD_InitLocks':
/tmp/firefox-3.0rc1/mozilla/nsprpub/pr/src/md/unix/irix.c:481: warning: suggest parentheses around assignment used as truth value
/tmp/firefox-3.0rc1/mozilla/nsprpub/pr/src/md/unix/irix.c:484: warning: suggest parentheses around assignment used as truth value
/tmp/firefox-3.0rc1/mozilla/nsprpub/pr/src/md/unix/irix.c: In function `exit':
/tmp/firefox-3.0rc1/mozilla/nsprpub/pr/src/md/unix/irix.c:1023: warning: unused variable `thr'
/tmp/firefox-3.0rc1/mozilla/nsprpub/pr/src/md/unix/irix.c:1024: warning: unused variable `qp'
/tmp/firefox-3.0rc1/mozilla/nsprpub/pr/src/md/unix/irix.c:1112: warning: `noreturn' function does return
/tmp/firefox-3.0rc1/mozilla/nsprpub/pr/src/md/unix/irix.c: In function `_MD_EarlyInit':
/tmp/firefox-3.0rc1/mozilla/nsprpub/pr/src/md/unix/irix.c:1431: warning: suggest parentheses around assignment used as truth value

I should also mention that I tried building with /bin/true in place of shlibsign, but the build fails because it can't find the libraries' .chk files it needs. I could create a script that touches such a file, but I think it would be better to hunt down the bug.
Go ahead and touch the .chk files.  

I expect my colleague and NSPR module owner Wan-Teh Chang to come along 
and say in this bug that building --without-pthreads is no longer supported.  
But maybe it is still supported.  We'll see.
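
For reference, the .chk workaround discussed above could be sketched in C as a stand-in for shlibsign. This is only an illustration under stated assumptions: the names demo_chk_path and demo_touch are hypothetical, and the ".so" to ".chk" naming is inferred from the build log in this bug (libsoftokn3.so producing libsoftokn3.chk). An empty .chk file is not a valid signature, so this would only suit builds that will not run in FIPS mode.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: derive "<stem>.chk" from a library path such as
 * "<stem>.so". Returns 0 on success, -1 if the file name has no
 * extension or the result does not fit in 'out'. */
int demo_chk_path(const char *lib, char *out, size_t outlen)
{
    const char *dot = strrchr(lib, '.');
    const char *slash = strrchr(lib, '/');
    size_t stem;

    /* Reject paths whose last '.' belongs to a directory component. */
    if (dot == NULL || (slash != NULL && dot < slash))
        return -1;

    stem = (size_t)(dot - lib);
    if (stem + sizeof ".chk" > outlen)   /* sizeof includes the NUL */
        return -1;

    memcpy(out, lib, stem);
    memcpy(out + stem, ".chk", sizeof ".chk");
    return 0;
}

/* Hypothetical helper: create the file if it does not exist and leave
 * any existing content untouched, similar to touch(1). */
int demo_touch(const char *path)
{
    FILE *f = fopen(path, "ab");
    if (f == NULL)
        return -1;
    return fclose(f) == 0 ? 0 : -1;
}
```

A stub built on these helpers would take the library path that follows -i on the command line, call demo_chk_path on it, and pass the result to demo_touch.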
OK, I will just add some information that I managed to get. I added the following lines just before the offending assert() call and now the backtrace that gdb gives is fine. 


Lines added:
                if (rv < 0) {
                        perror(NULL);
                        abort();
                }

Message printed by perror():
Bad file number

gdb backtrace: 
#0  0x0fa4ac28 in _prctl () at /xlv41/6.5.30m/work/irix/lib/libc/libc_n32_M4/proc/prctl.s:15
#1  0x0c0800cc in pthread_kill () at sig.c:149
#2  0x0c0810e0 in _SGIPT_libc_raise () at sig.c:660
#3  0x0fad1f44 in _raise () at raise.c:26
#4  0x0fa6f6b8 in abort () at abort.c:52
#5  0x0437884c in create_sproc (entry=0x4361c04 <_PR_NativeRunThread>, inh=127, arg=0x10039400, sp=0x0, len=65536, pid=0x7fff25b4)
    at /tmp/firefox-3.0rc1/mozilla/nsprpub/pr/src/md/unix/irix.c:689
#6  0x04378dc0 in _MD_CreateThread (thread=0x10039400, start=0x4361c04 <_PR_NativeRunThread>, priority=PR_PRIORITY_NORMAL, 
    scope=PR_GLOBAL_THREAD, state=PR_UNJOINABLE_THREAD, stackSize=65536)
    at /tmp/firefox-3.0rc1/mozilla/nsprpub/pr/src/md/unix/irix.c:774
#7  0x04364ef4 in _PR_NativeCreateThread (type=PR_SYSTEM_THREAD, start=0x435a23c <_PR_RunCPU>, arg=0x1003a000, 
    priority=PR_PRIORITY_NORMAL, scope=PR_GLOBAL_THREAD, state=PR_UNJOINABLE_THREAD, stackSize=65536, flags=128)
    at /tmp/firefox-3.0rc1/mozilla/nsprpub/pr/src/threads/combined/pruthr.c:1093
#8  0x04365340 in _PR_CreateThread (type=PR_SYSTEM_THREAD, start=0x435a23c <_PR_RunCPU>, arg=0x1003a000, priority=PR_PRIORITY_NORMAL, 
    scope=PR_GLOBAL_THREAD, state=PR_UNJOINABLE_THREAD, stackSize=65536, flags=128)
    at /tmp/firefox-3.0rc1/mozilla/nsprpub/pr/src/threads/combined/pruthr.c:1211
#9  0x04359c8c in _PR_InitCPUs () at /tmp/firefox-3.0rc1/mozilla/nsprpub/pr/src/threads/combined/prucpu.c:109
#10 0x0433cacc in _PR_InitStuff () at /tmp/firefox-3.0rc1/mozilla/nsprpub/pr/src/misc/prinit.c:222
#11 0x0433cbb8 in _PR_ImplicitInitialization () at /tmp/firefox-3.0rc1/mozilla/nsprpub/pr/src/misc/prinit.c:251
#12 0x04353614 in PR_GetSpecialFD (osfd=PR_StandardError)
    at /tmp/firefox-3.0rc1/mozilla/nsprpub/pr/src/io/prio.c:107
#13 0x10005944 in usage (program_name=0x7fff3043 "shlibsign") at shlibsign.c:71
#14 0x10005f04 in main (argc=1, argv=0x7fff2f44) at shlibsign.c:209
The addresses are different because I reran the binary. Sorry for the spamming but I think this is the most important one.
OK I think I pinpointed the specific problem. _pr_irix_primoridal_cpu_fd has a starting value of {-1,-1}, and should be set by the pipe() call inside _MD_IrixInit() (file irix.c). After setting breakpoints I am sure that this function never gets called.
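
To illustrate why an uninitialised descriptor pair would produce exactly this error, here is a minimal sketch. It assumes the failing call is a one-byte write on the pipe, which the "rv == 1" assertion suggests but this bug does not confirm. All demo_* names are hypothetical stand-ins and are not NSPR code.

```c
#include <errno.h>
#include <unistd.h>

/* Hypothetical stand-in for _pr_irix_primoridal_cpu_fd: both ends stay
 * at -1 until an init routine calls pipe(). */
static int demo_cpu_fd[2] = { -1, -1 };

/* Hypothetical stand-in for the pipe() call that the comment above
 * attributes to _MD_IrixInit(). Returns 0 on success. */
int demo_init(void)
{
    return pipe(demo_cpu_fd);
}

/* Write one wake-up byte to the pipe. Returns 0 on success, or the
 * errno value when the write fails. */
int demo_notify(void)
{
    char byte = 0;
    ssize_t rv = write(demo_cpu_fd[1], &byte, 1);
    return (rv == 1) ? 0 : errno;
}
```

Calling demo_notify() before demo_init() fails with EBADF, which System V derived systems report as "Bad file number"; after demo_init() succeeds, the same call returns 0.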
Dimitrios, you're making great progress.  Let me suggest that you look for
a place to insert that _MD_IrixInit call.  Maybe there's some place where
there are other MD_*Init function calls.  I'd expect that to be in 
_PR_InitStuff or one of the functions it calls.  

May I ask why you're avoiding pthreads?
I have already tried building with pthreads, but I had various other crashes, and I can't be sure whether they were related to pthreads. However, I found a previous working build of Firefox 2 for SGI IRIX, and its mozconfig had --without-pthreads. So I thought that for Firefox 3 it would be better to leave it that way...

I found the following clues for _MD_IrixInit:
nsprpub/pr/include/md/_irix.h:#define _MD_FINAL_INIT _MD_IrixInit

Now _MD_FINAL_INIT is only (supposed to be) called inside prinit.c (and *only* there, it's not being called anywhere else). Perhaps you can have a look and tell me if this code ever gets executed:
nsprpub/pr/src/misc/prinit.c:    _PR_MD_FINAL_INIT();

It is also exported with other name via primpl.h:
nsprpub/pr/include/private/primpl.h:NSPR_API(void) _PR_MD_FINAL_INIT(void);
nsprpub/pr/include/private/primpl.h:#define    _PR_MD_FINAL_INIT _MD_FINAL_INIT

but I can't find _MD_FINAL_INIT being called *anywhere* from within nsprpub/ directory. I don't have a clue how this initialisation is supposed to happen. Should it happen from within libnspr4? Or should it be called from the user of the library (in our case shlibsign)?
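
One possible reading of the definitions quoted above is that the call in prinit.c reaches _MD_IrixInit purely through preprocessor substitution, which would explain why a search for a literal _MD_FINAL_INIT() call finds nothing. The sketch below models that three-level chain with hypothetical DEMO_* and demo_* names; it does not reproduce the NSPR headers.

```c
/* Counts how often the platform init routine has run. */
static int demo_init_calls = 0;

/* Hypothetical stand-in for _MD_IrixInit. */
void demo_platform_init(void)
{
    demo_init_calls++;
}

/* Platform header maps the generic hook onto the platform routine
 * (modelled on the _irix.h line quoted above). */
#define DEMO_MD_FINAL_INIT demo_platform_init

/* Private header maps the portable name onto the generic hook
 * (modelled on the primpl.h line quoted above). */
#define DEMO_PR_MD_FINAL_INIT DEMO_MD_FINAL_INIT

/* Hypothetical stand-in for the init code in prinit.c. The source only
 * ever spells the portable name; the preprocessor rewrites this line
 * to demo_platform_init(). */
void demo_init_stuff(void)
{
    DEMO_PR_MD_FINAL_INIT();
}

int demo_get_init_calls(void)
{
    return demo_init_calls;
}
```

Under this model the library calls the hook itself during initialisation, so the open question is the order of calls inside the init routine rather than whether the library's user has to call it.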


Attached patch prinit.c.patch
After applying the attached patch (moving _PR_InitCPUs() after _PR_MD_FINAL_INIT() so that _MD_IrixInit() gets executed), the build passes the point where it used to fail, only to fail at a later spot: 

#0  0x0436b840 in _MD_InitCPUS () at /tmp/firefox-3.0rc1/mozilla/nsprpub/pr/src/md/unix/unix.c:2022
#1  0x04359c98 in _PR_InitCPUs () at /tmp/firefox-3.0rc1/mozilla/nsprpub/pr/src/threads/combined/prucpu.c:120
#2  0x0433cb6c in _PR_InitStuff () at /tmp/firefox-3.0rc1/mozilla/nsprpub/pr/src/misc/prinit.c:245
#3  0x0433cbb8 in _PR_ImplicitInitialization () at /tmp/firefox-3.0rc1/mozilla/nsprpub/pr/src/misc/prinit.c:251
#4  0x04353614 in PR_GetSpecialFD (osfd=PR_StandardError)
    at /tmp/firefox-3.0rc1/mozilla/nsprpub/pr/src/io/prio.c:107
#5  0x10007654 in usage (program_name=0x7fff3043 "shlibsign") at shlibsign.c:71
#6  0x10007c14 in main (argc=1, argv=0x7fff2f44) at shlibsign.c:209


Perhaps I was wrong to move the _PR_InitCPUs call, because now it segfaults! However, I can't find any other solution, so I think I may give up and try the pthread build again. :-( If the --without-pthreads option is not deprecated, maybe you should try building it on some other Unix... Anyway, here is some info related to this crash:

(gdb) frame 0
#0  0x0436b840 in _MD_InitCPUS () at /tmp/firefox-3.0rc1/mozilla/nsprpub/pr/src/md/unix/unix.c:2022
2022        _PR_IOQ_MAX_OSFD(me->cpu) = _pr_md_pipefd[0];
(gdb) list
2017        PRInt32 rv, flags;
2018        PRThread *me = _MD_CURRENT_THREAD();
2019    
2020        rv = pipe(_pr_md_pipefd);
2021        PR_ASSERT(rv == 0);
2022        _PR_IOQ_MAX_OSFD(me->cpu) = _pr_md_pipefd[0];
2023    #ifndef _PR_USE_POLL
2024        FD_SET(_pr_md_pipefd[0], &_PR_FD_READ_SET(me->cpu));
2025    #endif
2026    
(gdb) print me->cpu
$6 = (_PRCPU *) 0x0
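
The gdb output above shows me->cpu is a null pointer at line 2022, so the assignment through it is what faults. The following is a minimal model of that dependency, with an explicit check in place of the crash. The Demo* types and demo_record_pipe_fd are hypothetical; this is not a proposed NSPR fix, only an illustration of why the CPU structure has to be attached to the current thread before this code runs.

```c
#include <stddef.h>

/* Hypothetical, heavily reduced models of the per-CPU and per-thread
 * structures involved. */
typedef struct DemoCPU {
    int max_osfd;
} DemoCPU;

typedef struct DemoThread {
    DemoCPU *cpu;   /* stays NULL until the thread is bound to a CPU */
} DemoThread;

/* Record the read end of the wake-up pipe on the thread's CPU.
 * Returns 0 on success, -1 if no CPU has been attached yet, which is
 * the state the gdb session above shows. */
int demo_record_pipe_fd(DemoThread *me, int read_fd)
{
    if (me == NULL || me->cpu == NULL)
        return -1;
    me->cpu->max_osfd = read_fd;
    return 0;
}
```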
I would encourage you to try the build with pthreads again. Even if there are other problems, you are more likely to get support on the pthread code since it is supported on many other platforms.
Status: UNCONFIRMED → ASSIGNED
Ever confirmed: true

The bug assignee didn't login in Bugzilla in the last 7 months and this bug has severity 'major'.
:KaiE, could you have a look please?
For more information, please visit auto_nag documentation.

Assignee: wtc → nobody
Status: ASSIGNED → NEW
Flags: needinfo?(kaie)
Status: NEW → RESOLVED
Closed: 2 years ago
Flags: needinfo?(kaie)
Resolution: --- → WONTFIX