Bus error in PR_StackPop at os_Irix.s:81

RESOLVED FIXED

Status

Product: NSPR
Component: NSPR
Priority: P3
Severity: normal
Reported: 19 years ago
Modified: 19 years ago

People

(Reporter: Wan-Teh Chang, Assigned: srinivas)

Description (Reporter), 19 years ago
The OS is IRIX 6.5.  The test machines are foo3.mcom.com
and hsync.mcom.com.  The NSPR release is 3.1.1.

When running the poll_nm test (optimized build),
I occasionally get a core dump due to a bus error.
This is very hard to reproduce.  To reproduce it, I
had to write a shell script that runs the poll_nm
test repeatedly:
    while true; do
    poll_nm
    echo ok
    done
Eventually you will get a core file.  The stack
trace at the crash is:
dbx poll_nm core
dbx version 7.3 BETA 54632_Mar27_BETA Mar 27 1999 02:40:31
Core from signal SIGBUS: Bus error
(dbx) where
>  0 PR_StackPop(0x100167b8, 0x2, 0x0, 0x7, 0xffffffff, 0x0, 0x1, 0x1) ["/tmp_mnt/u/wtc/release/v3.1.1/mozilla/nsprpub/pr/src/md/unix/os_Irix.s":81, 0x40307b8]
   1 _PR_Getfd(0x100167b8, 0x2, 0x0, 0x7, 0xffffffff, 0x0, 0x1, 0x1) ["/tmp_mnt/u/wtc/release/v3.1.1/mozilla/nsprpub/pr/src/io/prfdcach.c":78, 0x400dc54]
   2 pt_SetMethods(0x7, 0x2, 0x0, 0x7, 0xffffffff, 0x0, 0x1, 0x1) ["/tmp_mnt/u/wtc/release/v3.1.1/mozilla/nsprpub/pr/src/pthreads/ptio.c":2790, 0x4026df8]
   3 pt_Accept(0x0, 0x0, 0xffffffff, 0x7, 0xffffffff, 0x0, 0x1, 0x1) ["/tmp_mnt/u/wtc/release/v3.1.1/mozilla/nsprpub/pr/src/pthreads/ptio.c":1666, 0x4025740]
   4 PR_Accept(0x100167b8, 0x2, 0x0, 0x7, 0xffffffff, 0x0, 0x1, 0x1) ["/tmp_mnt/u/wtc/release/v3.1.1/mozilla/nsprpub/pr/src/io/priometh.c":166, 0x400fc90]
   5 main(0x0, 0x2, 0x0, 0x7, 0xffffffff, 0x0, 0x0, 0x0) ["/tmp_mnt/u/wtc/release/v3.1.1/mozilla/nsprpub/pr/tests/poll_nm.c":283, 0x10001e08]
   6 __start() ["/xlv55/kudzu-apr12/work/irix/lib/libc/libc_n32_M4/csu/crt1text.s":177, 0x10001788]
(dbx)

This does not happen in the debug build, because
there the fd cache is not implemented as an
atomic stack.
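
For readers unfamiliar with the term, an "atomic stack" here is a lock-free LIFO list: pop reads the head, reads head->next, and swings the head to next only if nobody changed the head in between. Below is a minimal sketch of that idea in portable C11. It is an illustration, not NSPR's code: the IRIX routines in os_Irix.s are hand-written MIPS assembly (the later comments discuss their ll/sc sequences), while this sketch substitutes C11 compare-and-swap. The names stack_elem, atomic_stack, stack_init, stack_push and stack_pop are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical link field embedded in each cached object. The link is
 * atomic so that concurrent reads and writes of it are not data races. */
typedef struct stack_elem {
    _Atomic(struct stack_elem *) next;
} stack_elem;

/* Hypothetical stack type: only the head pointer is shared. */
typedef struct {
    _Atomic(stack_elem *) head;
} atomic_stack;

static void stack_init(atomic_stack *s)
{
    atomic_init(&s->head, NULL);
}

/* Push e: link it above the current top, then publish it with a CAS.
 * On failure the CAS reloads `old`, so we relink and retry. */
static void stack_push(atomic_stack *s, stack_elem *e)
{
    assert(e != NULL);
    stack_elem *old = atomic_load(&s->head);
    do {
        atomic_store(&e->next, old);
    } while (!atomic_compare_exchange_weak(&s->head, &old, e));
}

/* Pop the top element, or return NULL if the stack is empty.
 * Assumes popped elements are never freed (as in a cache), so `old`
 * always points to valid memory; see the ABA caveat below. */
static stack_elem *stack_pop(atomic_stack *s)
{
    stack_elem *old = atomic_load(&s->head);
    while (old != NULL &&
           !atomic_compare_exchange_weak(&s->head, &old,
                                         atomic_load(&old->next))) {
        /* CAS failed: `old` now holds the current head; try again. */
    }
    return old;
}
```

A CAS-based pop like this one is exposed to the ABA problem: if another thread pops `old` and pushes it back between the read of old->next and the compare-and-swap, the CAS still succeeds with a stale next pointer. An ll/sc pair avoids that by construction, because the sc fails if the head was written after the ll, provided the hardware implements ll/sc correctly.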

One can work around this bug by setting the
environment variable NSPR_FD_CACHE_SIZE_HIGH
to a nonzero value, which disables the atomic stack
code in NSPR's fd cache.  For example, in csh:
    setenv NSPR_FD_CACHE_SIZE_HIGH 1024

Updated (Assignee), 19 years ago
Status: NEW → ASSIGNED

Comment 1 (Assignee), 19 years ago
There is a bug in PR_StackPop: a branch instruction sits in the delay slot of
another branch instruction, which can result in undefined behaviour.

Files modified (NSPR_3_1_BRANCH):

pr/src/md/unix/os_Irix.s - rev. 2.4.4.1

Comment 2 (Assignee), 19 years ago
There is a hardware bug in R10K (MIPS R10000) chips of revision 3.1 and earlier
that can cause an ll/sc instruction sequence to succeed incorrectly when two ll
instructions are executed within a span of 32 instructions.

Comment 3 (Assignee), 19 years ago
Added extra "nop" instructions to the stack push/pop routines as a workaround
for the R10K ll/sc bug.

Files modified:

pr/src/md/unix/os_Irix.s - rev. 2.7
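
Comments 2 and 3 together describe a timing constraint and its mitigation. The C sketch below models them under a simplifying assumption: it treats the rule as a static distance in a straight-line instruction listing, whereas the erratum as described concerns instructions executed, so branches and retry loops also matter. The helper names, the string-based instruction list, and the idea that the nops exist to push consecutive ll instructions at least 32 slots apart are all assumptions for illustration; this is not the actual change to os_Irix.s.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Span from Comment 2, treated here as a static distance (assumption). */
#define LL_WINDOW 32

/* Hypothetical check: returns 1 if two "ll" instructions fall inside
 * some run of `window` consecutive instructions, i.e. they are fewer
 * than `window` slots apart; returns 0 otherwise. */
static int ll_pair_within_window(const char *const insn[], size_t n,
                                 size_t window)
{
    assert(window > 0);
    int seen = 0;
    size_t last = 0;
    for (size_t i = 0; i < n; i++) {
        if (strcmp(insn[i], "ll") != 0)
            continue;
        if (seen && i - last < window)
            return 1;
        seen = 1;
        last = i;
    }
    return 0;
}

/* Hypothetical padding pass: copies insn[] to out[], inserting "nop"
 * before each "ll" that would otherwise be fewer than `window` slots
 * after the previous "ll". Returns the padded length, or 0 if out[]
 * (capacity `cap`) is too small. */
static size_t pad_ll_with_nops(const char *const insn[], size_t n,
                               const char *out[], size_t cap, size_t window)
{
    assert(window > 0);
    int seen = 0;
    size_t last = 0, len = 0;
    for (size_t i = 0; i < n; i++) {
        if (strcmp(insn[i], "ll") == 0) {
            while (seen && len - last < window) {
                if (len == cap)
                    return 0;
                out[len++] = "nop";
            }
            seen = 1;
            last = len;
        }
        if (len == cap)
            return 0;
        out[len++] = insn[i];
    }
    return len;
}
```

With a short sequence such as {"ll", "addiu", "sc", "beqz", "nop", "ll", "sc"}, the check reports a conflict, and the padding pass inserts 27 nops so that the second ll lands 32 slots after the first, which the check then accepts.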

Comment 4 (Reporter), 19 years ago
Checked in the extra nop fix to NSPR20_RELEASE_3_1_BRANCH,
in preparation for the NSPR 3.1.2 patch release.

/m/src/ns/nspr20/pr/src/md/unix/os_Irix.s, revision 2.4.4.2.

Comment 5 (Reporter), 19 years ago
I can't log into hsync.mcom.com right now.
But I used the shell script to run the
poll_nm test repeatedly on foo3.mcom.com
(IRIX 6.5) and foo2.mcom.com (IRIX 6.2)
and it still hasn't crashed after 5 minutes.

Updated (Reporter), 19 years ago
Status: ASSIGNED → RESOLVED
Last Resolved: 19 years ago
Resolution: --- → FIXED

Comment 6 (Reporter), 19 years ago
Marked the bug fixed.