Closed Bug 119834 Opened 23 years ago Closed 23 years ago

CopyHostent causes a bus error on IRIX gcc/debug build

Categories

(NSPR :: NSPR, defect, P1)

4.1.3
SGI
IRIX
defect

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: nickb, Assigned: wtc)

References

Details

Attachments

(3 files)

Using gcc-2.95.2 and mozilla 0.9.7 (branch) with debugging enabled I get a bus
error on startup.
Here is the stack trace:
(gdb) bt
#0  MakeIPv4MappedAddr (v4=0x107ae47c
"òÄ\002Oòx\236¨\211\020«mð~µ°À\204ЯÌð\212Ñ\002DÈêD\206\213%N\2047«Ø\004/L\2240y¢j[\013\021\026\ntyBCã\221Y\026",
v6=0x107ae46c "") at
/projects/sise/mozilla/devel/workpits/moz/0.9.7_branch_gcc/mozilla/nsprpub/pr/src/misc/prnetdb.c:254
#1  0x4139a10 in CopyHostent (from=0x101afe20, buf=0x101afe3c,
bufsize=0x101afe40, conversion=_PRIPAddrIPv4Mapped, to=0x107ae400) at
/projects/sise/mozilla/devel/workpits/moz/0.9.7_branch_gcc/mozilla/nsprpub/pr/src/misc/prnetdb.c:362
#2  0x413ae24 in PR_GetIPNodeByName (name=0x1077efe8 "dingo.adacel.com.au",
af=64032, flags=0, buf=0x107ae47c
"òÄ\002Oòx\236¨\211\020«mð~µ°À\204ЯÌð\212Ñ\002DÈêD\206\213%N\2047«Ø\004/L\2240y¢j[\013\021\026\ntyBCã\221Y\026",
bufsize=920, hp=0x107ae400) at
/projects/sise/mozilla/devel/workpits/moz/0.9.7_branch_gcc/mozilla/nsprpub/pr/src/misc/prnetdb.c:680
#3  0x45ef3e4 in nsDNSService::Run (this=0x100aeea8) at
/projects/sise/mozilla/devel/workpits/moz/0.9.7_branch_gcc/mozilla/netwerk/dns/src/nsDnsService.cpp:833
#4  0x5fe4ba8c in nsThread::Main (arg=0x0) at
/projects/sise/mozilla/devel/workpits/moz/0.9.7_branch_gcc/mozilla/xpcom/threads/nsThread.cpp:120
#5  0x414e474 in _pt_root (arg=0x0) at
/projects/sise/mozilla/devel/workpits/moz/0.9.7_branch_gcc/mozilla/nsprpub/pr/src/pthreads/ptthread.c:214
#6  0xc21b8ac in _SGIPT_pt_start () at pt.c:793

After adding some debug statements, I came to the conclusion that the problem
lies with the macro _PR_IN6_IS_ADDR_V4MAPPED(a), where a is a  PRIPv6Addr *.
Accessing the elements of union before passing to the Macro works fine (either
as a char * or as the struct), as can be seen from this debug output:
v6->pr_s6_addr32[0] = 0
v6->pr_s6_addr32[1] = 0
v6->pr_s6_addr32[2] = 65535
v6->pr_s6_addr32[3] = -1062716409
v6->pr_s6_addr16[0] = 0
v6->pr_s6_addr16[1] = 0
v6->pr_s6_addr16[2] = 0
v6->pr_s6_addr16[3] = 0
v6->pr_s6_addr16[4] = 0
v6->pr_s6_addr16[5] = 65535
v6->pr_s6_addr16[6] = 49320
v6->pr_s6_addr16[7] = 15367
v6->pr_s6_addr[0] = 0
v6->pr_s6_addr[1] = 0
v6->pr_s6_addr[2] = 0
v6->pr_s6_addr[3] = 0
v6->pr_s6_addr[4] = 0
v6->pr_s6_addr[5] = 0
v6->pr_s6_addr[6] = 0
v6->pr_s6_addr[7] = 0
v6->pr_s6_addr[8] = 0
v6->pr_s6_addr[9] = 0
v6->pr_s6_addr[10] = 255
v6->pr_s6_addr[11] = 255
v6->pr_s6_addr[12] = 192
v6->pr_s6_addr[13] = 168
v6->pr_s6_addr[14] = 60
v6->pr_s6_addr[15] = 7
v6[0] = 0
v6[1] = 0
v6[2] = 0
v6[3] = 0
v6[4] = 0
v6[5] = 0
v6[6] = 0
v6[7] = 0
v6[8] = 0
v6[9] = 0
v6[10] = 255
v6[11] = 255
v6[12] = 192
v6[13] = 168
v6[14] = 60
v6[15] = 7
PR_ASSERT(_PR_IN6_IS_ADDR_V4MAPPED(((PRIPv6Addr *) v6)));
nsWidget::~nsWidget() of toplevel: 16 widgets still exist.
moz_run_program[36]: 164808649 Bus error(coredump)
nsprpub/pr/src/misc/prnetdb.c - This is only really needed when using
gcc on mips, however, I cannot find a define under nsprpub to set this,
however it doesn't hurt for both MipsPro and gcc.
I should also mention that the stack trace is a little misleading, I suspect due
to optimisations, in that the line number IS correct, however the actual function
where it dies is MakeIPv4MappedAddr(), not CopyHostent(), altough 
MakeIPv4MappedAddr() IS called from CopyHostent().
Can you print the value of v6 in the debugger?
I bet that it is not a multiple of 4.

I believe it is a bug for the MakeIPv4MappedAddr
function to cast a char* to a PRIPv6Addr*.  Something
along the line of your fix should be used on all
platforms.
gdb) print *v6
$5 = 0 '\000'
(gdb) print *((PRIPv6Addr *) v6)
$6 = {_S6_un = {_S6_u8 = "\000\000\000\000\000\000\000\000\000\000ÿÿÀ¨<\a",
_S6_u16 = {0, 0, 0, 0, 0, 65535, 49320, 15367}, 
    _S6_u32 = {0, 0, 65535, 3232250887}, _S6_u64 = {0, 281473913994247}}}


The wierd thing is that I created a simple test case, which fairly closly mimics
the behaviour (I can post if you like), but could not re-create the crash.
Summary: CopyHostent causes a bus error on IRIX → CopyHostent causes a bus error on IRIX gcc/debug build
Sorry, I wasn't clear.  I meant v6, not *v6.  I want
the address.

In any case, I know what's wrong.

The easiest fix is to remove the PR_ASSERT statements
from MakeIPv4MappedAddr() and MakeIPv4CompatAddr()
in mozilla/nsprpub/pr/src/misc/prnetdb.c.  These two
PR_ASSERT statements incorrectly cast char* to
PRIPv6Addr*.  (PRIPv6Addr must be 4-byte or 8-byte
aligned.)  These two PR_ASSERT statements are there
to verify the correctness of the functions and are
not that useful.  (These two functions are simple.)

Of course, something like your fix would also work.

We can also fix this by making the casts legal.  This
would require allocating the char* with an appropriate
alignment.  I am not sure that this is necessary.
Status: UNCONFIRMED → ASSIGNED
Ever confirmed: true
Summary: CopyHostent causes a bus error on IRIX gcc/debug build → CopyHostent causes a bus error on IRIX
Oops, sorry.
(gdb) print v6
$7 = 0x107ae46c ""
Summary: CopyHostent causes a bus error on IRIX → CopyHostent causes a bus error on IRIX gcc/debug build
Hmm... 0x107ae46c is a multiple of 4.  This does not support
my theory that this is an alignment problem.

One last try -- could you print (PRIPv6Addr *) v6?  I believe
that on IRIX, PRIPv6Addr is 8-byte aligned.  So it's possible
that the cast of a 4-byte aligned pointer to an 8-byte aligned
pointer is still problematic.
(gdb) print (PRIPv6Addr *) v6
$8 = (struct PRIPv6Addr *) 0x107ae46c

Which is NOT divisible by 8! :)

However, when I added debug statements, (the output is shown above), I could 
access all the elements via the struct (eg I had 
PRIPv6Addr *ptr = (PRIPv6Addr *)v6;
and printed all the elements)


I guess it might be better to leave the asserts... a bit of extra checking 
doesn't (well, shouldn't! ;-) )hurt, so maybe if we remove the #ifdef IRIX 
section and change them all to memcmp()'s? or would you rather keep it IRIX
specific?
I would agree that it looks like an alignment problem tho, as when I changed
the macro to only check the first and the eigth bytes it worked, adding any
others caused problems.
Could you show me how you changed the macro?

Any solution we come up with should not have
#ifdef IRIX.  This problem may affect other
platforms.

We can leave the assertions but rewrite them
to not do any type casts, but don't you think
that the MakeIPv4MappedAddr() and MakeIPv4CompatAddr()
functions are so simple that their correctness
is self-evident? :-)
I tried quite a few variations... but here are a few.
The original (which dumps):
#define _PR_IN6_IS_ADDR_V4MAPPED(a)                     \
                (((a)->pr_s6_addr32[0] == 0)    &&      \
		((a)->pr_s6_addr32[1] == 0)     &&      \
		((a)->pr_s6_addr[8] == 0)       &&      \
		((a)->pr_s6_addr[9] == 0)       &&      \
		((a)->pr_s6_addr[10] == 0xff)   &&      \
		((a)->pr_s6_addr[11] == 0xff))

This one worked:
#define _PR_IN6_IS_ADDR_V4MAPPED(a)                     \
                (((a)->pr_s6_addr32[0] == 0)    &&      \
		((a)->pr_s6_addr[8] == 0)

This one didn't:
#define _PR_IN6_IS_ADDR_V4MAPPED(a)
	(((a)->pr_s6_addr32[1] == 0)))

Do you need some more?

I agree that MakeIPv4MappedAddr() and MakeIPv4CompatAddr() are simple in the 
extreme... and could do without the assertions, however these macros are also 
used in some other functions:
PR_GetHostByAddr()
PR_IsNetAddrType()
and maybe others. So, wont they have the same problem? or is it only likely to
occur when cast from char *? 
This one doesn't work either:
#define _PR_IN6_IS_ADDR_V4MAPPED(a)                     \
                (((a)->pr_s6_addr32[0] == 0)    && \
		((a)->pr_s6_addr[9] == 0)

Is this the right direction? Can you please review this patch.

Thanks! :)
I decided to just delete the two assertions that contained
the illegal type casts.  I checked in the fix into the tip
of NSPR.
Priority: -- → P1
Target Milestone: --- → 4.2
Okay, that will work to! :)

Shall we close this then?
I am leaving this open as a reminder that I also need
to check it into the mozilla client branch of NSPR.

We have just missed the mozilla0.9.8 milestone.  The
fix will be in the mozilla0.9.9 milestone.
Fix checked into NSPRPUB_PRE_4_2_CLIENT_BRANCH.
Status: ASSIGNED → RESOLVED
Closed: 23 years ago
Resolution: --- → FIXED
See Also: → 1425543
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: