Closed
Bug 15604
Opened 26 years ago
Closed 26 years ago
startup crash in PrepareAndDispatch (solaris/native optimized)
Categories
(Core :: XPConnect, defect, P3)
Tracking
()
RESOLVED
FIXED
M15
People
(Reporter: tor, Assigned: rich.burridge)
References
Details
(Keywords: crash)
Attachments
(2 files)
|
737 bytes,
patch
|
Details | Diff | Splinter Review | |
|
5.77 KB,
patch
|
Details | Diff | Splinter Review |
apprunner crashes on startup in an optimized build (--disable-debug
--enable-optimize) on Solaris/native(4.2) with a 10/5 morning CVS pull.
The same source tree built in normal debug mode runs fine. The crash
happens quite early:
(dbx) run
Running: apprunner
(process id 13599)
Warning: MOZILLA_FIVE_HOME not set.
Reading symbolic information for libjsloader.so
Reading symbolic information for libxpconnect.so
t@1 (l@1) signal SEGV (no mapping at the fault address) in PrepareAndDispatch at
0xff203ce8
0xff203ce8: PrepareAndDispatch+0x000c: ld [%i0], %l0
Current function is PL_HashTableEnumerateEntries
368 rv = (*f)(he, n, arg);
(dbx) where
current thread: t@1
[1] PrepareAndDispatch(0x80000000, 0x40000, 0xffbef0a0, 0x6c2d8, 0xb2468,
0xfeccf444), at 0xff203ce8
[2] 0xff204854(0x80000000, 0x4, 0xffbef0a0, 0x6c2d8, 0xfed25524, 0xff203f44),
at 0xff204853
[3] mozJSComponentLoader::AutoRegisterComponent(0x80000000, 0x57458, 0x0,
0xffbef0bc, 0xfed25128, 0x3f128), at 0xfed13e0c
[4] mozJSComponentLoader::RegisterComponentsInDir(0x3f128, 0x80000000,
0x6c260, 0xfed25128, 0x57430, 0x0), at 0xfed137c4
[5] AutoRegister_enumerate(0x57c88, 0x3f128, 0xffbef2ec, 0xff1eeb54,
0xff3dc9d0, 0x5a), at 0xff1eeb84
[6] _hashEnumerate(0x1, 0x0, 0xffbef280, 0xff3dd2c8, 0xff3dc9d0, 0x6a), at
0xff1c40a4
=>[7] PL_HashTableEnumerateEntries(ht = 0x28980, f = 0xff1c4090 =
&_hashEnumerate(), arg = 0xffbef280), line 368 in "plhash.c"
[8] nsHashtable::Enumerate(0x28320, 0xff1eeb54, 0xffbef2ec, 0x2a950,
0xff221aec, 0x17), at 0xff1c4544
[9] nsComponentManagerImpl::AutoRegister(0x6c260, 0x0, 0x0, 0xff221aec,
0x80000000, 0x2a950), at 0xff1ef0c8
[10] nsComponentManager::AutoRegister(0x0, 0x0, 0x11a1c, 0x56b48, 0x0, 0x0),
at 0xff1f48b8
[11] NS_SetupRegistry_1(0x0, 0xff22bb04, 0x56b48, 0x2766c, 0xfee3ac90, 0x3f8),
at 0x1496c
[12] main1(0x1, 0xffbef64c, 0x80000000, 0xfee37be4, 0x26370, 0x2a950), at
0x12e6c
[13] main(0x1, 0xffbef64c, 0x26370, 0x26400, 0x0, 0x0), at 0x1392c
(dbx) p f
f = 0xff1c4090 = &_hashEnumerate()
(dbx) p *he
*he = {
next = (nil)
keyHash = 1954228146U
key = 0x57c88
value = 0x3f128
}
(dbx) p n
n = 0
(dbx) p arg
arg = 0xffbef280
Updated•26 years ago
|
Assignee: jband → rogerl
Comment 1•26 years ago
|
||
This is rogerl's code.
Shaver's JSComponents stuff causes xpconnect calls to happen early. But whatever
is wrong was probably already wrong, just not right up front.
Is object layout different for release builds? Are there compiler flag issues?
Updated•26 years ago
|
Status: NEW → ASSIGNED
Updated•26 years ago
|
Severity: major → critical
Comment 2•26 years ago
|
||
Here's a simpler test case (I have a fix already, just waiting for the tree)
function f() { re = /,/; re.exec("8,7"); return "|"; }
reg = /X/g
str = "QQXAAAXBBBXCCC";
print(str.replace(reg, f));
Comment 3•26 years ago
|
||
Nuts, wrong bug. Sorry.
Updated•26 years ago
|
Target Milestone: M13 → M14
Comment 4•26 years ago
|
||
Not going to make M13 for these.
Adding "crash" keyword to all known open crasher bugs.
Keywords: crash
Updated•26 years ago
|
Assignee: rogerl → drapeau
Status: ASSIGNED → NEW
Comment 8•26 years ago
|
||
Per Clayton, bulk moving Solaris bugs to Sun.
| Assignee | ||
Comment 9•26 years ago
|
||
| Assignee | ||
Comment 10•26 years ago
|
||
[richb - 16th March 2000]
The problem here only occurs with the Sun native compiler (version 4.2 or 5.0)
when compiling
.../mozilla/xpcom/reflect/xptcall/src/md/unix/xptcstubs_sparc_solaris.cpp
with -O (really -O2). If this file is compiled with -g, nothing, -O1, -O3 or
-O4, there is no problem.
The problem is incorrect assembly code is generated for the various
STUB_ENTRY(n)'s with -O2. With -O1 (or the other compile options that work),
the assembler for each Stubn() entry looks something like:
0xfee62494: Stub3 : save %sp, -0x60, %sp
0xfee62498: Stub3+0x0004: call _PROCEDURE_LINKAGE_TABLE_+0x2d0c [PLT]
0xfee6249c: Stub3+0x0008: mov 0x3, %o0
0xfee624a0: Stub3+0x000c: ret
0xfee624a4: Stub3+0x0010: restore %g0, %o0, %o0
0xfee624a8: Stub3+0x0014: unimp 0x0
When compiled with -O2, the assembly code for the Stubn()'s is:
0xfee62414: Stub3 : mov %o7, %g1
0xfee62418: Stub3+0x0004: mov 0x3, %o0
0xfee6241c: Stub3+0x0008: call _PROCEDURE_LINKAGE_TABLE_+0x2d0c [PLT]
0xfee62420: Stub3+0x000c: mov %g1, %o7
As you can see the stack pointer is not properly saved and restored, and there
is no proper return.
This would appear to be a Sun native compiler bug, and I will attempt to
isolate this into a small example with fails, but for now a simple workaround
has been found (see attachment
http://bugzilla.mozilla.org/showattachment.cgi?attach_id=6613).
Comment 11•26 years ago
|
||
Re-assigning to Rich Burridge, working on the Solaris port of Netscape 6.
Assignee: drapeau → rich.burridge
| Assignee | ||
Updated•26 years ago
|
Status: NEW → ASSIGNED
| Assignee | ||
Comment 12•26 years ago
|
||
[richb - 20th Match 2000]
I submitted a bug against the Sun compiler for the failure to do the right
thing when compiled with -O2. Here's the comments from the evaluation:
"The code for the Stub##n() routines calls SharedStub().
The SharedStub() routine is an assembly language routine
that thinks it knows where to find things other than it's arguments.
Tail-call optimizations among other things break the assumptions
built into SharedStub().
I suggest you re-write the Stub##n() routines in assembly language,
so that they don't break.
I also suggest that you use the optimized code produced
by the compiler for those routines,
though that will necessitate rewriting SharedStub() too.
Of course, it looks like this whole mass of code it very non-portable.
It should probably be re-written to use <stdarg.h>.
I found the code for SharedStub() at:
http://lxr.mozilla.org/seamonkey/source/xpcom/reflect/xptcall/src/md/unix/xptcstubs_asm_sparc_solaris.s
I'm going to close this as not a bug."
The evaluator also sent me the following comments by private email:
"I found:
http://lxr.mozilla.org/seamonkey/source/xpcom/reflect/xptcall/src/md/unix/xptcinvoke_asm_sparc_solaris.s
It has the wrong value for the "save" instruction, should be -(64+32).
The SUNW version has the same problem.
Look in /usr/include/sys/stack.h for what should happen.
I found:
http://lxr.mozilla.org/seamonkey/source/xpcom/reflect/xptcall/src/md/unix/xptcstubs_asm_sparc_solaris.s
This code is bogus.
It thinks it can look in the register window and find the arguments to
it's caller -- it can't in the presence of tail-call optimizations.
It also has the same problem with the "save" instruction."
I will now look into adjusting the SPARC assembly code to do the right thing.
| Assignee | ||
Comment 13•26 years ago
|
||
[richb - 21st March 2000]
I've just been trying out the suggested code from the Sun compiler guy.
His version of SharedStub (in xptcstubs_asm_sparc_solaris.s) looks like:
SharedStub:
st %o1, [%fp + 72] ! store the args
st %o2, [%fp + 76]
st %o3, [%fp + 80]
st %o4, [%fp + 84]
st %o5, [%fp + 88]
! %o0 already has "this" or "that"?
ld [%fp + 68], %o1 ! get "n" aka index
add %fp, 72, %o2 ! form pointer to args
save %sp, -96, %sp ! need a stack frame,
! or PrepareAndDispatch may overwrite
! the args vector
mov %i0, %o0
mov %i1, %o1
call PrepareAndDispatch ! tail call
mov %i2, %o2
ret
restore %g0, %o0, %o0
and he suggested making the stubs look like:
or %g0,3,%g1 ! where "n" is 3, save it until %o reg's available
st %g1, [%fp+68] ! you could probably use %g2 instead of [%fp+68]
or %g0,%o7,%g1
call SharedStub ! (tail call)
or %g0,%g1,%o7
Now there are two problems with this stub definition. As it currently stands
the stub needs to be defined via #define in xptcstubs_sparc_solaris.cpp. This
needs to return an nsresult. The second problem is that the "3" is fine for
Stub3(), but the Stub4() needs 4, and so on.
As an experiment I tried changing the STUB_ENTRY #define to:
#define STUB_ENTRY(n) \
nsresult nsXPTCStubBase::Stub##n() \
{ \
register nsresult result = 0; \
asm(" or %g0, 3, %g1"); \
asm(" st %g1, [%fp+68]"); \
asm(" or %g0, %o7, %g1"); \
asm(" call SharedStub"); \
asm(" or %g0, %g1, %o7"); \
asm(" mov %o0, %i0"); \
return result; \
}
and used the above SharedStub definition. This compiled, built and ran
(Solaris CC 5.0 compilers) for -O1, -O2, -O3 and -O4.
I also tried it with gcc (-O2) and it compiled, built and ran fine.
If I just compiled xptcstubs_sparc_solaris.cpp with -g (Sun CC 5.0 compiler),
then when I ran it, it segv'ed with:
current thread: t@5
=>[1] PrepareAndDispatch(self = (nil), methodIndex = 3U, args = 0xfd9d1a50),
line 51 in "xptcstubs_sparc_solaris.cpp"
[2] XPTC_InvokeByIndex(0x0, 0x3, 0xfd9d1a50, 0xfdc277e8, 0x2454a8, 0x2662c0),
at 0xfee6659c
[3] nsXPTCStubBase::Stub3(this = 0x3), line 5 in "xptcstubsdef.inc"
[4] nsFileTransport::Process(0x80000000, 0xfd9d1bdc, 0xfdc277e8, 0x177430,
0x0, 0x290c0), at 0xfdad6134
[5] nsFileTransport::Run(0x177430, 0x0, 0x177434, 0x80000000, 0x10b1f0, 0x1),
at 0xfdad5cd0
[6] nsThreadPoolRunnable::Run(0x0, 0x177434, 0x10cb68, 0xfeaddfe0, 0x0,
0x10e0a9), at 0xfee565a4
[7] nsThread::Main(0x10d338, 0xfeae5418, 0x2, 0xfeadc55c, 0x8, 0x110a80), at
0xfee550e8
[8] 0xff0b9ed4(0x110a80, 0xff0d3bdc, 0x0, 0x110a80, 0x0, 0xfe955dc8), at
0xff0b9ed3
I sent my concerns back to the compiler guy. Here are his comments:
"What you are trying to do needs to be written in assembly language.
It cannot be expressed in C++, not even as "asm()".
The unfortunate part is that this will force you to
include the name mangling in the assembly language,
so it will not be portable across different C++ compilers.
Also, you forgot to convert the "3" to "n" in the first "asm()"."
If I did this, then it wouldn't fit in with the generic
#include "xptcstubsdef.inc" approach that is being currently used for multi
platform support.
This is getting beyond my level of expertise, and the compiler guys level of
interest (which is fair enough; he's been *very* helpful seeing this isn't
really his problem).
The simplest fix/hack seems to be to use the existing
xptcstubs_asm_sparc_solaris.s code (even though it might be bogus) and
make the #define for STUB_ENTRY in xptcstubs_sparc_solaris.cpp to be:
#define STUB_ENTRY(n) \
nsresult nsXPTCStubBase::Stub##n() \
{ \
int retval = SharedStub(n); \
return retval; \
}
which I believe prevents the tail call optimisation occuring.
Alternatively perhaps we (myself and Ashutosh Kulkarni, a member of the
Blackwood group here who has done SPARC assembly code in a previous life)
can meet with you (Chris and Roger; originators of the assembler code), to
see if we can come up with a better (less hacky) solution.
Comments please.
| Assignee | ||
Comment 14•26 years ago
|
||
[richb - 22nd March 2000]
I thought I added these comments yesterday. Hmmmn...
The compiler guy came back with another suggestion:
"The following code won't work to turn off tail-call optimization.
But it gave me another idea, below.
> #define STUB_ENTRY(n) \
> nsresult nsXPTCStubBase::Stub##n() \
> { \
> int retval = SharedStub(n); \
> return retval; \
> }
"You could try the following, with the original verion of SharedStub()
plus the fix for the save instr.
#define STUB_ENTRY(n) \
nsresult nsXPTCStubBase::Stub##n() \
{ \
int dummy; /* defeat tail-call optimization */ \
return SharedStub(n, &dummy); \
}
That should defeat the tail call optimization,
without much extra code (probably one instruction)."
His change (plus the fix for the extern declaration for SharedStub in
xptcstubs_sparc_solaris.cpp, which now reads:
extern "C" int SharedStub(int, int*);
compiles, builds and runs (with the Sun CC 5.0 compiler). I tried -O2, -g
and -O4. It also compiles, builds and successfully runs it with gcc with -O2.
These are the only variants I've tried. I've no reason to believe the other
optimisation levels won't be successful as well.
The sets of diffs for this change are at:
http://bugzilla.mozilla.org/showattachment.cgi?attach_id=6752
This affects:
xptcinvoke_asm_sparc_solaris_SUNW.s
xptcstubs_asm_sparc_solaris.s
xptcstubs_sparc_solaris.cpp
This fix also includes the adjustment to the save instruction for the
XPTC_InvokeByIndex routine in xptcinvoke_asm_sparc_solaris_SUNW.s.
This would appear to be an acceptable solution.
Anybody care to comment on this? You are all awfully quiet out there.
Comment 15•26 years ago
|
||
Rich,
Sorry about the silence. I think a number of us have been watching your
well documented progress but just keeping quiet for lack of intelligent things
to say. I did contact rogerl a few days ago to be sure that he was available to
help. He answered that he would. Unfortunately, I believe that he's been off
work for a bit. He really is the best person on our end to comment on the
specific changes. FWIW, mcafee is going on sabbatical soon (if he is not already
gone).
I certainly don't mind any of these changes that use otherwise unnecessary
variables to defeat the optimizer. If it works then I think this is better than
rewriting the stubs in asm.
I don't know how much of a rush anyone is in to get this checked in. I'd vote on
waiting for review from rogerl just to get another set of knowlegable eyes
looking at the solution. But I certainly trust you if you say it works.
I really apreciate you doing this work!
John.
Comment 16•26 years ago
|
||
Reports of my absence have been exaggerated, but I have been digging out from
under the JS bugs reported while I was gone last week, sorry not to
have replied sooner.
Despite jband's flattery, my assistance in this can only be minimal, my exposure
to sparc assembly & ABI extends to this one hack, period, so I'm going
to bow to the "compiler guy's" assertions here about defeating the tail-call
optimization. The latest fix that's been proposed seems perfectly fine to me,
and since it's working I'd say let's get it in.
| Assignee | ||
Comment 17•26 years ago
|
||
[richb - 23rd March 2000]
While I'm on vacation, one of my group is going to checkout this fix with the
Sun 4.2 compiler as well. There might be one more .s file that needs the save
instruction adjustment too. More on this when I get back (4/3/00).
| Assignee | ||
Comment 18•26 years ago
|
||
| Assignee | ||
Comment 19•26 years ago
|
||
[richb - 4th April 2000]
Back from vacation. Just catching up on things.
The compiler guy came back with the following:
"I just realized there are a couple of "niceties" that are not
in the code for XPTC_InvokeByIndex() and probably the code
for SharedStub() too.
At the end of the assembly language routine,
put:
.size XPTC_InvokeByIndex, .-XPTC_InvokeByIndex
.type XPTC_InvokeByIndex, #function
That way they show up correctly in the ELF symbol table
(e.g. with "nm").
Also, for the internal labels, like "invoke",
prefix them with a ".", as in ".invoke",
that way they become local and don't show up in the symbol table."
----
Margaret Chan did some further investigation of file usage for SW 5.0
compilers vs SW 4.2 and gcc compilers. Here are her comments:
"Your theory about WS4.2 & GCC picks up one file and WS5.0 picks up the
other is correct. This is what I've observed from my build:
1. 5.0 build - picks up xptcinvoke_asm_sparc_solaris_SUNW.s
I did a full build for this. At the end of the build, I've got these
4 files in the
../mozilla/xpcom/reflect/xptcall/src/md/unix directory:
xptcinvoke_asm_sparc_solaris_SUNW.o xptcstubs_asm_sparc_solaris.o
xptcinvoke_sparc_solaris.o xptcstubs_sparc_solaris.o
2. 4.2 build - picks up xptcinvoke_asm_sparc_solaris_GCC.s
I only did a gmake in the directory mentioned in #1 above.
At the end of the build, I've got these 4 files in the directory:
xptcinvoke_asm_sparc_solaris_GCC.o xptcstubs_asm_sparc_solaris.o
xptcinvoke_sparc_solaris.o xptcstubs_sparc_solaris.o
3. GCC build - picks up xptcinvoke_asm_sparc_solaris_GCC.s
The 4 files in the directory after the build is the same as #2 above.
So it appears that you'll have to apply your fix to yet another .s file for
4.2/GCC build."
----
Taking all of this into account, I've generated yet another set of diffs for
the various files. This can be found at:
http://bugzilla.mozilla.org/showattachment.cgi?attach_id=7232
If you (the Mozilla folks) are agreeable, I'd like to check these changes in,
when leaf allocates me a CVS login id for the CVS contributor forms I submitted
about ten days ago. Leaf?
| Assignee | ||
Comment 20•26 years ago
|
||
[richb - 5th April 2000]
I've now got a CVS checkin account (thanks Leaf!). Apologies for the
niave question, but can somebody tell me please how I now proceed to
check these changes in? Do I have to get approval from two reviewers
(the module owner)? Thanks.
Comment 21•26 years ago
|
||
rich... you should talk to your voucher (george, i think) about what the
development rules are.
Basically, look at the top of tinderbox
http://tinderbox.mozilla.org/showbuilds.cgi?tree=SeaMonkey
for the most current policy.
We're in lockdown mode right now, so i'm not going to be allowing checkins
unless they are regression fixes, memory leak fixes, or other entropy-fighting
checkins.
basically, you need to have changes thoroughly tested on at least one platform,
preferrably more, and reviewed by someone with some expertise in the area of
code that you're touching.
Comment 22•26 years ago
|
||
Leaf overstates somewhat, I think, in that
- testing your changes on more than one platform would be somewhat non-sensical,
and
- we have a tradition of taking clearly-platform-specific patches during
milestone lockdown, because they present a very low risk to all other platforms
and they tend to be for platforms that didn't work in the first place
(As a general description of how we handle milestone lockdown, though, his blurb
is predictably great.)
| Assignee | ||
Comment 23•26 years ago
|
||
[richb - 5th April 2000]
Okay. Thanks Leaf/Mike! I've just sent some mail to jband asking who the
reviewer should be (Roger?). When /if I get the go ahead from him, I'll
check the changes back.
Comment 24•26 years ago
|
||
*** Bug 8373 has been marked as a duplicate of this bug. ***
Comment 25•26 years ago
|
||
Looks cool - let's get it in, r=rogerl
| Assignee | ||
Comment 26•26 years ago
|
||
[richb - 7th Apr 2000]
Fix checked in for M15.
Status: ASSIGNED → RESOLVED
Closed: 26 years ago
Resolution: --- → FIXED
You need to log in
before you can comment on or make changes to this bug.
Description
•