Closed Bug 15604 Opened 26 years ago Closed 26 years ago

startup crash in PrepareAndDispatch (solaris/native optimized)

Categories

(Core :: XPConnect, defect, P3)

Sun
Solaris
defect

Tracking

()

RESOLVED FIXED

People

(Reporter: tor, Assigned: rich.burridge)

References

Details

(Keywords: crash)

Attachments

(2 files)

apprunner crashes on startup in an optimized build (--disable-debug --enable-optimize) on Solaris/native(4.2) with a 10/5 morning CVS pull. The same source tree built in normal debug mode runs fine. The crash happens quite early: (dbx) run Running: apprunner (process id 13599) Warning: MOZILLA_FIVE_HOME not set. Reading symbolic information for libjsloader.so Reading symbolic information for libxpconnect.so t@1 (l@1) signal SEGV (no mapping at the fault address) in PrepareAndDispatch at 0xff203ce8 0xff203ce8: PrepareAndDispatch+0x000c: ld [%i0], %l0 Current function is PL_HashTableEnumerateEntries 368 rv = (*f)(he, n, arg); (dbx) where current thread: t@1 [1] PrepareAndDispatch(0x80000000, 0x40000, 0xffbef0a0, 0x6c2d8, 0xb2468, 0xfeccf444), at 0xff203ce8 [2] 0xff204854(0x80000000, 0x4, 0xffbef0a0, 0x6c2d8, 0xfed25524, 0xff203f44), at 0xff204853 [3] mozJSComponentLoader::AutoRegisterComponent(0x80000000, 0x57458, 0x0, 0xffbef0bc, 0xfed25128, 0x3f128), at 0xfed13e0c [4] mozJSComponentLoader::RegisterComponentsInDir(0x3f128, 0x80000000, 0x6c260, 0xfed25128, 0x57430, 0x0), at 0xfed137c4 [5] AutoRegister_enumerate(0x57c88, 0x3f128, 0xffbef2ec, 0xff1eeb54, 0xff3dc9d0, 0x5a), at 0xff1eeb84 [6] _hashEnumerate(0x1, 0x0, 0xffbef280, 0xff3dd2c8, 0xff3dc9d0, 0x6a), at 0xff1c40a4 =>[7] PL_HashTableEnumerateEntries(ht = 0x28980, f = 0xff1c4090 = &_hashEnumerate(), arg = 0xffbef280), line 368 in "plhash.c" [8] nsHashtable::Enumerate(0x28320, 0xff1eeb54, 0xffbef2ec, 0x2a950, 0xff221aec, 0x17), at 0xff1c4544 [9] nsComponentManagerImpl::AutoRegister(0x6c260, 0x0, 0x0, 0xff221aec, 0x80000000, 0x2a950), at 0xff1ef0c8 [10] nsComponentManager::AutoRegister(0x0, 0x0, 0x11a1c, 0x56b48, 0x0, 0x0), at 0xff1f48b8 [11] NS_SetupRegistry_1(0x0, 0xff22bb04, 0x56b48, 0x2766c, 0xfee3ac90, 0x3f8), at 0x1496c [12] main1(0x1, 0xffbef64c, 0x80000000, 0xfee37be4, 0x26370, 0x2a950), at 0x12e6c [13] main(0x1, 0xffbef64c, 0x26370, 0x26400, 0x0, 0x0), at 0x1392c (dbx) p f f = 0xff1c4090 = &_hashEnumerate() (dbx) p *he *he = { next = (nil) keyHash = 1954228146U key = 0x57c88 value = 0x3f128 } (dbx) p n n = 0 (dbx) p arg arg = 0xffbef280
Assignee: jband → rogerl
This is rogerl's code. Shaver's JSComponents stuff causes xpconnect calls to happen early. But whatever is wrong was probably already wrong, just not right up front. Is object layout different for release builds? Are there compiler flag issues?
Blocks: 16654
Status: NEW → ASSIGNED
Blocks: 16950
Blocks: 17432
Blocks: 17907
Severity: major → critical
Here's a simpler test case (I have a fix already, just waiting for the tree) function f() { re = /,/; re.exec("8,7"); return "|"; } reg = /X/g str = "QQXAAAXBBBXCCC"; print(str.replace(reg, f));
Nuts, wrong bug. Sorry.
Target Milestone: M13 → M14
Not going to make M13 for these.
QA Contact: cbegle → rginda
Updating QA Contact.
Adding "crash" keyword to all known open crasher bugs.
Keywords: crash
Punt.
Target Milestone: M14 → M15
Assignee: rogerl → drapeau
Status: ASSIGNED → NEW
Per Clayton, bulk moving Solaris bugs to Sun.
[richb - 16th March 2000] The problem here only occurs with the Sun native compiler (version 4.2 or 5.0) when compiling .../mozilla/xpcom/reflect/xptcall/src/md/unix/xptcstubs_sparc_solaris.cpp with -O (really -O2). If this file is compiled with -g, nothing, -O1, -O3 or -O4, there is no problem. The problem is incorrect assembly code is generated for the various STUB_ENTRY(n)'s with -O2. With -O1 (or the other compile options that work), the assembler for each Stubn() entry looks something like: 0xfee62494: Stub3 : save %sp, -0x60, %sp 0xfee62498: Stub3+0x0004: call _PROCEDURE_LINKAGE_TABLE_+0x2d0c [PLT] 0xfee6249c: Stub3+0x0008: mov 0x3, %o0 0xfee624a0: Stub3+0x000c: ret 0xfee624a4: Stub3+0x0010: restore %g0, %o0, %o0 0xfee624a8: Stub3+0x0014: unimp 0x0 When compiled with -O2, the assembly code for the Stubn()'s is: 0xfee62414: Stub3 : mov %o7, %g1 0xfee62418: Stub3+0x0004: mov 0x3, %o0 0xfee6241c: Stub3+0x0008: call _PROCEDURE_LINKAGE_TABLE_+0x2d0c [PLT] 0xfee62420: Stub3+0x000c: mov %g1, %o7 As you can see the stack pointer is not properly saved and restored, and there is no proper return. This would appear to be a Sun native compiler bug, and I will attempt to isolate this into a small example with fails, but for now a simple workaround has been found (see attachment http://bugzilla.mozilla.org/showattachment.cgi?attach_id=6613).
Re-assigning to Rich Burridge, working on the Solaris port of Netscape 6.
Assignee: drapeau → rich.burridge
Status: NEW → ASSIGNED
[richb - 20th Match 2000] I submitted a bug against the Sun compiler for the failure to do the right thing when compiled with -O2. Here's the comments from the evaluation: "The code for the Stub##n() routines calls SharedStub(). The SharedStub() routine is an assembly language routine that thinks it knows where to find things other than it's arguments. Tail-call optimizations among other things break the assumptions built into SharedStub(). I suggest you re-write the Stub##n() routines in assembly language, so that they don't break. I also suggest that you use the optimized code produced by the compiler for those routines, though that will necessitate rewriting SharedStub() too. Of course, it looks like this whole mass of code it very non-portable. It should probably be re-written to use <stdarg.h>. I found the code for SharedStub() at: http://lxr.mozilla.org/seamonkey/source/xpcom/reflect/xptcall/src/md/unix/xptcstubs_asm_sparc_solaris.s I'm going to close this as not a bug." The evaluator also sent me the following comments by private email: "I found: http://lxr.mozilla.org/seamonkey/source/xpcom/reflect/xptcall/src/md/unix/xptcinvoke_asm_sparc_solaris.s It has the wrong value for the "save" instruction, should be -(64+32). The SUNW version has the same problem. Look in /usr/include/sys/stack.h for what should happen. I found: http://lxr.mozilla.org/seamonkey/source/xpcom/reflect/xptcall/src/md/unix/xptcstubs_asm_sparc_solaris.s This code is bogus. It thinks it can look in the register window and find the arguments to it's caller -- it can't in the presence of tail-call optimizations. It also has the same problem with the "save" instruction." I will now look into adjusting the SPARC assembly code to do the right thing.
[richb - 21st March 2000] I've just been trying out the suggested code from the Sun compiler guy. His version of SharedStub (in xptcstubs_asm_sparc_solaris.s) looks like: SharedStub: st %o1, [%fp + 72] ! store the args st %o2, [%fp + 76] st %o3, [%fp + 80] st %o4, [%fp + 84] st %o5, [%fp + 88] ! %o0 already has "this" or "that"? ld [%fp + 68], %o1 ! get "n" aka index add %fp, 72, %o2 ! form pointer to args save %sp, -96, %sp ! need a stack frame, ! or PrepareAndDispatch may overwrite ! the args vector mov %i0, %o0 mov %i1, %o1 call PrepareAndDispatch ! tail call mov %i2, %o2 ret restore %g0, %o0, %o0 and he suggested making the stubs look like: or %g0,3,%g1 ! where "n" is 3, save it until %o reg's available st %g1, [%fp+68] ! you could probably use %g2 instead of [%fp+68] or %g0,%o7,%g1 call SharedStub ! (tail call) or %g0,%g1,%o7 Now there are two problems with this stub definition. As it currently stands the stub needs to be defined via #define in xptcstubs_sparc_solaris.cpp. This needs to return an nsresult. The second problem is that the "3" is fine for Stub3(), but the Stub4() needs 4, and so on. As an experiment I tried changing the STUB_ENTRY #define to: #define STUB_ENTRY(n) \ nsresult nsXPTCStubBase::Stub##n() \ { \ register nsresult result = 0; \ asm(" or %g0, 3, %g1"); \ asm(" st %g1, [%fp+68]"); \ asm(" or %g0, %o7, %g1"); \ asm(" call SharedStub"); \ asm(" or %g0, %g1, %o7"); \ asm(" mov %o0, %i0"); \ return result; \ } and used the above SharedStub definition. This compiled, built and ran (Solaris CC 5.0 compilers) for -O1, -O2, -O3 and -O4. I also tried it with gcc (-O2) and it compiled, built and ran fine. If I just compiled xptcstubs_sparc_solaris.cpp with -g (Sun CC 5.0 compiler), then when I ran it, it segv'ed with: current thread: t@5 =>[1] PrepareAndDispatch(self = (nil), methodIndex = 3U, args = 0xfd9d1a50), line 51 in "xptcstubs_sparc_solaris.cpp" [2] XPTC_InvokeByIndex(0x0, 0x3, 0xfd9d1a50, 0xfdc277e8, 0x2454a8, 0x2662c0), at 0xfee6659c [3] nsXPTCStubBase::Stub3(this = 0x3), line 5 in "xptcstubsdef.inc" [4] nsFileTransport::Process(0x80000000, 0xfd9d1bdc, 0xfdc277e8, 0x177430, 0x0, 0x290c0), at 0xfdad6134 [5] nsFileTransport::Run(0x177430, 0x0, 0x177434, 0x80000000, 0x10b1f0, 0x1), at 0xfdad5cd0 [6] nsThreadPoolRunnable::Run(0x0, 0x177434, 0x10cb68, 0xfeaddfe0, 0x0, 0x10e0a9), at 0xfee565a4 [7] nsThread::Main(0x10d338, 0xfeae5418, 0x2, 0xfeadc55c, 0x8, 0x110a80), at 0xfee550e8 [8] 0xff0b9ed4(0x110a80, 0xff0d3bdc, 0x0, 0x110a80, 0x0, 0xfe955dc8), at 0xff0b9ed3 I sent my concerns back to the compiler guy. Here are his comments: "What you are trying to do needs to be written in assembly language. It cannot be expressed in C++, not even as "asm()". The unfortunate part is that this will force you to include the name mangling in the assembly language, so it will not be portable across different C++ compilers. Also, you forgot to convert the "3" to "n" in the first "asm()"." If I did this, then it wouldn't fit in with the generic #include "xptcstubsdef.inc" approach that is being currently used for multi platform support. This is getting beyond my level of expertise, and the compiler guys level of interest (which is fair enough; he's been *very* helpful seeing this isn't really his problem). The simplest fix/hack seems to be to use the existing xptcstubs_asm_sparc_solaris.s code (even though it might be bogus) and make the #define for STUB_ENTRY in xptcstubs_sparc_solaris.cpp to be: #define STUB_ENTRY(n) \ nsresult nsXPTCStubBase::Stub##n() \ { \ int retval = SharedStub(n); \ return retval; \ } which I believe prevents the tail call optimisation occuring. Alternatively perhaps we (myself and Ashutosh Kulkarni, a member of the Blackwood group here who has done SPARC assembly code in a previous life) can meet with you (Chris and Roger; originators of the assembler code), to see if we can come up with a better (less hacky) solution. Comments please.
[richb - 22nd March 2000] I thought I added these comments yesterday. Hmmmn... The compiler guy came back with another suggestion: "The following code won't work to turn off tail-call optimization. But it gave me another idea, below. > #define STUB_ENTRY(n) \ > nsresult nsXPTCStubBase::Stub##n() \ > { \ > int retval = SharedStub(n); \ > return retval; \ > } "You could try the following, with the original verion of SharedStub() plus the fix for the save instr. #define STUB_ENTRY(n) \ nsresult nsXPTCStubBase::Stub##n() \ { \ int dummy; /* defeat tail-call optimization */ \ return SharedStub(n, &dummy); \ } That should defeat the tail call optimization, without much extra code (probably one instruction)." His change (plus the fix for the extern declaration for SharedStub in xptcstubs_sparc_solaris.cpp, which now reads: extern "C" int SharedStub(int, int*); compiles, builds and runs (with the Sun CC 5.0 compiler). I tried -O2, -g and -O4. It also compiles, builds and successfully runs it with gcc with -O2. These are the only variants I've tried. I've no reason to believe the other optimisation levels won't be successful as well. The sets of diffs for this change are at: http://bugzilla.mozilla.org/showattachment.cgi?attach_id=6752 This affects: xptcinvoke_asm_sparc_solaris_SUNW.s xptcstubs_asm_sparc_solaris.s xptcstubs_sparc_solaris.cpp This fix also includes the adjustment to the save instruction for the XPTC_InvokeByIndex routine in xptcinvoke_asm_sparc_solaris_SUNW.s. This would appear to be an acceptable solution. Anybody care to comment on this? You are all awfully quiet out there.
Rich, Sorry about the silence. I think a number of us have been watching your well documented progress but just keeping quiet for lack of intelligent things to say. I did contact rogerl a few days ago to be sure that he was available to help. He answered that he would. Unfortunately, I believe that he's been off work for a bit. He really is the best person on our end to comment on the specific changes. FWIW, mcafee is going on sabbatical soon (if he is not already gone). I certainly don't mind any of these changes that use otherwise unnecessary variables to defeat the optimizer. If it works then I think this is better than rewriting the stubs in asm. I don't know how much of a rush anyone is in to get this checked in. I'd vote on waiting for review from rogerl just to get another set of knowlegable eyes looking at the solution. But I certainly trust you if you say it works. I really apreciate you doing this work! John.
Reports of my absence have been exaggerated, but I have been digging out from under the JS bugs reported while I was gone last week, sorry not to have replied sooner. Despite jband's flattery, my assistance in this can only be minimal, my exposure to sparc assembly & ABI extends to this one hack, period, so I'm going to bow to the "compiler guy's" assertions here about defeating the tail-call optimization. The latest fix that's been proposed seems perfectly fine to me, and since it's working I'd say let's get it in.
[richb - 23rd March 2000] While I'm on vacation, one of my group is going to checkout this fix with the Sun 4.2 compiler as well. There might be one more .s file that needs the save instruction adjustment too. More on this when I get back (4/3/00).
[richb - 4th April 2000] Back from vacation. Just catching up on things. The compiler guy came back with the following: "I just realized there are a couple of "niceties" that are not in the code for XPTC_InvokeByIndex() and probably the code for SharedStub() too. At the end of the assembly language routine, put: .size XPTC_InvokeByIndex, .-XPTC_InvokeByIndex .type XPTC_InvokeByIndex, #function That way they show up correctly in the ELF symbol table (e.g. with "nm"). Also, for the internal labels, like "invoke", prefix them with a ".", as in ".invoke", that way they become local and don't show up in the symbol table." ---- Margaret Chan did some further investigation of file usage for SW 5.0 compilers vs SW 4.2 and gcc compilers. Here are her comments: "Your theory about WS4.2 & GCC picks up one file and WS5.0 picks up the other is correct. This is what I've observed from my build: 1. 5.0 build - picks up xptcinvoke_asm_sparc_solaris_SUNW.s I did a full build for this. At the end of the build, I've got these 4 files in the ../mozilla/xpcom/reflect/xptcall/src/md/unix directory: xptcinvoke_asm_sparc_solaris_SUNW.o xptcstubs_asm_sparc_solaris.o xptcinvoke_sparc_solaris.o xptcstubs_sparc_solaris.o 2. 4.2 build - picks up xptcinvoke_asm_sparc_solaris_GCC.s I only did a gmake in the directory mentioned in #1 above. At the end of the build, I've got these 4 files in the directory: xptcinvoke_asm_sparc_solaris_GCC.o xptcstubs_asm_sparc_solaris.o xptcinvoke_sparc_solaris.o xptcstubs_sparc_solaris.o 3. GCC build - picks up xptcinvoke_asm_sparc_solaris_GCC.s The 4 files in the directory after the build is the same as #2 above. So it appears that you'll have to apply your fix to yet another .s file for 4.2/GCC build." ---- Taking all of this into account, I've generated yet another set of diffs for the various files. This can be found at: http://bugzilla.mozilla.org/showattachment.cgi?attach_id=7232 If you (the Mozilla folks) are agreeable, I'd like to check these changes in, when leaf allocates me a CVS login id for the CVS contributor forms I submitted about ten days ago. Leaf?
[richb - 5th April 2000] I've now got a CVS checkin account (thanks Leaf!). Apologies for the niave question, but can somebody tell me please how I now proceed to check these changes in? Do I have to get approval from two reviewers (the module owner)? Thanks.
rich... you should talk to your voucher (george, i think) about what the development rules are. Basically, look at the top of tinderbox http://tinderbox.mozilla.org/showbuilds.cgi?tree=SeaMonkey for the most current policy. We're in lockdown mode right now, so i'm not going to be allowing checkins unless they are regression fixes, memory leak fixes, or other entropy-fighting checkins. basically, you need to have changes thoroughly tested on at least one platform, preferrably more, and reviewed by someone with some expertise in the area of code that you're touching.
Leaf overstates somewhat, I think, in that - testing your changes on more than one platform would be somewhat non-sensical, and - we have a tradition of taking clearly-platform-specific patches during milestone lockdown, because they present a very low risk to all other platforms and they tend to be for platforms that didn't work in the first place (As a general description of how we handle milestone lockdown, though, his blurb is predictably great.)
[richb - 5th April 2000] Okay. Thanks Leaf/Mike! I've just sent some mail to jband asking who the reviewer should be (Roger?). When /if I get the go ahead from him, I'll check the changes back.
*** Bug 8373 has been marked as a duplicate of this bug. ***
Looks cool - let's get it in, r=rogerl
[richb - 7th Apr 2000] Fix checked in for M15.
Status: ASSIGNED → RESOLVED
Closed: 26 years ago
Resolution: --- → FIXED
No longer blocks: 17432
No longer blocks: 17907
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: