Unoptimized Intel Mac debug build crashes with a misaligned stack

RESOLVED FIXED

Status

Core
XPCOM
--
critical
RESOLVED FIXED
12 years ago
11 years ago

People

(Reporter: Eric Albert, Assigned: Josh Aas)

Tracking

({fixed1.8.0.2, fixed1.8.1})

Trunk
PowerPC
Mac OS X
fixed1.8.0.2, fixed1.8.1
Bug Flags:
blocking1.8.0.2 +

Details

(Whiteboard: [camino-1.0])

Attachments

(2 attachments, 3 obsolete attachments)

(Reporter)

Description

12 years ago
User-Agent:       Mozilla/5.0 (Macintosh; U; Intel Mac OS X; en) AppleWebKit/417.2 (KHTML, like Gecko) Safari/417.1
Build Identifier: Trunk

On build 8F1099 of Mac OS X on Intel, an unoptimized debug build of Firefox crashes on launch in 
XPTC_InvokeByIndex with a misaligned stack.  The misalignment happens within XPTC_InvokeByIndex 
itself, which doesn't appear to be properly aligning the first call in the inline asm.

Program received signal EXC_BAD_INSTRUCTION, Illegal instruction/operand.
0x8fe12ee4 in __dyld_stub_binding_helper_interface ()
2: /x $esp = 0xbfffd8b8
1: x/i $pc  0x8fe12ee4 <__dyld_stub_binding_helper_interface+18>:       movdqa %xmm0,32(%esp)
(gdb) bt
#0  0x8fe12ee4 in __dyld_stub_binding_helper_interface ()
#1  0x01008000 in ?? ()
#2  0x0108e45d in XPTC_InvokeByIndex (that=0x1c3b690, methodIndex=3, paramCount=1, 
params=0xbfffdb4c) at xptcinvoke_unixish_x86.cpp:147
#3  0x2b1bf00f in XPCWrappedNative::CallMethod (ccx=@0xbfffdd74, mode=CALL_GETTER) at 
xpcwrappednative.cpp:2139
#4  0x2b1e6297 in XPCWrappedNative::GetAttribute (ccx=@0xbfffdd74) at xpcwrappednativejsops.cpp:
1925
#5  0x2b1c6263 in XPC_WN_GetterSetter (cx=0x1c32280, obj=0x204b068, argc=0, argv=0x1c439c8, 
vp=0xbfffdec4) at xpcwrappednativejsops.cpp:1476
#6  0x0054da41 in js_Invoke (cx=0x1c32280, argc=0, flags=2) at jsinterp.c:1177
#7  0x0054de69 in js_InternalInvoke (cx=0x1c32280, obj=0x204b068, fval=33862752, flags=0, 
argc=0, argv=0x0, rval=0xbfffe9bc) at jsinterp.c:1274
#8  0x0054e149 in js_InternalGetOrSet (cx=0x1c32280, obj=0x204b068, id=29580784, 
fval=33862752, mode=JSACC_READ, argc=0, argv=0x0, rval=0xbfffe9bc) at jsinterp.c:1317
#9  0x0057a70e in js_GetProperty (cx=0x1c32280, obj=0x204b068, id=29580784, vp=0xbfffe9bc) at 
jsobj.c:2931
#10 0x0055e1a0 in js_Interpret (cx=0x1c32280, pc=0x1c434ad "5", result=0xbfffed80) at jsinterp.c:
3327
#11 0x0054e526 in js_Execute (cx=0x1c32280, chain=0x204af98, script=0x1c43450, down=0x0, 
flags=0, result=0xbfffef10) at jsinterp.c:1407
#12 0x00512e20 in JS_ExecuteScript (cx=0x1c32280, obj=0x204af98, script=0x1c43450, 
rval=0xbfffef10) at jsapi.c:4009
#13 0x2b1d35a2 in mozJSComponentLoader::GlobalForLocation (this=0x1c314d0, 
aLocation=0x1c30f60 "rel:jsconsole-clhandler.js", aComponent=0x1c31600, aGlobal=0xbffff090) at 
mozJSComponentLoader.cpp:1567
#14 0x2b1d3803 in mozJSComponentLoader::ModuleForLocation (this=0x1c314d0, 
registryLocation=0x1c30f60 "rel:jsconsole-clhandler.js", component=0x1c31600, status=<incomplete 
type>) at mozJSComponentLoader.cpp:917
#15 0x2b1d42be in mozJSComponentLoader::AttemptRegistration (this=0x1c314d0, 
component=0x1c31600, deferred=0) at mozJSComponentLoader.cpp:755
#16 0x2b1d488b in mozJSComponentLoader::AutoRegisterComponent (this=0x1c314d0, when=0, 
component=0x1c31600, registered=0xbffff304) at mozJSComponentLoader.cpp:682
#17 0x2b1d11f2 in mozJSComponentLoader::RegisterComponentsInDir (this=0x1c314d0, when=0, 
dir=0x1c0e110) at mozJSComponentLoader.cpp:590
#18 0x2b1d128b in mozJSComponentLoader::AutoRegisterComponents (this=0x1c314d0, when=0, 
aDirectory=0x1c0e110) at mozJSComponentLoader.cpp:546
#19 0x01062178 in nsComponentManagerImpl::AutoRegisterNonNativeComponents (this=0x1c0c910, 
spec=0x1c0e110) at nsComponentManager.cpp:3262
#20 0x010647cf in nsComponentManagerImpl::AutoRegisterImpl (this=0x1c0c910, when=0, 
inDirSpec=0x0, fileIsCompDir=1) at nsComponentManager.cpp:3232
#21 0x01064882 in nsComponentManagerImpl::AutoRegister (this=0x1c0c910, aSpec=0x0) at 
nsComponentManager.cpp:3404
#22 0x01010d3c in NS_InitXPCOM3_P (result=0xbffff738, binDirectory=0x1c0c3a0, 
appFileLocationProvider=0xbffff768, staticComponents=0x54ab0, componentCount=1) at 
nsXPComInit.cpp:616
#23 0x00003433 in ScopedXPCOMStartup::Initialize (this=0xbffff738) at nsAppRunner.cpp:595
#24 0x00009259 in XRE_main (argc=1, argv=0xbffff8d4, aAppData=0x549e0) at nsAppRunner.cpp:
2141
#25 0x000024aa in main (argc=1, argv=0xbffff8d4) at nsBrowserApp.cpp:61
(gdb) 
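
The register values in the gdb session above can be checked by hand. The following is a minimal sketch, not code from the Mozilla tree; the helper names are hypothetical. It assumes only that movdqa requires its memory operand to sit on a 16-byte boundary, and it plugs in the $esp value and the 32-byte displacement shown in the trace.

```cpp
#include <cassert>
#include <cstdint>

// True when addr is a multiple of 16, which is what movdqa requires of
// its memory operand.
constexpr bool is_16_byte_aligned(std::uint32_t addr) {
    return (addr & 0xFu) == 0;
}

// Number of bytes by which addr misses the previous 16-byte boundary.
constexpr std::uint32_t misalignment(std::uint32_t addr) {
    return addr & 0xFu;
}

// Address touched by `movdqa %xmm0, disp(%esp)` for a given ESP.
constexpr std::uint32_t movdqa_operand(std::uint32_t esp, std::uint32_t disp) {
    return esp + disp;
}
```

With $esp = 0xbfffd8b8 and a displacement of 32, the operand address is 0xbfffd8d8, which is 8 bytes past a 16-byte boundary.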


Reproducible: Always

Steps to Reproduce:
1) Build Firefox with --disable-optimize, --enable-debug, and --disable-tests.
2) Run.
Actual Results:  
Firefox crashes.

Expected Results:  
Not crashing is often considered to be proper behavior.
(Reporter)

Updated

12 years ago
Version: unspecified → Trunk
*** Bug 313399 has been marked as a duplicate of this bug. ***

*** This bug has been marked as a duplicate of 312929 ***
Status: UNCONFIRMED → RESOLVED
Last Resolved: 12 years ago
Resolution: --- → DUPLICATE
(Reporter)

Comment 3

12 years ago
This is not related to 312929.  It's crashing in a different part of the code.
Status: RESOLVED → UNCONFIRMED
Resolution: DUPLICATE → ---

Comment 4

12 years ago
Unoptimized builds don't work anyway (gcc 4.0.0 5026).  Bug 305827.  This stack
is different, though, so I'll leave this alone.
(Reporter)

Comment 5

12 years ago
Oh.  Well, how hard can that be to fix? :)
(Assignee)

Updated

12 years ago
Assignee: nobody → joshmoz
(Reporter)

Comment 6

12 years ago
This (and everything else in Firefox) is really annoying to debug because I can't get a build with debug
information to launch.  When I build with --disable-optimize, --enable-debug, and --disable-tests,
Firefox quits on launch because em is NULL in XRE_main.  Obviously that isn't the case for a non-debug
build, since otherwise I wouldn't be able to launch Firefox at all.  But --disable-optimize doesn't give
me debug information; it just builds with -O0.  I wonder how anyone else gets this to work.  It must work,
or else nobody would debug Firefox.

Comment 7

12 years ago
I don't follow.  Or maybe I do.  --disable-optimize --enable-debug crashes, but --disable-optimize alone or with --disable-debug doesn't?

Hey, if it's only going to crash in some builds, I'd be glad it's at least crashing when debugging is on!

If you want to turn off assertions and all of the other in-code (de)buggery, you want --enable-optimize="-O0 -g" --disable-debug.  Yeah, I know, it looks weird, but we don't actually do anything with --enable-optimize other than set C[XX]FLAGS.  Trust me, I'm a doctor.

Comment 8

12 years ago
P.S. Bug 305827 is fixed on the trunk now, so at least on ppc, we're able to launch unoptimized builds, with or without debugging, produced by gcc 4.
(Reporter)

Comment 9

12 years ago
Looks like --disable-optimize doesn't work.  I'm not sure what I was seeing earlier when I thought this was triggered by --enable-debug.  Anyway, thanks much for the --enable-optimize="-O1 -g" tip -- that's helped out a lot.  I suspect the problem with -O0 is that some functions which Firefox expects to be inlined aren't getting inlined, which really just means that judicious application of the always_inline attribute should probably happen.  Not that I'm volunteering. :)
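
For illustration, this is roughly what applying the always_inline attribute looks like. It is a sketch only: the macro name, the helper, and the caller are hypothetical and not taken from the Mozilla tree. GCC does not inline any function at -O0 unless it carries this attribute, so a helper that is "expected" to be inlined becomes a real call in an unoptimized build.

```cpp
#include <cassert>

// GCC/Clang extension: request inlining even when optimization is off.
// Other compilers fall back to a plain `inline` hint.
#if defined(__GNUC__)
#define ALWAYS_INLINE inline __attribute__((always_inline))
#else
#define ALWAYS_INLINE inline
#endif

// Hypothetical small predicate of the kind that callers assume is inlined.
static ALWAYS_INLINE bool is_pointer_flag_set(unsigned flags) {
    return (flags & 0x1u) != 0;
}

// Hypothetical caller: with the attribute in place, GCC and Clang expand
// the predicate here instead of emitting a CALL, even at -O0.
unsigned count_pointer_params(const unsigned* flags, unsigned count) {
    unsigned n = 0;
    for (unsigned i = 0; i < count; ++i) {
        if (is_pointer_flag_set(flags[i])) {
            ++n;
        }
    }
    return n;
}
```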
(Reporter)

Comment 10

12 years ago
Created attachment 201363 [details] [diff] [review]
Fixes stack alignment problems with XPTC_InvokeByIndex

Previous patches to XPTC_InvokeByIndex didn't account for the variable amount of stack space used for the parameters added to the stack by invoke_copy_to_stack.  Since the amount of alignment required is dependent on the size of those parameters, the code could crash when the number of parameters wasn't 0.  The previous code also aligned the stack after the adjustment for the parameters invoke_copy_to_stack copies, but that was wrong because the subsequent calls need the copied parameters on top of the stack without an uninitialized stack alignment adjustment following them.

This patch does not align the stack for the call to invoke_copy_to_stack.  Since that's defined in the same module, calls to it shouldn't go through the dyld stub and therefore alignment shouldn't be necessary there.  Aligning that call would be a little annoying -- doable, but I don't feel like doing it right now, particularly because it doesn't appear to be necessary.

Comment 11

12 years ago
(In reply to comment #9)
> Looks like --disable-optimize doesn't work.  I'm not sure what I was seeing
> earlier when I thought this was triggered by --enable-debug.  Anyway, thanks
> much for the --enable-optimize="-O1 -g" tip -- that's helped out a lot.  I
> suspect the problem with -O0 is that some functions which Firefox expects to
> be inlined aren't getting inlined, which really just means that judicious
> application of the always_inline attribute should probably happen.  Not that
> I'm volunteering. :)

That's exactly what happened in bug 305827, now fixed on the trunk.  The fix hasn't made it to the branch (yet?), so if you're working with 1.8 (1.5), you'll want to apply my patch from that bug in your tree if you hope to get anything useful done.  If your --disable-optimize builds are crashing in QuickDraw, then 305827 is it for sure, and your inlining prediction is spot-on.  If you've got another crash that kinda sorta seems like mumble inlining mumble, gimme a new bug.  (Especially if it's reproducible on ppc.  No x86 Mac here.)  Thanks.
(Reporter)

Comment 12

12 years ago
Oops...I should've mentioned that I've already applied your patch. :)
(Reporter)

Comment 13

12 years ago
And what I'm seeing isn't actually a crash, but rather an assertion failure which causes the app to quit.  I don't know if it's reproducible on PowerPC -- I actually don't have any PPC Macs running Tiger around here.
(Assignee)

Updated

12 years ago
Attachment #201363 - Flags: review?(joshmoz)
(Assignee)

Comment 14

12 years ago
Comment on attachment 201363 [details] [diff] [review]
Fixes stack alignment problems with XPTC_InvokeByIndex

With this patch, I see this about 50% of the time I try to start Firefox on an Intel Mac:

Host Name:      biko
Date/Time:      2005-11-09 11:39:09.792 -0600
OS Version:     10.4.3 (Build 8F1111)
Report Version: 4

Command: firefox-bin
Path:    /Users/josh/src/ff_1.8_opt/mozilla/ff_1.8_opt_obj/dist/DeerPark.app/Contents/MacOS/firefox-bin
Parent:  WindowServer [329]

Version: 1.5 (1.5)

PID:    8900
Thread: Unknown

Link (dyld) error:

lazy pointer not found
Attachment #201363 - Flags: review?(joshmoz) → review-
(Reporter)

Comment 15

12 years ago
(In reply to comment #14)
> With this patch, I see this about 50% of the time I try to start Firefox on an
> Intel Mac:
> 
> Host Name:      biko
> Date/Time:      2005-11-09 11:39:09.792 -0600
> OS Version:     10.4.3 (Build 8F1111)
> Report Version: 4

> Link (dyld) error:
> 
> lazy pointer not found

That wouldn't be due to this patch, but it isn't good.  I haven't tried Firefox on that build yet, though.  Can you try building without this patch and seeing whether you can reproduce the problem there?  If so, write up a separate bug about it and I'll take a look.
(Assignee)

Comment 16

12 years ago
I can't reproduce it without your patch. I'll try another build though.
(Assignee)

Comment 17

12 years ago
Comment on attachment 201363 [details] [diff] [review]
Fixes stack alignment problems with XPTC_InvokeByIndex

A fresh build does not have that problem.  Odd.  r=josh
Attachment #201363 - Flags: superreview?(dbaron)
Attachment #201363 - Flags: review-
Attachment #201363 - Flags: review+
So, from reading the patch, it looks like what the old code did (ifdef XP_MACOSX) was (using % for modulus):

esp <- (((esp - n) - 4) % 16) + 4

and the new code is doing

esp <- (((esp - (n + 4)) % 16) + (n + 4)) - n

These two expressions look the same to me.  What's the difference, or did I read something wrong?
(Oops, I didn't actually mean modulus, I meant "A % B == A minus (A mod B)".)

Comment 20

12 years ago
The existing code always uses n = 4, the patch makes it use n = 4 + (paramCount << 3).
(Reporter)

Comment 21

12 years ago
Exactly.  It's confusing that inline asm argument references look the same as constants, but the initial sub %8, %%esp means that we're adjusting %esp by a variable amount (n, specifically).  We weren't accounting for that before and were always adjusting by 4.  That works when n is 0, but not in other cases.  Now we adjust the stack by n+4 before aligning, then align, then move the stack back so that, once the other parts of the code are done playing with it, it'll be aligned properly at the site of the call.
Those expressions above include the movement of the ifdef XP_MACOSX block from after the
  "subl  %8, %%esp\n\t" /* make room for params */
to before it.  As far as I can tell, the changes inside that block exactly cancel out the movement of that block to the other side of the line above.  I don't see how any of your explanations refute my argument:  they point to changes in the code, but I still don't see how the changes as a whole don't cancel each other out.  What's wrong with the mathematical expressions above?
So I'll restate the two expressions I have above, with an extra pair of parentheses around the outside, and annotate each pair of parentheses with which operation is directly inside those parentheses:

Old: esp <- ((((esp - n) - 4) % 16) + 4)
            DCBA
New: esp <- ((((esp - (n + 4)) % 16) + (n + 4)) - n)
            AGCF      E                E
The operations are:
A: "subl  %8, %%esp\n\t" /* make room for params */
B: "subl $0x4, %%esp\n\t"
C: "andl $0xfffffff0, %%esp\n\t" /* make sure stack is 16-byte aligned */
D: "addl $0x4, %%esp\n\t"
E: "g" (n + 4),			/* %9 */
F: "subl %9, %%esp\n\t"
G: "addl %9, %%esp\n\t"

Are my expressions wrong in some way?  Or are they not equivalent in some way?  In what case does the result end up different?
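
One way to answer the question is to evaluate both sequences over a range of inputs. The sketch below models ESP as a 32-bit unsigned value and "% 16" as the andl with 0xfffffff0 from operation C. The function names are hypothetical, and n stands for whatever value %8 holds.

```cpp
#include <cassert>
#include <cstdint>

// Old sequence, applied in order: A (subl %8), B (subl $4), C (andl), D (addl $4).
constexpr std::uint32_t old_sequence(std::uint32_t esp, std::uint32_t n) {
    return (((esp - n) - 4u) & 0xfffffff0u) + 4u;
}

// New sequence, applied in order: F (subl %9), C (andl), G (addl %9),
// A (subl %8), where %9 is n + 4 (operation E).
constexpr std::uint32_t new_sequence(std::uint32_t esp, std::uint32_t n) {
    return (((esp - (n + 4u)) & 0xfffffff0u) + (n + 4u)) - n;
}

// Brute-force comparison over esp_count starting ESP values and every
// n in [0, max_n]; returns false at the first disagreement.
bool sequences_agree(std::uint32_t esp_start, std::uint32_t esp_count,
                     std::uint32_t max_n) {
    for (std::uint32_t i = 0; i < esp_count; ++i) {
        for (std::uint32_t n = 0; n <= max_n; ++n) {
            if (old_sequence(esp_start + i, n) != new_sequence(esp_start + i, n)) {
                return false;
            }
        }
    }
    return true;
}
```

Both functions leave ESP 4 bytes above a 16-byte boundary for every input tried, so in this model there is no case where the results differ.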

Comment 24

12 years ago
dbaron, I see what you're saying.  The current behavior is:

ESP -= n  } Identical to
ESP -= 4  } ESP -= (n+4)
ESP &= 0xfffffff0
ESP += 4

Augmented by this patch, the behavior is equivalent:

ESP -= (n+4)
ESP &= 0xfffffff0
ESP += (n+4)  } Identical to
ESP -= n      } ESP += 4

The existing code is cleaner and it's easier to see what's going on.  The cross-platform "subl %8, %%esp" serves the same function on all platforms.

Eric, can you confirm that you see this crash on a fresh unpatched build?
(Reporter)

Comment 25

12 years ago
I'll try to get back to this tonight and address everyone's comments here.
Comment on attachment 201363 [details] [diff] [review]
Fixes stack alignment problems with XPTC_InvokeByIndex

Marking sr- for lack of an explanation of what this changes; feel free to re-request if there is one.
Attachment #201363 - Flags: superreview?(dbaron) → superreview-
Component: General → XPCOM
Flags: review+
Product: Firefox → Core
(Reporter)

Comment 27

12 years ago
Sorry...real life is interfering.  I'll try to get back to this soon.
(Assignee)

Updated

11 years ago
Summary: Unoptimized debug build crashes with a misaligned stack on Intel → Unoptimized Intel Mac debug build crashes with a misaligned stack
(Assignee)

Comment 28

11 years ago
Comment on attachment 201363 [details] [diff] [review]
Fixes stack alignment problems with XPTC_InvokeByIndex

Doesn't work, see dbaron's comments.
Attachment #201363 - Flags: review-

Comment 29

11 years ago
Created attachment 208059 [details] [diff] [review]
Align stack for all calls

I'm pretty sure I know what's going on here.

There are two CALLs in XPTC_InvokeByIndex.  The asm ensures that the stack is 16-byte aligned for the second CALL, but the stack will always be offset by 8 bytes for the first CALL to invoke_copy_to_stack.  invoke_copy_to_stack is CALLed directly by IP offset, so the misaligned stack doesn't cause an immediate problem.  invoke_copy_to_stack does call IsPtrData through a stub, though, and this is where the alignment requirement is enforced.
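
The offsets described above can be reproduced with a small model of ESP. This is a sketch under stated assumptions, not the actual asm: it assumes three 4-byte arguments are pushed for the first CALL and a single 4-byte `this` pointer is pushed on top of the parameter area for the second. Those sizes are chosen to be consistent with the 8-byte offset mentioned here. All names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Misalignment (ESP mod 16) at each of the two CALL sites.
struct CallSiteOffsets {
    std::uint32_t first_call;
    std::uint32_t second_call;
};

CallSiteOffsets simulate_existing_asm(std::uint32_t esp, std::uint32_t n) {
    esp -= n;             // make room for params
    esp -= 4u;            // existing Mac OS X alignment fix-up ...
    esp &= 0xfffffff0u;   // ... round down to a 16-byte boundary ...
    esp += 4u;            // ... and leave ESP 4 bytes above it

    // Assumption: three 4-byte arguments are pushed for the first CALL.
    const std::uint32_t at_first_call = esp - 12u;

    // Assumption: those arguments are popped again, then a 4-byte `this`
    // pointer is pushed before the second CALL.
    const std::uint32_t at_second_call = esp - 4u;

    return CallSiteOffsets{at_first_call & 0xFu, at_second_call & 0xFu};
}
```

Under these assumptions the first CALL is always 8 bytes off and the second is always aligned, whatever the starting ESP and parameter size.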

This doesn't present a problem at -O1 or higher only because IsPtrData is inlined and no CALL is made inside invoke_copy_to_stack.  That procedure returns, our asm aligns the stack for the second call, and everything proceeds as it should.

We shouldn't assume that the stack doesn't need to be aligned for the first CALL.  We should adhere to the ABI and guarantee alignment any time we CALL.  This patch makes that change by aligning for the first CALL instead of the second, and manipulating ESP accordingly between CALLs so that the stack remains aligned throughout.
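
The arithmetic behind "guarantee alignment any time we CALL" is short. The sketch below is not the patch itself; the helper names are hypothetical. It computes how much extra padding to subtract from ESP before pushing arg_bytes of arguments so that ESP lands on a 16-byte boundary at the CALL instruction.

```cpp
#include <cassert>
#include <cstdint>

// Extra bytes to subtract from ESP before pushing arg_bytes of arguments,
// so that ESP is a multiple of 16 when the CALL executes.
constexpr std::uint32_t padding_before_args(std::uint32_t esp,
                                            std::uint32_t arg_bytes) {
    return (esp - arg_bytes) & 0xFu;
}

// ESP at the CALL once the padding and the arguments are in place.
constexpr std::uint32_t esp_at_call(std::uint32_t esp, std::uint32_t arg_bytes) {
    return esp - padding_before_args(esp, arg_bytes) - arg_bytes;
}

// Checks every ESP in [esp_start, esp_start + count) for the given size.
bool always_aligned(std::uint32_t esp_start, std::uint32_t count,
                    std::uint32_t arg_bytes) {
    for (std::uint32_t i = 0; i < count; ++i) {
        if ((esp_at_call(esp_start + i, arg_bytes) & 0xFu) != 0u) {
            return false;
        }
    }
    return true;
}
```

For example, with ESP at 0xbfffd8e4 and 12 bytes of arguments, 8 bytes of padding put ESP at 0xbfffd8d0 for the CALL.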

Comment 30

11 years ago
Created attachment 208060 [details] [diff] [review]
Align stack for all calls, and restore ESP more intelligently

This is the same as the previous patch, except it doesn't do any unnecessary ESP manipulation when it's done, it simply puts ESP back the way it found it.  This patch also includes clean-up to turn the XP_MACOSX macros into KEEP_STACK_16_BYTE_ALIGNED ones, and adds the appropriate #define to xptc_platforms_unixish_x86.h.

Comment 31

11 years ago
Created attachment 208082 [details] [diff] [review]
Align stack for all calls, and restore ESP more intelligently

I noticed that the previous patch wouldn't build on machines that don't require aligned stacks, because of overzealous #ifdefing.  This fixes that problem, and replaces the temp3 placeholder with saved_esp.  I also noticed that the parameter number for fn_copy (%0) was incorrect (%3).  I took ESP out of the clobber spec for OS X because save/restore is now handled directly in that asm block, so ESP is not clobbered.  The compiler didn't know what to do about ESP anyway (hence the requirement to save/restore).
Attachment #208060 - Attachment is obsolete: true
(Assignee)

Comment 32

11 years ago
Comment on attachment 208082 [details] [diff] [review]
Align stack for all calls, and restore ESP more intelligently

Unfortunately, this doesn't work.

"Link (dyld) error: lazy pointer not found"

Comment 33

11 years ago
Could you run it in the debugger and get the backtrace when dyld sends its SIGTRAP?  Thanks.

Comment 34

11 years ago
Oh, the args after "this" are in the wrong spot.

Comment 35

11 years ago
Created attachment 208121 [details] [diff] [review]
Align stack for all calls, set up parameters properly

I believe this version works for alignment in xptcinvoke.  It maintains the parameter area for the second CALL above the area for the first CALL; both are aligned, and invoke_copy_to_stack is pointed into the second CALL's parameter area for its destination parameter.

dbaron, I'd appreciate a review of this asm.

An -O0 build now crashes elsewhere.  I don't believe the new crash is related to xptcall.
Attachment #208059 - Attachment is obsolete: true
Attachment #208082 - Attachment is obsolete: true
Attachment #208121 - Flags: review?(dbaron)

Comment 36

11 years ago
The problems we're now seeing on the DTK are a race condition within dyld.  In conjunction with this patch, disabling hyperthreading results in a stable application.

See http://www.opensource.apple.com/darwinsource/10.4.4.ppc/dyld-44.2/src/dyld.cpp, bindLazySymbol.  ADC login required.  Note that this code is not equivalent to the shipping x86 code.  We don't yet know if shipping x86 systems are susceptible to the dyld problem.

The xptcall invoke stack alignment issue remains a very real problem.  In order to run on x86, we need to get this x86 patch in.  (Since we already did an alignment patch once over the summer, we'll call this patch a REalignment fix.) dbaron, could you please take a look?
Status: UNCONFIRMED → NEW
Ever confirmed: true

Updated

11 years ago
Blocks: 324651
(Assignee)

Comment 37

11 years ago
Comment on attachment 208121 [details] [diff] [review]
Align stack for all calls, set up parameters properly

Looks good to me, works fine. This should be tested on a system that does not define KEEP_STACK_16_BYTE_ALIGNED since this patch modifies the way things are done on such systems. I don't have a setup to do that at the moment.
Attachment #208121 - Flags: superreview?(dbaron)
Attachment #208121 - Flags: review?(joshmoz)
Attachment #208121 - Flags: review?(dbaron)
(Assignee)

Updated

11 years ago
Attachment #208121 - Flags: review?(joshmoz) → review+
(Assignee)

Comment 38

11 years ago
I can actually test this without the 16-byte alignment stuff. I'll just turn it off and run on PPC Mac OS X.
(Assignee)

Comment 39

11 years ago
Seems to work fine without KEEP_STACK_16_BYTE_ALIGNED being defined.

Comment 40

11 years ago
(In reply to comment #38)
> I can actually test this without the 16-byte alignment stuff. I'll just turn it
> off and run on PPC Mac OS X.

ppc doesn't use any of this x86 code, but the only changes on systems that don't use KEEP_STACK_16_BYTE_ALIGNED are just to get the parameter indices right.
(Assignee)

Comment 41

11 years ago
oh duh

Comment 42

11 years ago
blocking1.8.0.2: this is required for an x86 Mac release.
Flags: blocking1.8.0.2?

Comment 43

11 years ago
Checked in to CAMINO_1_0_BRANCH.

Still waiting on review, dbaron!
Whiteboard: [camino-1.0]
Flags: blocking1.8.0.2? → blocking1.8.0.2+

Comment 44

11 years ago
Comment on attachment 208121 [details] [diff] [review]
Align stack for all calls, set up parameters properly

dbaron says he won't have a chance to review.  shaver, if you decide to use your free pass too, I'm going to have to start picking superreviewers who haven't advertised their x86 asm prowess.
Attachment #208121 - Flags: superreview?(dbaron) → superreview?(shaver)
Comment on attachment 208121 [details] [diff] [review]
Align stack for all calls, set up parameters properly

Superreview isn't going to require deep knowledge of x86 assembly here, it's integration review; the review should be on the basis of the code analysis, and I see that josh has reviewed, so sr=shaver.
Attachment #208121 - Flags: superreview?(shaver) → superreview+
(Assignee)

Updated

11 years ago
Attachment #208121 - Flags: branch-1.8.1?(dougt)
Attachment #208121 - Flags: approval1.8.0.2?

Comment 46

11 years ago
Landed on trunk.  We need this on the branches.
Status: NEW → RESOLVED
Last Resolved: 12 years ago → 11 years ago
Resolution: --- → FIXED

Updated

11 years ago
Attachment #208121 - Flags: branch-1.8.1?(dougt) → branch-1.8.1?(shaver)

Comment 47

11 years ago
This fix appears to have broken BeOS builds.  See bug 326838 for details.
Depends on: 326838
Comment on attachment 208121 [details] [diff] [review]
Align stack for all calls, set up parameters properly

b181=shaver
Attachment #208121 - Flags: approval-branch-1.8.1?(shaver) → approval-branch-1.8.1+
Comment on attachment 208121 [details] [diff] [review]
Align stack for all calls, set up parameters properly

approved for 1.8.0 branch, a=dveditz for drivers
Attachment #208121 - Flags: approval1.8.0.2? → approval1.8.0.2+

Comment 50

11 years ago
Awaiting review on bug 326838 before landing this on branches, so as to avoid busting BeOS on the branches.

Comment 51

11 years ago
Checked in on 1_8_0 and 1_8 branches.
Keywords: fixed1.8.0.2, fixed1.8.1