Closed Bug 297326 Opened 19 years ago Closed 19 years ago

fix xptcall stack code to be 16-byte aligned

Categories

(Core :: XPConnect, defect)

PowerPC
macOS
defect
Not set
normal

Tracking

()

RESOLVED FIXED

People

(Reporter: jaas, Assigned: jaas)

References

Details

Attachments

(2 files, 4 obsolete files)

Apple provided us with a patch to make our xptcall stack code 16-byte aligned.
This is necessary for x86-based Macs.
Attached patch fix v1.0 (obsolete) — Splinter Review
Attachment #185859 - Flags: superreview?(pinkerton)
Attachment #185859 - Flags: review?(sfraser_bugs)
Comment on attachment 185859 [details] [diff] [review]
fix v1.0

uh sure. :) r=pink.
Attachment #185859 - Flags: superreview?(sfraser_bugs)
Attachment #185859 - Flags: superreview?(pinkerton)
Attachment #185859 - Flags: review?(sfraser_bugs)
Attachment #185859 - Flags: review+
Someone who cares about the unixish x86 xpconnect code should review this.
Attachment #185859 - Flags: superreview?(sfraser_bugs)
Attachment #185859 - Flags: superreview?(caillon)
Attachment #185859 - Flags: review?(dbaron)
Comment on attachment 185859 [details] [diff] [review]
fix v1.0

So it seems to me that this patch is significantly more complicated than it
needs to be.  It's adding additional padding to the stack in *two different
ways*:	once by making |n| larger, and once by doing an additional |andl| and
|subl|.  Given the current approach, the making |n| larger is completely
unneeded.  (It could in fact be done only by making |n| larger, which would be
much simpler, but that would be extremely non-portable, and the number added
could require changes for changes in compiler version or compiler options (such
as optimization flags).)

It seems like part of the motivation for parts of the patch (and a good bit of
the existing assembly) is being able to use this with -fomit-frame-pointer
(does Mac use that by default?) -- although perhaps some of the %esp
restoration is needed for access to local stack variables as well.  However,
I'm skeptical that this code will actually work with -fomit-frame-pointer,
since it's not clear how the compiler would actually get to the |saved_esp|
variable on the stack without a frame pointer; that could easily be fixed by
using assembly instead of C to save the stack pointer: then the location will
be relative to the post-alignment-adjustment stack pointer instead of the
pre-alignment-adjustment one.

Also, even with the change to |n| removed, this patch still uses more space
than necessary.  The lines:
+    "andl $0xfffffff0, %%esp\n\t" // make sure stack is 16-byte aligned
+    "subl $0xc, %%esp\n\t" // compensate for the extra push we do down below
could be replaced with:
+    "subl $0x4, %%esp\n\t" // compensate for the extra push we do down below
+    "andl $0xfffffff0, %%esp\n\t" // make sure stack is 16-byte aligned
+    "addl $0x4, %%esp\n\t" // compensate for the extra push we do down below
although whether it's better to optimize for code size or stack usage is an
open question.	Given some of Mozilla's stack usage habits, my instinct is for
the latter.

Also, it might be better to stick to C comments rather than C++ comments inside
the assembly blocks (that's what all the current comments inside the assembly
blocks are).


I'm not sure if you can get the original patch author to respond to these
comments, but I hope you can.
Attachment #185859 - Flags: review?(dbaron) → review-
I'm the original patch author. I think that all of your concerns with the patch are valid, and I didn't expect 
you to take this patch unaltered.

Whatever you do that makes sense is fine with me, as long as it works. :) If you need me to test anything, 
let me know.
Attached patch fix v2.0 (untested) (obsolete) — Splinter Review
Any chance you could give this a spin?

(But perhaps it's worth trying the *_gcc_x86_unix* variants instead?)
Comment on attachment 185859 [details] [diff] [review]
fix v1.0

cancelling request for now (also note that i'm not a super reviewer)
Attachment #185859 - Flags: superreview?(caillon)
Attachment #185859 - Attachment is obsolete: true
I'm not the world's foremost expert in asm, but I like v2 much better. David
made it much simpler and more intuitive. I can't test it (for obvious reasons)
but it looks like it should work just fine.
With dbaron's patch:

c++ -o xptcinvoke_unixish_x86.o -c  -DOSTYPE=\"Darwin8.1.0\" -DOSARCH=\"Darwin\" -
DBUILD_ID=2005063021 -DEXPORT_XPTC_API   -I../../../../../../dist/include/xpcom -I../../../../../../
dist/include -I../../../../../../dist/include/nspr    -I../../../../../../dist/sdk/include -I/Users/josh/src/
ff_debug/mozilla/xpcom/reflect/xptcall/src/md/unix/../..    -fPIC   -fno-rtti -fno-exceptions -Wall -
Wconversion -Wpointer-arith -Wcast-align -Woverloaded-virtual -Wsynth -Wno-ctor-dtor-privacy -
Wno-non-virtual-dtor -Wno-long-long -fpascal-strings -no-cpp-precomp -fno-common -fshort-
wchar -I/Developer/Headers/FlatCarbon -pipe  -DDEBUG -D_DEBUG -DDEBUG_josh -DTRACING -g -O   
-DMOZILLA_CLIENT -include ../../../../../../mozilla-config.h -Wp,-MD,.deps/
xptcinvoke_unixish_x86.pp /Users/josh/src/ff_debug/mozilla/xpcom/reflect/xptcall/src/md/unix/
xptcinvoke_unixish_x86.cpp
/Users/josh/src/ff_debug/mozilla/xpcom/reflect/xptcall/src/md/unix/xptcinvoke_unixish_x86.cpp: In 
function 'nsresult XPTC_InvokeByIndex(nsISupports*, PRUint32, PRUint32, nsXPTCVariant*)':
/Users/josh/src/ff_debug/mozilla/xpcom/reflect/xptcall/src/md/unix/xptcinvoke_unixish_x86.cpp:
142: error: invalid 'asm': operand number missing after %-letter
/Users/josh/src/ff_debug/mozilla/xpcom/reflect/xptcall/src/md/unix/xptcinvoke_unixish_x86.cpp:
142: error: invalid 'asm': operand number missing after %-letter
/Users/josh/src/ff_debug/mozilla/xpcom/reflect/xptcall/src/md/unix/xptcinvoke_unixish_x86.cpp:
142: error: invalid 'asm': operand number missing after %-letter
{standard input}:137:operands given don't match any known 386 instruction
make[8]: *** [xptcinvoke_unixish_x86.o] Error 1
make[7]: *** [libs] Error 2
make[6]: *** [libs] Error 2
make[5]: *** [libs] Error 2
make[4]: *** [libs] Error 2
make[3]: *** [libs] Error 2
make[2]: *** [tier_2] Error 2
make[1]: *** [default] Error 2
make: *** [build] Error 2
Attached patch fix v2.1 (untested) (obsolete) — Splinter Review
This fixes some typos in the previous patch.
Attachment #186164 - Attachment is obsolete: true
Attached file crash log #1 (obsolete) —
fix v2.1 compiles, but crashes
With access to the test x86 Mac, I can't actually get this to work due to
register allocation problems.  Being honest about what registers the assembly
modifies gives a register allocation error, not doing so means we clobber one of
the %* params.  (That said, I'm not sure why it can't leave those in memory a
little longer.  Nor do I really understand the changes in revision 1.8 for bug
16612, in particular the temp? variables.)
Attached patch fix v1.1Splinter Review
This is a cleaned up version of the original patch which works.  I'm marking it
r=me since it basically wasn't my patch.
Attachment #187965 - Attachment is obsolete: true
Attachment #187971 - Attachment is obsolete: true
Attachment #187985 - Flags: superreview?(shaver)
Attachment #187985 - Flags: review+
That said, I'm not sure how to tell that this really fixes whatever problem it
was supposed to fix, since the build seems to work OK without it, most of the
time at least.
Sorry that I've been relatively quiet on this bug; I've been busy at work. :)

If I remember correctly, the original problem I saw and the reason I made this patch was because browsing 
around on http://news.google.com/ triggered a crash because of a misaligned stack in a system drawing 
function.
With David's new patch I got this crash, which I am guessing is what Ronnie is talking about:

Exception:  EXC_BAD_INSTRUCTION (0x0002)
Code[0]:    0x0000000d
Code[1]:    0x00000000

Thread 0 Crashed:
0   com.apple.CoreGraphics   	0x92f57854 CGSColorMaskCopyARGB8888 + 576
1   com.apple.CoreGraphics   	0x93086f5b argb32_mark + 39005
2   libRIP.A.dylib           	0x916210d0 ripd_Mark + 300
3   libRIP.A.dylib           	0x9162753c ripl_BltGlyph + 2212
4   libRIP.A.dylib           	0x916268d4 ripc_DrawGlyphs + 2780

I'll try with Ronnie's patch.
Attachment #187985 - Flags: review-
With David's fix v1.1 the browser goes down within a minute every time. With
Ronnie's fix v1.0 I have been browsing for 15 minutes, trying to complicated
things, and not crashing. Perhaps we should go with Ronnie's fix? Any more ideas
David?
Yes, that is the same crash. If you look at the offending instruction, you'll see that it is an aligned SSE mov 
instruction, but the operands are not 16-byte aligned.
David - should I request review on the original Apple patch or do you want to
try again? Perhaps we could check the patch in with modifications wrapped by
"#ifdef XP_MACOSX"?
Comment on attachment 187985 [details] [diff] [review]
fix v1.1

No, somebody should debug this and figure out why the stack isn't aligned where
it should be (e.g., which way it's off).
Attachment #187985 - Flags: superreview?(shaver)
Folks,

So, again, I'm not really any kind of expert on this. In fact, I don't use macs
and I definitely do not have access to the x86 macs. I can't test any patches or
use debuggers... basically I'm flying blind. But here's what I've noticed:

v1.0 works but seems overly complicated for the reasons David mentioned. v1.1
makes more sense but crashes for some reason. Assuming that XP_MACOSX is
defined, these patches have almost identical paths of execution. That is, once
the preprocessor has done its work and chosen all the appropriate #ifdef blocks,
there are only two main differences between the two patches.

Difference #1:

The initial value assigned to the variable n is different. In v1.0 this is line
87, and in v1.1 this is line 84. They both use a bitshift, but v1.0 adds an
additional 15 bytes to the shifted value. After the "subl" instruction (line 102
in v1.0 and line 96 in v1.1), the "subl" instruction of v1.1 line 100, and the
"andl" instruction (see Difference #2 below) it becomes clear that v1.0 is
allocating more space for the parameters than v1.1. In fact, a pattern emerges
which varies based on whether or not the parameter count is odd or even:

  1  | bytes  |   bytes   |  bytes
param| needed | allocated | wasted
-----+--------+-----------+-------
v1.0 |   8    |     32    |   24
-----+--------+-----------+-------
v1.1 |   8    |     16    |    8  


  2  | bytes  |   bytes   |  bytes
param| needed | allocated | wasted
-----+--------+-----------+-------
v1.0 |   16   |     32    |   16
-----+--------+-----------+-------
v1.1 |   16   |     32    |   16


  3  | bytes  |   bytes   |  bytes
param| needed | allocated | wasted
-----+--------+-----------+-------
v1.0 |   24   |     48    |   24
-----+--------+-----------+-------
v1.1 |   24   |     32    |    8


  4  | bytes  |   bytes   |  bytes
param| needed | allocated | wasted
-----+--------+-----------+-------
v1.0 |   32   |     48    |   16
-----+--------+-----------+-------
v1.1 |   32   |     48    |   16

It seems pretty obvious that v1.1 is using the stack more efficiently than v1.0.
In fact, I think that we could do an even better job if we get rid of the "subl
$0x4, %%esp" instruction on line 100 of v1.1. Which brings us to:


Difference #2) The stack pointer manipulations surrounding the "andl
$0xfffffff0, %%esp" instruction are different. This instruction is line 104 in
v1.0 and line 101 in v1.1. In v1.0, "andl" is executed and then the stack
pointer is adjusted to compensate for a later push via "subl $0xc, %%esp". This
makes sense to me (in fact, could we get away with a $0x8 instead?). v1.1 does a
different thing. It does a "subl $0x4" followed by the "andl" followed by an
"addl $0x4". I'm pretty sure that this is the spot where v1.1 is crashing. This
series of three instructions does not always make room ("compensate for the
push") on the stack like it is supposed to. Take the following example:

line#|       instruction       |   %%esp
-----+-------------------------+------------
<100 | (initial example value) | 0x00012fd4
 100 | subl $0x4, %%esp        | 0x00012fd0
 101 | andl $0xfffffff0, %%esp | 0x00012fd0
 102 | addl $0x4, %%esp        | 0x00012fd4

In this case no room is made on the stack at all. You start and end at the same
address. This series DOES work in some other cases:

line#|       instruction       |   %%esp
-----+-------------------------+------------
<100 | (initial example value) | 0x00012fd0
 100 | subl $0x4, %%esp        | 0x00012fcc
 101 | andl $0xfffffff0, %%esp | 0x00012fc0
 102 | addl $0x4, %%esp        | 0x00012fc4

So I'm betting that this is why v1.1 is causing a crash. v1.0 uses a simple
"subl" after the "andl" which will always make extra room on the stack. v1.1
sometimes fails to make any room at all..

Sorry for such a long post... I guess I'm too much of an academic (when it comes
to asm) to say anything quickly...

Ben
See my previous post. Who knows if this puppy will work...
(In reply to comment #21)
> It seems pretty obvious that v1.1 is using the stack more efficiently than v1.0.
> In fact, I think that we could do an even better job if we get rid of the "subl
> $0x4, %%esp" instruction on line 100 of v1.1. Which brings us to:

Well, the goal is not to allocate any more space than needed for alignment. 
(The existing code isn't all that efficient, in assuming all parameters need two
words, which is actually unusual, but anyway...)  I think I'll explain the
reason for your confusion below.

> v1.1 does a
> different thing. It does a "subl $0x4" followed by the "andl" followed by an
> "addl $0x4". I'm pretty sure that this is the spot where v1.1 is crashing. This
> series of three instructions does not always make room ("compensate for the
> push") on the stack like it is supposed to. Take the following example:
[...]
> In this case no room is made on the stack at all. You start and end at the same
> address.

There's nothing wrong with that.  All it's supposed to do is ensure that *after*
the three pushes have been done, the stack pointer is 16-byte aligned.  And
that's all this bug is about.  The idea is that, to get to 16-byte alignment, we
need to subtract either 0, 4, 8, or 12 bytes (leaving that amount of empty room).

> So I'm betting that this is why v1.1 is causing a crash. v1.0 uses a simple
> "subl" after the "andl" which will always make extra room on the stack. v1.1
> sometimes fails to make any room at all..

No, the crash is pretty clearly because of unmet alignment requirements.  If it
weren't making enough room on the stack, it would crash much closer to this
code.  So there's something wrong, but I don't think your analysis helps to find it.
(In reply to comment #23)
> the three pushes have been done, the stack pointer is 16-byte aligned.  And

Sorry, one, not three.
Comment on attachment 187985 [details] [diff] [review]
fix v1.1

Turns out fix v1.1 seems to work. Requesting approval for landing...
Attachment #187985 - Flags: review-
Attachment #187985 - Flags: review+
Attachment #187985 - Flags: approval1.8b4?
Attachment #187985 - Flags: approval1.8b4? → approval1.8b4+
landed
Status: NEW → RESOLVED
Closed: 19 years ago
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: