Closed
Bug 297326
Opened 19 years ago
Closed 19 years ago
fix xptcall stack code to be 16-byte aligned
Categories
(Core :: XPConnect, defect)
Tracking
()
RESOLVED
FIXED
People
(Reporter: jaas, Assigned: jaas)
References
Details
Attachments
(2 files, 4 obsolete files)
1.74 KB,
patch
|
dbaron
:
review+
jaas
:
review+
asa
:
approval1.8b4+
|
Details | Diff | Splinter Review |
1.30 KB,
patch
|
Details | Diff | Splinter Review |
Apple provided us with a patch to make our xptcall stack code 16-byte aligned. This is necessary for x86-based Macs.
Attachment #185859 -
Flags: superreview?(pinkerton)
Attachment #185859 -
Flags: review?(sfraser_bugs)
Comment 2•19 years ago
|
||
Comment on attachment 185859 [details] [diff] [review] fix v1.0 uh sure. :) r=pink.
Attachment #185859 -
Flags: superreview?(sfraser_bugs)
Attachment #185859 -
Flags: superreview?(pinkerton)
Attachment #185859 -
Flags: review?(sfraser_bugs)
Attachment #185859 -
Flags: review+
Comment 3•19 years ago
|
||
Someone who cares about the unixish x86 xpconnect code should review this.
Attachment #185859 -
Flags: superreview?(sfraser_bugs)
Attachment #185859 -
Flags: superreview?(caillon)
Attachment #185859 -
Flags: review?(dbaron)
Comment on attachment 185859 [details] [diff] [review] fix v1.0 So it seems to me that this patch is significantly more complicated than it needs to be. It's adding additional padding to the stack in *two different ways*: once by making |n| larger, and once by doing an additional |andl| and |subl|. Given the current approach, the making |n| larger is completely unneeded. (It could in fact be done only by making |n| larger, which would be much simpler, but that would be extremely non-portable, and the number added could require changes for changes in compiler version or compiler options (such as optimization flags).) It seems like part of the motivation for parts of the patch (and a good bit of the existing assembly) is being able to use this with -fomit-frame-pointer (does Mac use that by default?) -- although perhaps some of the %esp restoration is needed for access to local stack variables as well. However, I'm skeptical that this code will actually work with -fomit-frame-pointer, since it's not clear how the compiler would actually get to the |saved_esp| variable on the stack without a frame pointer; that could easily be fixed by using assembly instead of C to save the stack pointer: then the location will be relative to the post-alignment-adjustment stack pointer instead of the pre-alignment-adjustment one. Also, even with the change to |n| removed, this patch still uses more space than necessary. The lines: + "andl $0xfffffff0, %%esp\n\t" // make sure stack is 16-byte aligned + "subl $0xc, %%esp\n\t" // compensate for the extra push we do down below could be replaced with: + "subl $0x4, %%esp\n\t" // compensate for the extra push we do down below + "andl $0xfffffff0, %%esp\n\t" // make sure stack is 16-byte aligned + "addl $0x4, %%esp\n\t" // compensate for the extra push we do down below although whether it's better to optimize for code size or stack usage is an open question. Given some of Mozilla's stack usage habits, my instinct is for the latter. Also, it might be better to stick to C comments rather than C++ comments inside the assembly blocks (that's what all the current comments inside the assembly blocks are). I'm not sure if you can get the original patch author to respond to these comments, but I hope you can.
Attachment #185859 -
Flags: review?(dbaron) → review-
Comment 5•19 years ago
|
||
I'm the original patch author. I think that all of your concerns with the patch are valid, and I didn't expect you to take this patch unaltered. Whatever you do that makes sense is fine with me, as long as it works. :) If you need me to test anything, let me know.
Any chance you could give this a spin? (But perhaps it's worth trying the *_gcc_x86_unix* variants instead?)
Comment 7•19 years ago
|
||
Comment on attachment 185859 [details] [diff] [review] fix v1.0 cancelling request for now (also note that i'm not a super reviewer)
Attachment #185859 -
Flags: superreview?(caillon)
Attachment #185859 -
Attachment is obsolete: true
I'm not the world's foremost expert in asm, but I like v2 much better. David made it much simpler and more intuitive. I can't test it (for obvious reasons) but it looks like it should work just fine.
With dbaron's patch: c++ -o xptcinvoke_unixish_x86.o -c -DOSTYPE=\"Darwin8.1.0\" -DOSARCH=\"Darwin\" - DBUILD_ID=2005063021 -DEXPORT_XPTC_API -I../../../../../../dist/include/xpcom -I../../../../../../ dist/include -I../../../../../../dist/include/nspr -I../../../../../../dist/sdk/include -I/Users/josh/src/ ff_debug/mozilla/xpcom/reflect/xptcall/src/md/unix/../.. -fPIC -fno-rtti -fno-exceptions -Wall - Wconversion -Wpointer-arith -Wcast-align -Woverloaded-virtual -Wsynth -Wno-ctor-dtor-privacy - Wno-non-virtual-dtor -Wno-long-long -fpascal-strings -no-cpp-precomp -fno-common -fshort- wchar -I/Developer/Headers/FlatCarbon -pipe -DDEBUG -D_DEBUG -DDEBUG_josh -DTRACING -g -O -DMOZILLA_CLIENT -include ../../../../../../mozilla-config.h -Wp,-MD,.deps/ xptcinvoke_unixish_x86.pp /Users/josh/src/ff_debug/mozilla/xpcom/reflect/xptcall/src/md/unix/ xptcinvoke_unixish_x86.cpp /Users/josh/src/ff_debug/mozilla/xpcom/reflect/xptcall/src/md/unix/xptcinvoke_unixish_x86.cpp: In function 'nsresult XPTC_InvokeByIndex(nsISupports*, PRUint32, PRUint32, nsXPTCVariant*)': /Users/josh/src/ff_debug/mozilla/xpcom/reflect/xptcall/src/md/unix/xptcinvoke_unixish_x86.cpp: 142: error: invalid 'asm': operand number missing after %-letter /Users/josh/src/ff_debug/mozilla/xpcom/reflect/xptcall/src/md/unix/xptcinvoke_unixish_x86.cpp: 142: error: invalid 'asm': operand number missing after %-letter /Users/josh/src/ff_debug/mozilla/xpcom/reflect/xptcall/src/md/unix/xptcinvoke_unixish_x86.cpp: 142: error: invalid 'asm': operand number missing after %-letter {standard input}:137:operands given don't match any known 386 instruction make[8]: *** [xptcinvoke_unixish_x86.o] Error 1 make[7]: *** [libs] Error 2 make[6]: *** [libs] Error 2 make[5]: *** [libs] Error 2 make[4]: *** [libs] Error 2 make[3]: *** [libs] Error 2 make[2]: *** [tier_2] Error 2 make[1]: *** [default] Error 2 make: *** [build] Error 2
This fixes some typos in the previous patch.
Attachment #186164 -
Attachment is obsolete: true
Assignee | ||
Comment 11•19 years ago
|
||
fix v2.1 compiles, but crashes
With access to the test x86 Mac, I can't actually get this to work due to register allocation problems. Being honest about what registers the assembly modifies gives a register allocation error, not doing so means we clobber one of the %* params. (That said, I'm not sure why it can't leave those in memory a little longer. Nor do I really understand the changes in revision 1.8 for bug 16612, in particular the temp? variables.)
This is a cleaned up version of the original patch which works. I'm marking it r=me since it basically wasn't my patch.
Attachment #187965 -
Attachment is obsolete: true
Attachment #187971 -
Attachment is obsolete: true
Attachment #187985 -
Flags: superreview?(shaver)
Attachment #187985 -
Flags: review+
That said, I'm not sure how to tell that this really fixes whatever problem it was supposed to fix, since the build seems to work OK without it, most of the time at least.
Comment 15•19 years ago
|
||
Sorry that I've been relatively quiet on this bug; I've been busy at work. :) If I remember correctly, the original problem I saw and the reason I made this patch was because browsing around on http://news.google.com/ triggered a crash because of a misaligned stack in a system drawing function.
Assignee | ||
Comment 16•19 years ago
|
||
With David's new patch I got this crash, which I am guessing is what Ronnie is talking about: Exception: EXC_BAD_INSTRUCTION (0x0002) Code[0]: 0x0000000d Code[1]: 0x00000000 Thread 0 Crashed: 0 com.apple.CoreGraphics 0x92f57854 CGSColorMaskCopyARGB8888 + 576 1 com.apple.CoreGraphics 0x93086f5b argb32_mark + 39005 2 libRIP.A.dylib 0x916210d0 ripd_Mark + 300 3 libRIP.A.dylib 0x9162753c ripl_BltGlyph + 2212 4 libRIP.A.dylib 0x916268d4 ripc_DrawGlyphs + 2780 I'll try with Ronnie's patch.
Attachment #187985 -
Flags: review-
Assignee | ||
Comment 17•19 years ago
|
||
With David's fix v1.1 the browser goes down within a minute every time. With Ronnie's fix v1.0 I have been browsing for 15 minutes, trying to complicated things, and not crashing. Perhaps we should go with Ronnie's fix? Any more ideas David?
Comment 18•19 years ago
|
||
Yes, that is the same crash. If you look at the offending instruction, you'll see that it is an aligned SSE mov instruction, but the operands are not 16-byte aligned.
Assignee | ||
Comment 19•19 years ago
|
||
David - should I request review on the original Apple patch or do you want to try again? Perhaps we could check the patch in with modifications wrapped by "#ifdef XP_MACOSX"?
Comment on attachment 187985 [details] [diff] [review] fix v1.1 No, somebody should debug this and figure out why the stack isn't aligned where it should be (e.g., which way it's off).
Attachment #187985 -
Flags: superreview?(shaver)
Folks, So, again, I'm not really any kind of expert on this. In fact, I don't use macs and I definitely do not have access to the x86 macs. I can't test any patches or use debuggers... basically I'm flying blind. But here's what I've noticed: v1.0 works but seems overly complicated for the reasons David mentioned. v1.1 makes more sense but crashes for some reason. Assuming that XP_MACOSX is defined, these patches have almost identical paths of execution. That is, once the preprocessor has done its work and chosen all the appropriate #ifdef blocks, there are only two main differences between the two patches. Difference #1: The initial value assigned to the variable n is different. In v1.0 this is line 87, and in v1.1 this is line 84. They both use a bitshift, but v1.0 adds an additional 15 bytes to the shifted value. After the "subl" instruction (line 102 in v1.0 and line 96 in v1.1), the "subl" instruction of v1.1 line 100, and the "andl" instruction (see Difference #2 below) it becomes clear that v1.0 is allocating more space for the parameters than v1.1. In fact, a pattern emerges which varies based on whether or not the parameter count is odd or even: 1 | bytes | bytes | bytes param| needed | allocated | wasted -----+--------+-----------+------- v1.0 | 8 | 32 | 24 -----+--------+-----------+------- v1.1 | 8 | 16 | 8 2 | bytes | bytes | bytes param| needed | allocated | wasted -----+--------+-----------+------- v1.0 | 16 | 32 | 16 -----+--------+-----------+------- v1.1 | 16 | 32 | 16 3 | bytes | bytes | bytes param| needed | allocated | wasted -----+--------+-----------+------- v1.0 | 24 | 48 | 24 -----+--------+-----------+------- v1.1 | 24 | 32 | 8 4 | bytes | bytes | bytes param| needed | allocated | wasted -----+--------+-----------+------- v1.0 | 32 | 48 | 16 -----+--------+-----------+------- v1.1 | 32 | 48 | 16 It seems pretty obvious that v1.1 is using the stack more efficiently than v1.0. In fact, I think that we could do an even better job if we get rid of the "subl $0x4, %%esp" instruction on line 100 of v1.1. Which brings us to: Difference #2) The stack pointer manipulations surrounding the "andl $0xfffffff0, %%esp" instruction are different. This instruction is line 104 in v1.0 and line 101 in v1.1. In v1.0, "andl" is executed and then the stack pointer is adjusted to compensate for a later push via "subl $0xc, %%esp". This makes sense to me (in fact, could we get away with a $0x8 instead?). v1.1 does a different thing. It does a "subl $0x4" followed by the "andl" followed by an "addl $0x4". I'm pretty sure that this is the spot where v1.1 is crashing. This series of three instructions does not always make room ("compensate for the push") on the stack like it is supposed to. Take the following example: line#| instruction | %%esp -----+-------------------------+------------ <100 | (initial example value) | 0x00012fd4 100 | subl $0x4, %%esp | 0x00012fd0 101 | andl $0xfffffff0, %%esp | 0x00012fd0 102 | addl $0x4, %%esp | 0x00012fd4 In this case no room is made on the stack at all. You start and end at the same address. This series DOES work in some other cases: line#| instruction | %%esp -----+-------------------------+------------ <100 | (initial example value) | 0x00012fd0 100 | subl $0x4, %%esp | 0x00012fcc 101 | andl $0xfffffff0, %%esp | 0x00012fc0 102 | addl $0x4, %%esp | 0x00012fc4 So I'm betting that this is why v1.1 is causing a crash. v1.0 uses a simple "subl" after the "andl" which will always make extra room on the stack. v1.1 sometimes fails to make any room at all.. Sorry for such a long post... I guess I'm too much of an academic (when it comes to asm) to say anything quickly... Ben
See my previous post. Who knows if this puppy will work...
(In reply to comment #21) > It seems pretty obvious that v1.1 is using the stack more efficiently than v1.0. > In fact, I think that we could do an even better job if we get rid of the "subl > $0x4, %%esp" instruction on line 100 of v1.1. Which brings us to: Well, the goal is not to allocate any more space than needed for alignment. (The existing code isn't all that efficient, in assuming all parameters need two words, which is actually unusual, but anyway...) I think I'll explain the reason for your confusion below. > v1.1 does a > different thing. It does a "subl $0x4" followed by the "andl" followed by an > "addl $0x4". I'm pretty sure that this is the spot where v1.1 is crashing. This > series of three instructions does not always make room ("compensate for the > push") on the stack like it is supposed to. Take the following example: [...] > In this case no room is made on the stack at all. You start and end at the same > address. There's nothing wrong with that. All it's supposed to do is ensure that *after* the three pushes have been done, the stack pointer is 16-byte aligned. And that's all this bug is about. The idea is that, to get to 16-byte alignment, we need to subtract either 0, 4, 8, or 12 bytes (leaving that amount of empty room). > So I'm betting that this is why v1.1 is causing a crash. v1.0 uses a simple > "subl" after the "andl" which will always make extra room on the stack. v1.1 > sometimes fails to make any room at all.. No, the crash is pretty clearly because of unmet alignment requirements. If it weren't making enough room on the stack, it would crash much closer to this code. So there's something wrong, but I don't think your analysis helps to find it.
(In reply to comment #23) > the three pushes have been done, the stack pointer is 16-byte aligned. And Sorry, one, not three.
Assignee | ||
Comment 25•19 years ago
|
||
Comment on attachment 187985 [details] [diff] [review] fix v1.1 Turns out fix v1.1 seems to work. Requesting approval for landing...
Attachment #187985 -
Flags: review-
Attachment #187985 -
Flags: review+
Attachment #187985 -
Flags: approval1.8b4?
Updated•19 years ago
|
Attachment #187985 -
Flags: approval1.8b4? → approval1.8b4+
Assignee | ||
Comment 26•19 years ago
|
||
landed
Status: NEW → RESOLVED
Closed: 19 years ago
Resolution: --- → FIXED
You need to log in
before you can comment on or make changes to this bug.
Description
•