Closed Bug 410509 Opened 12 years ago Closed 12 years ago

XPTC_InvokeByIndex() ASM mis-aligns stack, causing crash

Categories

(Core :: XPCOM, defect, critical)

x86
Linux
defect
Not set
critical

Tracking

()

RESOLVED FIXED

People

(Reporter: swsnyder, Assigned: vlad)

References

Details

(Keywords: crash, fixed1.9.1)

Attachments

(1 file)

The ASM code in XPTC_InvokeByIndex() juggles the x86 stack pointer, but does not necessarily follow the stack pointer alignment in use by the compiler.  This can - and does - lead to protection faults in situations where specific stack pointer alignment is required.

This patch adjusts the stack pointer such that it is set to 128-bit alignment for the child function called.  This patch applies cleanly to the 1.8.1.11 and trunk source trees.

Description of rationale to follow.
When building for Linux with GCC v4.1.2 I used the "-ftree-vectorize" compiler option and experienced occasional protection faults.

This option, new to GCC v4.0, implies the specification of stack alignment to 128-bits.  In past one could specify alignment and if it wasn't followed the only cost would be a small performance loss.  With -ftree-vectorize code is generated that *requires* the specified alignment.  The compiler assumes that alignment is being maintained and generates code that is illegal in the absence of this alignment.

I found that I occasionally experienced protection faults in nsCRT::strtok() when using this new compiler option.  Here's where it crashed:

http://mxr.mozilla.org/seamonkey/source/xpcom/ds/nsCRT.cpp#80

    0x00278ee9 <_ZN5nsCRT6strtokEPcPKcPS0_+1>:      push   %edi
    0x00278eea <_ZN5nsCRT6strtokEPcPKcPS0_+2>:      push   %esi
    0x00278eeb <_ZN5nsCRT6strtokEPcPKcPS0_+3>:      sub    $0x20,%esp
    0x00278eee <_ZN5nsCRT6strtokEPcPKcPS0_+6>:      mov    0x30(%esp),%ebp
    0x00278ef2 <_ZN5nsCRT6strtokEPcPKcPS0_+10>:     mov    0x34(%esp),%eax
    0x00278ef6 <_ZN5nsCRT6strtokEPcPKcPS0_+14>:     pxor   %xmm0,%xmm0
--> 0x00278efa <_ZN5nsCRT6strtokEPcPKcPS0_+18>:     movdqa %xmm0,(%esp)
    0x00278eff <_ZN5nsCRT6strtokEPcPKcPS0_+23>:     movdqa %xmm0,0x10(%esp)
    0x00278f05 <_ZN5nsCRT6strtokEPcPKcPS0_+29>:     movzbl (%eax),%edx
    0x00278f08 <_ZN5nsCRT6strtokEPcPKcPS0_+32>:     test   %dl,%dl

The source code uses a for() loop to initialize a 32-byte local buffer to zeroes.  What the compiler does is zeroes a 16-byte XMM register and attempts to write its value twice to the buffer addresses.  Clever - except that the stack pointer has previously been mis-aligned and now those movdqa instructions are given illegal addresses.

I traced the mis-alignment back to the ASM code in XPTC_InvokeByIndex().  Up to this code the stack had been maintained with 128-bit alignment as implied by -ftree-vectorize.  Now the stack pointer was 32-bit aligned.  The adjusted stack was no longer at the expected alignedment at the call to it's child function.  Nested ~10 layers down was the call to strtok() in which the alignment was assumed.  BOOM!

I don't have access to a version of GCC more recent than 4.1.2, but it seems from reading online that v4.2.x enables -ftree-vectorize as a default optimization.
Component: XP Apps → XPCOM
Product: Mozilla Application Suite → Core
QA Contact: xpcom
Version: SeaMonkey 1.1 Branch → Trunk
xref bug 218629 which has an unreviewed patch
Severity: normal → critical
Keywords: crash
i don't suppose gcc has any way to *tell* us what alignment it's using?
(In reply to comment #3)
> i don't suppose gcc has any way to *tell* us what alignment it's using?

Not as far as I can tell from the doc (I haven't looked at the GCC code).  There's an __alignof__ whose syntax is like "sizeof", but that appears to be of use only for a given variable.

gcc -march=pentium4 -O2 -ftree-vectorize -S --verbose-asm -o main.s main.c

	.file	"main.c"
# GNU C version 4.1.2 20070925 (Red Hat 4.1.2-27) (i386-redhat-linux)
#	compiled by GNU C version 4.1.2 20070925 (Red Hat 4.1.2-27).
# GGC heuristics: --param ggc-min-expand=99 --param ggc-min-heapsize=129298
# options passed:  -march=pentium4 -auxbase-strip -O2 -ftree-vectorize
# -fverbose-asm
# options enabled:  -falign-loops -fargument-alias -fbranch-count-reg
# -fcaller-saves -fcommon -fcprop-registers -fcrossjumping
# -fcse-follow-jumps -fcse-skip-blocks -fdefer-pop
# -fdelete-null-pointer-checks -fearly-inlining
# -feliminate-unused-debug-types -fexpensive-optimizations -ffunction-cse
# -fgcse -fgcse-lm -fguess-branch-probability -fident -fif-conversion
# -fif-conversion2 -finline-functions-called-once -fipa-pure-const
# -fipa-reference -fipa-type-escape -fivopts -fkeep-static-consts
# -fleading-underscore -floop-optimize -floop-optimize2 -fmath-errno
# -fmerge-constants -foptimize-register-move -foptimize-sibling-calls
# -fpcc-struct-return -fpeephole -fpeephole2 -fregmove -freorder-blocks
# -freorder-functions -frerun-cse-after-loop -frerun-loop-opt
# -fsched-interblock -fsched-spec -fsched-stalled-insns-dep -fshow-column
# -fsplit-ivs-in-unroller -fstrength-reduce -fstrict-aliasing
# -fthread-jumps -ftrapping-math -ftree-ccp -ftree-ch -ftree-copy-prop
# -ftree-copyrename -ftree-dce -ftree-dominator-opts -ftree-dse -ftree-fre
# -ftree-loop-im -ftree-loop-ivcanon -ftree-loop-optimize -ftree-lrs
# -ftree-pre -ftree-salias -ftree-sink -ftree-sra -ftree-store-ccp
# -ftree-store-copy-prop -ftree-ter -ftree-vect-loop-version
# -ftree-vectorize -ftree-vrp -funit-at-a-time -fverbose-asm
# -fzero-initialized-in-bss -m32 -m80387 -m96bit-long-double
# -maccumulate-outgoing-args -malign-stringops -mfancy-math-387
# -mfp-ret-in-387 -mieee-fp -mmmx -mno-red-zone -mpush-args -msse -msse2
# -mtls-direct-seg-refs

# Compiler executable checksum: 53c9f9515bf7a4d930482540205071f7

	.text
.globl main
	.type	main, @function
main:
	leal	4(%esp), %ecx	#,
	andl	$-16, %esp	#,
	pushl	-4(%ecx)	#
	pushl	%ebp	#
	movl	%esp, %ebp	#,
	pushl	%ecx	#
	subl	$4, %esp	#,
	movl	$0, (%esp)	#,
	call	somefunc	#
	addl	$4, %esp	#,
	popl	%ecx	#
	popl	%ebp	#
	leal	-4(%ecx), %esp	#,
	ret
	.size	main, .-main
	.ident	"GCC: (GNU) 4.1.2 20070925 (Red Hat 4.1.2-27)"
	.section	.note.GNU-stack,"",@progbits
is it possible for us to ask gcc for one?

Is this essentially a different ABI? if someone tries to mix normal objects that don't use -ftree-vectorize with objects that do, does this work?

(sorry, maybe i'm a bit out of it, what does main.c look like?)
The GCC doc says that stack alignment greater than 32-bits is not recommended for code that will be called by external processes (e.g. libraries).  The reason is that same thing we see here, that the callee is making assumptions that the caller may not be aware of.

This is the main.c used to generate the asm above:

extern int somefunc(int param);

int main(int argc, char * argv[])
{
  return ( somefunc(0) );
}
i don't suppose there's some way to tell gcc what alignment functions should use?

can you produce a testcase:
main.c built using xpcom 
components/foopy-xpcom.so

where main.c does: InitXPCOM2, RegisterComponents, foopy=do_createInstance() and then calls a method in foopy which has this alignment problem? after that, it's probably best to try with a non xpcom flavor.

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=16660 looks moderately interesting.

mmoy: i don't suppose you know someone who would be able to give us a hand. this is really beyond me.
I don't have contacts in GCC-land.

It looks like -ftree-vectorize will be turned on with -O3 in a future release according to http://gnu.miscellaneousmirror.org/software/gcc/projects/tree-ssa/vectorization.html

Would it be worthwhile to have an XPTC alignment option such as XPTC_ALIGN_128 that you could specify to tell XPTC to align the stack on 128 bits?
my gut response is: only if there's some way to find out that we should use it. can someone suggest a way to know that?

one of the docs suggests daniel berlin knows something about alignment...
Mozilla calls lower level libraries (either directly, or through plugins, or by being embedded) that are compiled with standard gcc flags.  Current versions of gcc do not produce code that can handle local variables used in SSE intrinsics in combination with an unaligned stack.  GCC-4.2 has a compile flag as well as __attribute__((force_align_arg_pointer)) that will cause gcc to produce code that realigns the stack when necessary.

It appears that most developers currently assume that the stack is 128-bit aligned, which causes bugs when mozilla unaligns it.  I stongly recommend fixing this.  (I am unable to comment whether the patch actually fixes the problem.)

Relevant bugs:

  https://bugs.freedesktop.org/show_bug.cgi?id=11145
  https://bugs.freedesktop.org/show_bug.cgi?id=15693


+/* Align to maximum x86 data size: 128 bits == 16 bytes == XMM register size.

that should be MMX, no?
(In reply to comment #11)
> +/* Align to maximum x86 data size: 128 bits == 16 bytes == XMM register size.
> 
> that should be MMX, no?

The 64-bit MMX registers are a subset of the 128-bit XMM regs.
Intel is planning on extending the SSE registers to 256 bits with their AVX instruction set in about two years.

MMX registers are mm0-mm7
SSE registers are xmm0-xmm7 on 32-bit platforms and xmm0-xmm15 on 64-bit platforms.
If we could label these as SSE registers, I think that'd be less confusing to me today. I don't talk about registers as AL/AX/EAX sized, i.e., we don't describe registers by their names, we describe them by their feature set/purpose (8bit, 16bit, 32bit).

mmoy: am i to assume that this means a buggy gcc in the future will break us again when it expects 256bit alignment which we again won't have?

this is really a game we shouldn't have to play.
(In reply to comment #14)
> If we could label these as SSE registers, I think that'd be less confusing to
> me today. I don't talk about registers as AL/AX/EAX sized, i.e., we don't
> describe registers by their names, we describe them by their feature
> set/purpose (8bit, 16bit, 32bit).

I don't have strong feelings about how the comments are worded.  I should point out, though, that just writing "SSE" could be seen as limiting given that the XMM regs are used with the SSE, SSE2, SSE3, SSSE3, SSE4, and SSE4.1 instruction sets.  (No doubt with additional tortured acromyns to be issued by Intel in the future.)
Duplicate of this bug: 446317
This is blocking us from using SSE2 in pixman; we should get this fixed one way or another for 1.9.1.
Flags: blocking1.9.1?
libpixman appears have hand-coded MMX and maybe SSE optimized code that's enabled for GCC on x86. I noticed that this is turned off for Windows. Should we try to turn this on?

# Build MMX code either with VC or with gcc-on-x86
ifdef _MSC_VER
#USE_MMX=1
#MMX_CFLAGS=
endif

ifdef GNU_CC
ifeq (86,$(findstring 86,$(OS_TEST)))
USE_MMX=1
MMX_CFLAGS=-mmmx -msse -Winline
ifneq ($(MOZ_WIDGET_TOOLKIT),os2)
MMX_CFLAGS+=--param inline-unit-growth=10000 --param large-function-growth=10000
endif
endif
endif

Also, why are we using ftree-vectorize or are we getting it by default? Won't this break on processors that don't support the vectorization level? Also, what is the default vectorization level?
> libpixman appears have hand-coded MMX and maybe SSE optimized code that's
> enabled for GCC on x86. I noticed that this is turned off for Windows. Should
> we try to turn this on?
>
> # Build MMX code either with VC or with gcc-on-x86
> ifdef _MSC_VER
> #USE_MMX=1
> #MMX_CFLAGS=
> endif
>
> ifdef GNU_CC
> ifeq (86,$(findstring 86,$(OS_TEST)))
> USE_MMX=1
> MMX_CFLAGS=-mmmx -msse -Winline
> ifneq ($(MOZ_WIDGET_TOOLKIT),os2)
> MMX_CFLAGS+=--param inline-unit-growth=10000 --param
> large-function-growth=10000
> endif
> endif
> endif
>
> Also, why are we using ftree-vectorize or are we getting it by default? Won't
> this break on processors that don't support the vectorization level? Also, what
> is the default vectorization level?

-ftree-vectorize will generate code that works on whatever target you
tell it to compile for (IE it can expand vector operations to be
performed using non-vector operations if your target architecture does
not support the vector operation directly).

In gcc 4.3+,tree-vectorize is on by default at O3 or higher.
In early gcc you need to pass a flag.
What do you mean "default vectorization level"?

As for the original bug, you can use something like force_align_arg_pointer to force alignment without modifying the ABI of the function (IE so you can call this code from something that only keeps 4 byte stack alignment or whatever, and still have it work right).
> In gcc 4.3+,tree-vectorize is on by default at O3 or higher.
> In early gcc you need to pass a flag.
> What do you mean "default vectorization level"?

If you pass in x86, do you get MMX, SSE, SSE2, etc? Or do you have to pass in
something like Athlon64, Pentium4, Pentium3, etc.?
You have to pass in something like pentium4/pentium3/etc

"generic" is an available -mtune flag, but it does not change the supported features (only march flags do).


My error on MMX not being used on the Win32 build. I was looking at autoconf'd Win64 source.
(In reply to comment #18)
> libpixman appears have hand-coded MMX and maybe SSE optimized code that's
> enabled for GCC on x86. I noticed that this is turned off for Windows. Should
> we try to turn this on?

MMX is already on under MSVC and gcc; with a cairo update I'm about to do (bug 446323), we'll get SSE2 code as well.  The SSE2 code will be initially disabled on both MSVC and gcc for us -- under MSVC until we get some patches to fix stupidity in the MSVC compiler when dealing with intrinsics, and in gcc until this bug is fixed.

Comment on attachment 295115 [details] [diff] [review]
Set 128-bit alignment for called function

bsmedberg, are you the right person to review this?
Attachment #295115 - Flags: review?(benjamin)
Attachment #295115 - Flags: review?(benjamin) → review+
vlad: I'm slightly confused.

https://bugs.freedesktop.org/show_bug.cgi?id=15693 libpixman (from git) segfaults with xulrunner is fixed in their version control system because it's a bug in gcc and should be fixed at the consumer point. Why shouldn't cairo also work around gcc's bugginess?

I still don't see this as a scalable fix, if we fix the alignment to 128bit today, tomorrow someone may need 256bits for the next buggy compiler super optimization.

I'm not really opposed to making this change, I just don't see it as a fix, just a band-aid. (And if we're going to take a band-aid, the patch in bug 218629 seems like it might be more forward looking.)

The /somewhat/ correct fix seems to be for libraries to apply force_align_arg_pointer on code that uses broken gcc functions.

... but really, gcc should be fixed so that reachable entrypoints do not assume something which will crash when a given optimization is used - I think http://gcc.gnu.org/bugzilla/show_bug.cgi?id=16660 is the right bug....
(In reply to comment #25)
> vlad: I'm slightly confused.
> 
> https://bugs.freedesktop.org/show_bug.cgi?id=15693 libpixman (from git)
> segfaults with xulrunner is fixed in their version control system because it's
> a bug in gcc and should be fixed at the consumer point. Why shouldn't cairo
> also work around gcc's bugginess?

The bug is the one and the same.

> I still don't see this as a scalable fix, if we fix the alignment to 128bit
> today, tomorrow someone may need 256bits for the next buggy compiler super
> optimization.

Then we'll change it to align to 256bits when needed; I don't see the problem.  It's not like this is changing from week to week.

> I'm not really opposed to making this change, I just don't see it as a fix,
> just a band-aid. (And if we're going to take a band-aid, the patch in bug
> 218629 seems like it might be more forward looking.)

Sure, the patch in 218629 does look better -- let's get that one reviewed an in instead.  I'm not really partial here, I'd just like to see it fixed.

> The /somewhat/ correct fix seems to be for libraries to apply
> force_align_arg_pointer on code that uses broken gcc functions.

Yes, though from my understanding this only works for gcc 4.2 or higher.  At least, the crash was occuring for me with gcc 4.1, and Soren indicated that gcc 4.2 was required there.  I guess I should take this or the other patch and verify that it fixes the crash; I'll do so tomorrow.

> ... but really, gcc should be fixed so that reachable entrypoints do not assume
> something which will crash when a given optimization is used - I think
> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=16660 is the right bug....

Sure, I agree, but we can't control or wait for gcc to be fixed :)
Duplicate of this bug: 453933
Vlad, is this patch ready to land? I remember reviewing multiple patches of this sort.
Assignee: swsnyder → vladimir
Flags: blocking1.9.1? → blocking1.9.1+
Yeah, I think it's fine -- the other patch does the same thing just with a lot more code.
This is checked in as c4c01d41e11a in m-c, isn't it?
So, is it now possible to turn on SSE2 in pixman without any risk?
I don't know, but thanks for reminding me to mark it fixed!
Status: NEW → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.