Closed Bug 680547 Opened 13 years ago Closed 10 years ago

segfault when compiled with avx support on i7

Categories

(Core :: XPConnect, defect)

6 Branch
x86_64
Linux
defect
Not set
critical

Tracking

()

RESOLVED FIXED
mozilla38
Tracking Status
firefox38 --- fixed

People

(Reporter: u43589, Assigned: dbaron)

References

Details

(Keywords: crash)

Attachments

(5 files, 1 obsolete file)

User Agent: Mozilla/5.0 (X11; Linux x86_64; rv:6.0) Gecko/20100101 Firefox/6.0
Build ID: 20110816074220
Steps to reproduce: built Firefox 6.0 with -march=native (implies -mavx). GCC 4.6.0
Actual results: segfaults
Expected results: it should run
Attached file bt
Need a stacktrace from a debug build.
Attached file debug
Component: General → DOM: Core & HTML
Product: Firefox → Core
QA Contact: general → general
Status: UNCONFIRMED → NEW
Ever confirmed: true
At line 104 there is: *aDecision = nsIContentPolicy::ACCEPT; where aDecision is a PRInt16*. The call into content policy in this case is coming from JS somewhere, so it's possible that -mavx changes stack alignment (or something similar) in a way that means xptcall no longer works.
Component: DOM: Core & HTML → XPConnect
QA Contact: general → xpconnect
Something similar happens on systems with a Sandy Bridge CPU, gcc 4.5.3 / amd64 and Thunderbird 3.1.12 or Firefox 3.6.20 / xulrunner 1.9.2.20: if I compile Firefox or Thunderbird with the gcc option "-march=native", no window opens after I start Firefox or Thunderbird. Firefox exits with return code 1 after a few seconds. However, if I compile the programs with the gcc options "-march=native -mno-avx", Firefox and Thunderbird work as expected. More information can be found here: http://forums.gentoo.org/viewtopic-t-893300.html
Do you still see this crash when using a newer version?
Severity: normal → critical
Flags: needinfo?(bugzilla)
Keywords: crash
yes
Flags: needinfo?(bugzilla)
Keywords: stackwanted
I have the same problem with Fx 28 and gcc-4.7.3 and march=corei7-avx. Firefox segfaults with
CFLAGS="-march=corei7-avx -fno-ident -pipe -findirect-inlining"
but works with
CFLAGS="-march=corei7-avx -mno-avx -fno-ident -pipe -findirect-inlining"
The segfault occurs in https://mxr.mozilla.org/mozilla-aurora/source/content/base/src/nsFrameMessageManager.cpp#540
Program received signal SIGSEGV, Segmentation fault.
0x00007ffff2d3b5b1 in nsFrameMessageManager::SendMessage (this=0x7fffd2f27430, aMessageName=..., aJSON=..., aObjects=..., aPrincipal=0x0, aCx=0x7ffff6c179d0, aArgc=120 'x', aRetval=0x7ffff2aa27c1 <XPCConvert::JSData2Native(void*, JS::Handle<JS::Value>, nsXPTType const&, bool, nsID const*, tag_nsresult*)+65>, aIsSync=true) at /home/sephiroth/building_mozilla/moz-aurora/content/base/src/nsFrameMessageManager.cpp:540
540 *aRetval = JSVAL_VOID;
build log (see previous comment)
Nobody interested in fixing this? It was reported 3 years ago and this bug still exists in Firefox 28 :(
As explained in bug 1111355, having AVX enabled appears to change the alignment behavior of alloca (apparently adding an extra 16 bytes of padding/alignment, and using 32-byte alignment instead of 16-byte). The suggestion of using __builtin_alloca_with_align in bug 1111355 didn't fix the problem, so this seems to be the best available workaround, given that this code, which should perhaps better be written in assembly, is written in C++.
Interestingly, this is NOT fixed by #pragma GCC target ("arch=x86-64"). (I determined the (undocumented) name for the default -march value on x86_64 from the gcc source code: gcc/config/i386/i386.c, function ix86_option_override_internal, code that sets opts->x_ix86_arch_string. I confirmed that this sets the same macros based on the empty diff between the output of 'gcc -E -dM -x c++ /dev/null' and 'gcc -E -dM -x c++ -march=x86-64 /dev/null', which was not an empty diff for other -march values, e.g., k8.)
I confirmed that the push_options and pop_options actually work by putting the push/pop pair around a different (earlier) function, and testing that this did not fix the bug (with the pop_options before NS_InvokeByIndex).
See the gcc documentation at:
https://gcc.gnu.org/onlinedocs/gcc-4.9.0/gcc/Function-Specific-Option-Pragmas.html
https://gcc.gnu.org/onlinedocs/gcc-4.9.0/gcc/Function-Attributes.html
https://gcc.gnu.org/onlinedocs/gcc-4.9.0/gcc/i386-and-x86-64-Options.html
Attachment #8561696 - Flags: review?(nfroyd)
Assignee: nobody → dbaron
Status: NEW → ASSIGNED
As explained in bug 1111355, having AVX enabled appears to change the alignment behavior of alloca (apparently adding an extra 16 bytes of padding/alignment, and using 32-byte alignment instead of 16-byte). The suggestion of using __builtin_alloca_with_align in bug 1111355 didn't fix the problem, so this seems to be the best available workaround, given that this code, which should perhaps better be written in assembly, is written in C++.
Interestingly, this is NOT fixed by #pragma GCC target ("arch=x86-64"). (I determined the (undocumented) name for the default -march value on x86_64 from the gcc source code: gcc/config/i386/i386.c, function ix86_option_override_internal, code that sets opts->x_ix86_arch_string. I confirmed that this sets the same macros based on the empty diff between the output of 'gcc -E -dM -x c++ /dev/null' and 'gcc -E -dM -x c++ -march=x86-64 /dev/null', which was not an empty diff for other -march values, e.g., k8.)
I confirmed that the push_options and pop_options actually work by putting the push/pop pair around a different (earlier) function, and testing that this did not fix the bug (with the pop_options before NS_InvokeByIndex).
See the gcc documentation at:
https://gcc.gnu.org/onlinedocs/gcc-4.9.0/gcc/Function-Specific-Option-Pragmas.html
https://gcc.gnu.org/onlinedocs/gcc-4.9.0/gcc/Function-Attributes.html
https://gcc.gnu.org/onlinedocs/gcc-4.9.0/gcc/i386-and-x86-64-Options.html
Attachment #8561737 - Flags: review?(nfroyd)
Attachment #8561696 - Attachment is obsolete: true
Attachment #8561696 - Flags: review?(nfroyd)
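For readers unfamiliar with the push_options/pop_options approach described above, here is a minimal sketch of the pattern, not the landed patch. SumViaAlloca is a hypothetical stand-in for an alloca-using function such as NS_InvokeByIndex, and the preprocessor guard is an assumption that is deliberately broader than the patch's #ifndef __clang__ so that the sketch also builds on other compilers and architectures.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Assumption: only GCC on x86-64 gets the pragmas in this sketch.
#if defined(__GNUC__) && !defined(__clang__) && defined(__x86_64__)
#pragma GCC push_options      // save the current target options
#pragma GCC target("no-avx")  // compile what follows as if -mno-avx were given
#endif

// Hypothetical stand-in for a function whose stack layout depends on alloca.
// It copies its arguments into alloca'd scratch space and sums them.
std::uint64_t SumViaAlloca(const std::uint64_t* aArgs, std::size_t aCount) {
    assert(aArgs != nullptr || aCount == 0);
    auto* scratch = static_cast<std::uint64_t*>(
        __builtin_alloca(aCount * sizeof(std::uint64_t)));
    std::uint64_t total = 0;
    for (std::size_t i = 0; i < aCount; ++i) {
        scratch[i] = aArgs[i];  // plain loop copy into the stack buffer
        total += scratch[i];
    }
    return total;
}

#if defined(__GNUC__) && !defined(__clang__) && defined(__x86_64__)
#pragma GCC pop_options       // restore the options saved by push_options
#endif
```

Code after the pop_options line is compiled with the original command-line options again, so only the function between the pragmas is affected.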
Comment on attachment 8561737: Compile Linux 64-bit NS_InvokeByIndex with -mno-avx to allow compiling with -march=native on new hardware, or similar -march flags
Review of attachment 8561737:
That's mildly depressing that __builtin_alloca_with_align didn't work. It looks like GCC 4.6, our minimum compiler, is the first to ship with AVX, so I don't think we need |#if defined(__AVX__)| around those blocks.
Attachment #8561737 - Flags: review?(nfroyd) → review+
(In reply to David Baron [:dbaron] (UTC+11) (needinfo? for questions) from comment #17)
> https://hg.mozilla.org/integration/mozilla-inbound/rev/3023f9390942
Great! This patch works for me. Thanks!
Status: ASSIGNED → RESOLVED
Closed: 10 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla38
Thanks a bunch! Better late than never :)
Hi! I'm afraid the fix is incomplete. The patch is wrapped in #ifndef __clang__, but the same problem arises when compiling with clang and -march=core-avx-i. I don't know what the equivalent solution for clang would be.