Closed Bug 860867 Opened 7 years ago Closed 7 years ago

Baseline Compiler startup crash in local opt build

Categories

(Core :: JavaScript Engine, defect)

x86
macOS
defect
Not set

Tracking

()

RESOLVED WONTFIX
mozilla23

People

(Reporter: ehsan, Assigned: ehsan)

References

(Blocks 1 open bug)

Details

Attachments

(1 file)

I just did a local build and baseline crashes during the startup for me every single time.  I demoed this to Kannan in person.

See https://gist.github.com/ehsan/5366294 for the full backtrace.
And disabling baseline using the pref fixes the crash.
I updated my tree to the latest m-c and the problem still exists.  This is sort of a pain because I can't run tests on my opt build :(
Still hitting this problem, can someone please try to reproduce?
I can try to reproduce this later today or tomorrow.
Atte Kettunen is seeing this on ASan builds at startup too. I was not yet able to reproduce and for him it started happening with a newer clang version (might be that something is optimized differently).
I would be more than happy to build an optimized build with debug symbols and upload it somewhere if needed.
Uploaded my build here: <http://people.mozilla.org/~eakhgari/firefox-860867.dmg>
So here's the generated code:

0x102e5a920 <_ZN2js3ion16BaselineCompiler8emitCallEv+592>:	lea    (%r9,%rsi,1),%rcx
0x102e5a924 <_ZN2js3ion16BaselineCompiler8emitCallEv+596>:	movd   %rcx,%xmm7
0x102e5a929 <_ZN2js3ion16BaselineCompiler8emitCallEv+601>:	movlhps %xmm7,%xmm7
0x102e5a92c <_ZN2js3ion16BaselineCompiler8emitCallEv+604>:	movaps %xmm7,%xmm2
0x102e5a92f <_ZN2js3ion16BaselineCompiler8emitCallEv+607>:	paddq  %xmm8,%xmm2
0x102e5a934 <_ZN2js3ion16BaselineCompiler8emitCallEv+612>:	movq   %xmm2,%rdi
0x102e5a939 <_ZN2js3ion16BaselineCompiler8emitCallEv+617>:	punpckhqdq %xmm2,%xmm2
0x102e5a93d <_ZN2js3ion16BaselineCompiler8emitCallEv+621>:	paddq  %xmm9,%xmm7
0x102e5a942 <_ZN2js3ion16BaselineCompiler8emitCallEv+626>:	movq   %xmm7,%rbx
                                                            bug----------^
0x102e5a947 <_ZN2js3ion16BaselineCompiler8emitCallEv+631>:	punpckhqdq %xmm7,%xmm7
0x102e5a94b <_ZN2js3ion16BaselineCompiler8emitCallEv+635>:	lea    (%rdi,%rdi,2),%rdi
0x102e5a94f <_ZN2js3ion16BaselineCompiler8emitCallEv+639>:	movq   %xmm2,%rcx
0x102e5a954 <_ZN2js3ion16BaselineCompiler8emitCallEv+644>:	add    $0xfffffffffffffffc,%rsi
0x102e5a958 <_ZN2js3ion16BaselineCompiler8emitCallEv+648>:	lea    (%rcx,%rcx,2),%rcx
0x102e5a95c <_ZN2js3ion16BaselineCompiler8emitCallEv+652>:	movd   (%rax,%rcx,8),%xmm3
0x102e5a961 <_ZN2js3ion16BaselineCompiler8emitCallEv+657>:	movd   (%rax,%rdi,8),%xmm4
0x102e5a966 <_ZN2js3ion16BaselineCompiler8emitCallEv+662>:	movq   %xmm7,%rcx
0x102e5a96b <_ZN2js3ion16BaselineCompiler8emitCallEv+667>:	lea    (%rcx,%rcx,2),%rcx
0x102e5a96f <_ZN2js3ion16BaselineCompiler8emitCallEv+671>:	lea    (%rbx,%rbx,2),%rdi
0x102e5a973 <_ZN2js3ion16BaselineCompiler8emitCallEv+675>:	movd   (%rax,%rdi,8),%xmm7
0x102e5a978 <_ZN2js3ion16BaselineCompiler8emitCallEv+680>:	movd   (%rax,%rcx,8),%xmm2
0x102e5a97d <_ZN2js3ion16BaselineCompiler8emitCallEv+685>:	mov    %rdx,%rcx
0x102e5a980 <_ZN2js3ion16BaselineCompiler8emitCallEv+688>:	add    %rsi,%rcx
0x102e5a983 <_ZN2js3ion16BaselineCompiler8emitCallEv+691>:	punpckldq %xmm3,%xmm4
0x102e5a987 <_ZN2js3ion16BaselineCompiler8emitCallEv+695>:	pshufd $0x10,%xmm4,%xmm4
0x102e5a98c <_ZN2js3ion16BaselineCompiler8emitCallEv+700>:	pand   %xmm10,%xmm4
0x102e5a991 <_ZN2js3ion16BaselineCompiler8emitCallEv+705>:	pcmpeqd %xmm6,%xmm4
0x102e5a995 <_ZN2js3ion16BaselineCompiler8emitCallEv+709>:	pshufd $0xb1,%xmm4,%xmm3
0x102e5a99a <_ZN2js3ion16BaselineCompiler8emitCallEv+714>:	pand   %xmm4,%xmm3
0x102e5a99e <_ZN2js3ion16BaselineCompiler8emitCallEv+718>:	pand   %xmm5,%xmm3
0x102e5a9a2 <_ZN2js3ion16BaselineCompiler8emitCallEv+722>:	punpckldq %xmm2,%xmm7
0x102e5a9a6 <_ZN2js3ion16BaselineCompiler8emitCallEv+726>:	paddq  %xmm3,%xmm1
0x102e5a9aa <_ZN2js3ion16BaselineCompiler8emitCallEv+730>:	pshufd $0x10,%xmm7,%xmm2
0x102e5a9af <_ZN2js3ion16BaselineCompiler8emitCallEv+735>:	pand   %xmm10,%xmm2
0x102e5a9b4 <_ZN2js3ion16BaselineCompiler8emitCallEv+740>:	pcmpeqd %xmm6,%xmm2
0x102e5a9b8 <_ZN2js3ion16BaselineCompiler8emitCallEv+744>:	pshufd $0xb1,%xmm2,%xmm3
0x102e5a9bd <_ZN2js3ion16BaselineCompiler8emitCallEv+749>:	pand   %xmm2,%xmm3
0x102e5a9c1 <_ZN2js3ion16BaselineCompiler8emitCallEv+753>:	pand   %xmm5,%xmm3
0x102e5a9c5 <_ZN2js3ion16BaselineCompiler8emitCallEv+757>:	paddq  %xmm3,%xmm0
0x102e5a9c9 <_ZN2js3ion16BaselineCompiler8emitCallEv+761>:	jne    0x102e5a920 <_ZN2js3ion16BaselineCompiler8emitCallEv+592>

Here the compiler is moving the xmm7 register into a 64-bit register, which means that the carry over bit will not be clobbered, and later on when this variable is used as an offset it ends up being a really large value.
To expand a bit on ehsan's comment: the issue is that auto-vectorization of loops by Clang is misbehaving on a particular loop.

The auto-vectorization lifts some 32-bit operations into a 2*64-bit XMM register, performs operations which make the values in those register exceed the 32-bit range, and then move them back into a 64-bit register without truncating to a 32-bit range.

Solution is to turn off auto-vectorization on BaselineCompiler.cpp
(In reply to Kannan Vijayan [:djvj] from comment #10)
> To expand a bit on ehsan's comment: the issue is that auto-vectorization of
> loops by Clang is misbehaving on a particular loop.

Is this a Clang bug?
(In reply to Christian Holler (:decoder) from comment #11)
> Is this a Clang bug?

Yes.
First things first, someone needs to look at the generated code for this function in one of our official Mac nightly binaries.  If the bug is present there, then this needs to be treated with a different priority.  FWIW I tried clang trunk and it generated the same bad code, so I think that we'll find the same bad code in our official binaries.
With respindola gone, CC'ing nfroyd on this Clang/LLVM-related bug.
I don't see the buggy code in a current Mac Nightly.

The loop vectorizer is enabled by default at -O3 in pre-release versions of clang/llvm 3.3 and it was enabled only by explicit request in earlier releases.  Our current clang for nightlies is from the 3.2 release branch, so that'd explain why we're not seeing it in nightlies.
(In reply to comment #15)
> I don't see the buggy code in a current Mac Nightly.
> 
> The loop vectorizer is enabled by default at -O3 in pre-release versions of
> clang/llvm 3.3 and it was enabled only by explicit request in earlier releases.
>  Our current clang for nightlies is from the 3.2 release branch, so that'd
> explain why we're not seeing it in nightlies.

That doesn't explain why I'm seeing it in my local builds though, does it?  I'm building with --enable-optimize, which means -O2 I think.

This will be a time bomb when we want to upgrade to 3.3.  Is there a command line flag to turn it off everywhere?  Perhaps we should add that to the build system...
(In reply to :Ehsan Akhgari (needinfo? me!) from comment #16)
> 
> That doesn't explain why I'm seeing it in my local builds though, does it? 
> I'm building with --enable-optimize, which means -O2 I think.

I was told that the JS engine is built with -O3 by default.
(In reply to :Ehsan Akhgari (needinfo? me!) from comment #16)
> (In reply to comment #15)
> > I don't see the buggy code in a current Mac Nightly.
> > 
> > The loop vectorizer is enabled by default at -O3 in pre-release versions of
> > clang/llvm 3.3 and it was enabled only by explicit request in earlier releases.
> >  Our current clang for nightlies is from the 3.2 release branch, so that'd
> > explain why we're not seeing it in nightlies.
> 
> That doesn't explain why I'm seeing it in my local builds though, does it? 
> I'm building with --enable-optimize, which means -O2 I think.

js/src builds with -O3:

[froydnj@cerebro src]$ pwd
/opt/build/froydnj/build-inbound/js/src
[froydnj@cerebro src]$ make echo-variable-MOZ_OPTIMIZE_FLAGS
-O3 -freorder-blocks
[froydnj@cerebro src]$ make -C ../../ echo-variable-MOZ_OPTIMIZE_FLAGS
make: Entering directory `/opt/build/froydnj/build-inbound'
-Os -freorder-blocks
make: Leaving directory `/opt/build/froydnj/build-inbound'
[froydnj@cerebro src]$ 

This is on Linux, but assuming I'm understanding configure.in right, there's no reason it should be any different on Mac.

I am assuming that you use a newer version of clang for your local builds.  Is that correct?

> This will be a time bomb when we want to upgrade to 3.3.  Is there a command
> line flag to turn it off everywhere?  Perhaps we should add that to the
> build system...

-fno-vectorize turns it off on a trunk build of clang.  The release notes for 3.2 say that -vectorize is the option to turn it on, so it's also possible that -no-vectorize works...or the option has been changed during the 3.3 cycle.
(In reply to comment #18)
> (In reply to :Ehsan Akhgari (needinfo? me!) from comment #16)
> > (In reply to comment #15)
> > > I don't see the buggy code in a current Mac Nightly.
> > > 
> > > The loop vectorizer is enabled by default at -O3 in pre-release versions of
> > > clang/llvm 3.3 and it was enabled only by explicit request in earlier releases.
> > >  Our current clang for nightlies is from the 3.2 release branch, so that'd
> > > explain why we're not seeing it in nightlies.
> > 
> > That doesn't explain why I'm seeing it in my local builds though, does it? 
> > I'm building with --enable-optimize, which means -O2 I think.
> 
> js/src builds with -O3:
> 
> [froydnj@cerebro src]$ pwd
> /opt/build/froydnj/build-inbound/js/src
> [froydnj@cerebro src]$ make echo-variable-MOZ_OPTIMIZE_FLAGS
> -O3 -freorder-blocks
> [froydnj@cerebro src]$ make -C ../../ echo-variable-MOZ_OPTIMIZE_FLAGS
> make: Entering directory `/opt/build/froydnj/build-inbound'
> -Os -freorder-blocks
> make: Leaving directory `/opt/build/froydnj/build-inbound'
> [froydnj@cerebro src]$ 
> 
> This is on Linux, but assuming I'm understanding configure.in right, there's no
> reason it should be any different on Mac.
> 
> I am assuming that you use a newer version of clang for your local builds.  Is
> that correct?

Yes, I use tip of tree.

> > This will be a time bomb when we want to upgrade to 3.3.  Is there a command
> > line flag to turn it off everywhere?  Perhaps we should add that to the
> > build system...
> 
> -fno-vectorize turns it off on a trunk build of clang.  The release notes for
> 3.2 say that -vectorize is the option to turn it on, so it's also possible that
> -no-vectorize works...or the option has been changed during the 3.3 cycle.

glandium, where's the right place to add that to the build options?
Er, glandium, please see comment 19.
Flags: needinfo?(mh+mozilla)
(In reply to :Ehsan Akhgari (needinfo? me!) from comment #19)
> glandium, where's the right place to add that to the build options?

I'd say in MOZ_OPTIMIZE_FLAGS definitions in js/src/configure.in, where it contains -O3, conditional to the compiler being clang (test -n "$CLANG_CXX")
Flags: needinfo?(mh+mozilla)
Attached patch Patch (v1)Splinter Review
Assignee: general → ehsan
Status: NEW → ASSIGNED
Attachment #742080 - Flags: review?(mh+mozilla)
Attachment #742080 - Flags: review?(mh+mozilla) → review+
My attempts at converting this code into a smaller standalone test case which we can use to file an LLVM bug based on have failed.  It would be fantastic if somebody who knows this code takes on doing that.
https://hg.mozilla.org/mozilla-central/rev/694322457a64
Status: ASSIGNED → RESOLVED
Closed: 7 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla23
clang 3.3 has a fix for this bug now, so I'm backing out the work-around patch:

https://hg.mozilla.org/integration/mozilla-inbound/rev/02f1eeef2613
Resolution: FIXED → WONTFIX
Duplicate of this bug: 876130
You need to log in before you can comment on or make changes to this bug.