aarch64 baseline JIT fails to save x28 before clobbering it with sp

RESOLVED FIXED in Firefox 61

Status

()

Core
JavaScript Engine: JIT
P3
normal
RESOLVED FIXED
a year ago
16 days ago

People

(Reporter: Jeremy Linton, Assigned: lth)

Tracking

(Blocks: 1 bug)

53 Branch
mozilla61
ARM64
All
Points:
---

Firefox Tracking Flags

(firefox61 fixed)

Details

Attachments

(1 attachment)

(Reporter)

Description

a year ago
User Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:53.0) Gecko/20100101 Firefox/53.0
Build ID: 20170518000419

Steps to reproduce:

Running fedora build which uses "-O2" and gcc 7.1.1 firefox crashes a few seconds after startup when the heartbeat runs. More information can be found here: 

https://bugzilla.redhat.com/show_bug.cgi?id=1458960





Actual results:

[Child 2343] WARNING: pipe error (25): Connection reset by peer: file /builddir/build/BUILD/firefox-53.0.3/ipc/chromium/src/chrome/common/ipc_channel_posix.cc, line 346
Crash Annotation GraphicsCriticalError: |[C0][GFX1-]: Receive IPC close with reason=AbnormalShutdown (t=7.21357) Segmentation fault (core dumped)


Which is a bit misleading, the actual crashes move around depending on optimization flags/etc. 

Generally, it seems the aarch64 jit is using x28 as a stack pointer and "sp" more like a frame base. This wouldn't be a problem except for the poor code generation, and the initial load of x28 in UnboxedLayout::makeConstructorCode where there is a 

#ifdef JS_CODEGEN_ARM64
    // ARM64 communicates stack address via sp, but uses a pseudo-sp for addressing.
    masm.initStackPtr();
#endif

which works out to 

mov x28,sp 

at the head of every routine. X28 is a callee saved register on arm64, which means that the jit is clobbering data that is needed by the routines making calls into JIT'ed code. This means that crashes/misbehaviors happen to the caller at some later time. As x28 seems to be fairly low on the list of registers the compiler spills into, its use by the calling code is rarer than one would expect from random register corruption. 


Random example:

   0x00002ce4f50ca010:  mov     x28, sp
   0x00002ce4f50ca014:  sub     sp, x28, #0x8
   0x00002ce4f50ca018:  str     x30, [x28, #-8]!
   0x00002ce4f50ca01c:  sub     sp, x28, #0x8
   0x00002ce4f50ca020:  str     d31, [x28, #-8]!



Expected results:

x28 needs to either be pushed on the actual stack, and the code returning from JIT'ed routines needs to be tweaked to restore x28, or the JIT itself needs to just use sp as the stack pointer.

Its not clear to me why x28 is being used instead of sp (other than maybe an attempt to avoid sp alignment restrictions?), but the code being generated is terrible (as seen above) where sp is updated, and then x28 is used to save the register and updated. My gut instinct is simply to update the stack push/pop routines to actually use SP, even if it means wasting 8 bytes here and there (which is what is happening anyway) for single register updates. Beyond that, it seems to me that if x28 continues to be used, it wouldn't take much to fix the JIT entry code so that it conforms to aarch64 calling conventions so that GDB/etc can decode the stack from within a JIT'ed routine.
Component: Untriaged → JavaScript Engine: JIT
Product: Firefox → Core
(Reporter)

Comment 1

a year ago
As a hack, wrapping the CALL_GENERATED_* routines with assembly to save x28 (or clobber it so the compiler saves it) seems to be sufficient to stabilize the build with -O2, i'm checking -Os as soon as the couple js benchmarks finish.
(Reporter)

Comment 2

a year ago
Os seems to be working as well (if you ignore the fact that there are drawing errors, and the build is ~20% slower on my machine). O3 is still broken though as tabs keep crashing, but the errors are caught and the main browser frame remains.
Summary: aarch64 JIT fails to save x28 before clobbering it with sp → aarch64 baseline JIT fails to save x28 before clobbering it with sp
(Assignee)

Updated

9 months ago
Priority: -- → P3
(Assignee)

Updated

5 months ago
Hardware: Unspecified → ARM64
(Assignee)

Comment 3

4 months ago
Yeah, this looks wrong.  x28 is nonvolatile and must be saved.  Looking at this code briefly, it looks like the setup of x28 should happen after the saving of the nonvolatile registers and that we should not be using the PseudoStackPointer in the region where we're saving or restoring the registers.  (If the size of the registers to save is then not divisible by 16 we have other issues, but we should work around those.)  If this is a merge bug or just something we've not been able to test properly I don't know.

As the reporter remarks, x28 is used as a stack pointer and the SP is used sparingly because of alignment restrictions on SP on ARM64; the reengineering of the JIT to deal with those restrictions directly is not currently a priority.
Status: UNCONFIRMED → NEW
Ever confirmed: true
OS: Unspecified → All
(Assignee)

Updated

4 months ago
Blocks: 1334963
(Assignee)

Updated

4 months ago
Assignee: nobody → lhansen
Status: NEW → ASSIGNED
(Assignee)

Comment 4

4 months ago
A test case such as this:

var s = "[";
for ( var i=0; i < 10000; i++ )
    s = s + `{"i":0},`;
s += `{"i":0}]`;
JSON.parse(s)

(although the limit can go as low as 1000 I think) causes the jit to call makeConstructorCode on my ARM64 box with only the baseline jit available.  This is the minimum we need to ensure that changes we make are not obviously wrong.

However that example by itself is not sufficient to provoke a crash, so it may be hard to verify that the bug is actually fixed.
(Assignee)

Comment 5

4 months ago
Created attachment 8954714 [details] [diff] [review]
bug1375074-save-restore-x28.patch

Save and restore x28 properly.  One subtlety is how we subsequently return, but I've commented on this in the code.
Attachment #8954714 - Flags: review?(sstangl)
Comment on attachment 8954714 [details] [diff] [review]
bug1375074-save-restore-x28.patch

Review of attachment 8954714 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/vm/UnboxedObject.cpp
@@ +100,5 @@
> +    // calling code, and it is nonvolatile, so save it.  Do this as a special
> +    // case first because the generic save/restore code needs the PSP to be
> +    // initialized already.
> +    MOZ_ASSERT(PseudoStackPointer64.Is(masm.GetStackPointer64()));
> +    masm.Str(PseudoStackPointer64, vixl::MemOperand(sp, -16, vixl::PreIndex));

Is there some reason to avoid using Push()?

@@ +249,5 @@
> +
> +    // Restore the saved value of the PSP register, this value is whatever the
> +    // caller had saved in it, not any actual SP value, and it must not be
> +    // overwritten subsequently.
> +    masm.Ldr(PseudoStackPointer64, vixl::MemOperand(sp, 16, vixl::PostIndex));

Likewise, Pop()?
Attachment #8954714 - Flags: review?(sstangl) → review+
(Assignee)

Comment 7

3 months ago
(In reply to Sean Stangl [:sstangl] from comment #6)
> Comment on attachment 8954714 [details] [diff] [review]
> bug1375074-save-restore-x28.patch
> 
> Review of attachment 8954714 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> ::: js/src/vm/UnboxedObject.cpp
> @@ +100,5 @@
> > +    // calling code, and it is nonvolatile, so save it.  Do this as a special
> > +    // case first because the generic save/restore code needs the PSP to be
> > +    // initialized already.
> > +    MOZ_ASSERT(PseudoStackPointer64.Is(masm.GetStackPointer64()));
> > +    masm.Str(PseudoStackPointer64, vixl::MemOperand(sp, -16, vixl::PreIndex));
> 
> Is there some reason to avoid using Push()?  [and Pop()]

Yes.  The configured Stack Pointer is x28 so Push would use that.  But (a) x28 is what we're trying to save here so that it won't be clobbered and (b) it's not initialized yet, it points into random memory.

We could instead set the Stack Pointer to be sp for the purpose of this operation and then call Push(), but that won't work; Push() defers to push() which says:

  // If the current stack pointer (as set by SetStackPointer) is sp, then it
  // must be aligned to 16 bytes on entry and the total size of the specified
  // registers must also be a multiple of 16 bytes.

and though the sp is aligned we're only pushing one register.  Note the code above subtracts 16 but only stores 8.

Comment 8

3 months ago
Pushed by lhansen@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/800abe66894d
Save and restore non-volatile x28 on ARM64 for generated unboxed object constructor.  r=sstangl

Comment 9

3 months ago
bugherder
https://hg.mozilla.org/mozilla-central/rev/800abe66894d
Status: ASSIGNED → RESOLVED
Last Resolved: 3 months ago
status-firefox61: --- → fixed
Resolution: --- → FIXED
Target Milestone: --- → mozilla61
(Assignee)

Updated

3 months ago
See Also: → bug 1445907
You need to log in before you can comment on or make changes to this bug.