Last Comment Bug 545747 - nanojit breaks ARM EABI stack alignment requirement
: nanojit breaks ARM EABI stack alignment requirement
Status: RESOLVED FIXED
:
Product: Core Graveyard
Classification: Graveyard
Component: Nanojit (show other bugs)
: 1.9.1 Branch
: ARM Linux
: -- critical (vote)
: mozilla1.9.1
Assigned To: Mike Hommey [:glandium]
:
:
Mentors:
Depends on:
Blocks:
  Show dependency treegraph
 
Reported: 2010-02-11 14:12 PST by Mike Hommey [:glandium]
Modified: 2014-03-17 08:00 PDT (History)
7 users (show)
See Also:
QA Whiteboard:
Iteration: ---
Points: ---
.10-fixed


Attachments
Patch (1.21 KB, patch)
2010-02-11 14:12 PST, Mike Hommey [:glandium]
no flags Details | Diff | Splinter Review
patch for 1.9.1 (418 bytes, patch)
2010-02-14 02:05 PST, Mike Hommey [:glandium]
edwsmith: review-
Details | Diff | Splinter Review
Patch v2 (different approach) (3.68 KB, patch)
2010-03-16 01:10 PDT, Mike Hommey [:glandium]
edwsmith: review+
Jacob.Bramley: review+
mbeltzner: approval1.9.1.10+
Details | Diff | Splinter Review

Description Mike Hommey [:glandium] 2010-02-11 14:12:13 PST
Created attachment 426574 [details] [diff] [review]
Patch

Here is how NS_InvokeByIndex currently works on ARM EABI:
- invoke_count_words estimates the number of words that may be necessary for the given parameters (worst case is used for 64 bits parameters), and adds 1 to the count when it is even, to ensure an odd result.
- the stack is grown with that word number * 4
- the parameters are copied on the stack with invoke_copy_to_stack
- 3 words are taken from stack to fill r1-r3, r0 getting the value from the that variable
- stack is shrunk by 12 bytes to start at the 4th word of the parameters list.
- the method is called with the parameters.

When filling the stack in invoke_copy_to_stack, 64 bit parameters are specially handled as required by the EABI so that they are double word aligned.

For functions such as nsNavBookmarks::GetFolderIdForItem(long long aItemId, long long *aFolderId), the result is supposed to be:
r0 - this
r1 - ignored
r2 - aItemId, low bits
r3 - aItemId, high bits
stack - aFolderId

For such functions, the value that invoke_count_words is 5 (3 for aItemId + 1 for aFolderId + 1 to make the result odd), which leads to 20 bytes of stack growth.

In cases where the stack is double word aligned when NS_InvokeByIndex is called, growing the stack by 20 bytes makes it oddly aligned, and invoke_copy_to_stack then doesn't position 64 bits properly and the result when calling nsNavBookmarks::GetFolderIdForItem is this:
r0 - this
r1 - aItemId, low bits
r2 - aItemId, high bits
r3 - aFolderId

which obviously, nsNavBookmarks::GetFolderIdForItem won't be using properly.
This can obviously lead to very bad things depending on what the other arguments to the function are, after the first double word argument.

So, instead of having invoke_count_words avoiding its return value from being even, NS_InvokeByIndex should modify invoke_count_words' return value depending on the current stack pointer.

I guess the problems on maemo after bug #530087 are due to the very same issue, though I'd say there are much better ways to improve the code, and reduce the amount of assembly at the same time (which applies to most xptcinvoke implementations, IMHO)

Anyways, the patch I'm attaching here solves the problem.
Comment 1 Mike Hommey [:glandium] 2010-02-13 05:10:42 PST
The bug description is mostly good, except that the cause is misaligned stack, not aligned stack, and it breaks a lot of 64-bits arguments passing. And the reason why NS_InvokeByIndex is called with a misaligned stack is not because it is possible, which I wrongly assumed, but because nanojit breaks it. Disabling JIT in xpcshell does solve the issue. Sadly, this needs to be done by hand in runxpcshelltests.py, which is why my first attempts at disabling nanojit failed, and led me on the wrong track with xptcall.
Comment 2 Mike Hommey [:glandium] 2010-02-13 10:18:29 PST
FWIW, for me the issue is reliably reproducible with the test_places/autocomplete/test_418257.js xpcshell test. To get interesting information, break in nsNavBookmarks::GetFolderIdForItem and continue execution until aItemId gets an impossible value, like -4713317881092243456, then examine the stack pointer: its value is not 64-bits aligned. r0-r3 contain data as explained in comment 0, and up the call stack you can go to the generated assembly from the JIT.
FWIW, here is what i think is the significant portion of the disassembled content of the JIT generated function that got the stack misaligned for me:
(...)
0x400356a4:     adds    r3, r0, #12
0x400356a8:     sub     r3, r11, #28
0x400356ac:     str     r3, [r11, #-12]
0x400356b0:     adds    r3, r0, #8
0x400356b4:     sub     r3, r11, #32
0x400356b8:     str     r2, [r4, #72]   ; 0x48
0x400356bc:     str     r0, [r4, #76]   ; 0x4c
0x400356c0:     ldr     r0, [pc, #-1468]        ; 0x4003510c
0x400356c4:     str     r0, [r10, #504] ; 0x1f8
0x400356c8:     ldr     r12, [r11, #-12]
0x400356cc:     push    {r12}           ; (str r12, [sp, #-4]!)
0x400356d0:     movs    r2, #1
0x400356d4:     mov     r0, r10
0x400356d8:     bl      0x4027e420 <_Z17XPC_WN_CallMethodP9JSContextP8JSObjectjPiS3_>

The push {r12} is most probably the culprit.

Note the first instruction is apparently useless, as the value in r3 is overwritten on the next instruction, and the flags are not used afterwards. Also note this is with nanojit from 1.9.1, but I should be able to give 1.9.2 a try next week or the following one.
Comment 3 Mike Hommey [:glandium] 2010-02-13 10:28:50 PST
If I read the generated code before that correctly, r11 is only set at what looks like the beginning of the function, likewise for sp:
0x40035468:     push    {r4, r5, r6, r7, r8, r9, r10, r11, lr}
0x4003546c:     mov     r11, sp
0x40035470:     subs    sp, sp, #52     ; 0x34

After that sp should be correctly aligned, but after the last push {r12}, it is misaligned.
Comment 4 Mike Hommey [:glandium] 2010-02-13 10:35:21 PST
What is even more troubling is that this value that is pushed on stack doesn't even seem to be used afterwards. Even better, [r11, #-12] is reloaded later in another register:
0x400356f0:     ldr     r0, [r11, #-12]
Comment 5 Mike Hommey [:glandium] 2010-02-14 00:17:31 PST
(In reply to comment #4)
> What is even more troubling is that this value that is pushed on stack doesn't
> even seem to be used afterwards. Even better, [r11, #-12] is reloaded later in
> another register:
> 0x400356f0:     ldr     r0, [r11, #-12]

Forget this, I was hit by the obvious: what is pushed on stack before the function call is the last argument to XPC_WN_CallMethod, which, as it takes 5, uses the 4 r0-r3 registers and needs a last argument on stack.
Comment 6 Mike Hommey [:glandium] 2010-02-14 00:29:52 PST
Looking at the asm_call and asm_arg code in mozilla1.9.1, mozilla1.9.2 and mozilla-central, I'd say all are affected.
Comment 7 Mike Hommey [:glandium] 2010-02-14 02:05:03 PST
Created attachment 426881 [details] [diff] [review]
patch for 1.9.1

Assembler::genPrologue is already doing the right thing with NJ_ALIGN_STACK, which is properly set to 8, so I guess we should be safe with stack alignment after this fix.

In 1.9.2 and mozilla-central, there is max_out_args update where this patch would normally apply, so the patch would apply before that, but I really wonder what max_out_args is supposed to achieve: it is used to calculate the needed stack in genPrologue, but still, the stack is grown for each argument that doesn't fit in registers. It seems to me it just leaves unused space in the stack...
Comment 8 Mark Finkle (:mfinkle) (use needinfo?) 2010-02-14 16:22:56 PST
I could be way off base, but the stack is interesting as we have had spurious reports of GetFolderIdforItem failing on mobile ARM devices: bug 519514 and bug 515757
Comment 9 Mike Hommey [:glandium] 2010-02-14 23:27:02 PST
Yes, both bugs look like the errors I got in the testsuite because of this bug in the first place. Bug 482964 is also the same IMHO.
Comment 10 Mike Hommey [:glandium] 2010-03-15 02:00:50 PDT
Comment on attachment 426881 [details] [diff] [review]
patch for 1.9.1

(picking a random reviewer from the history of NativeARM.cpp)

This patch is for 1.9.1 but the fix is similar for 1.9.2 and trunk, but please see comment 7.
Comment 11 Edwin Smith 2010-03-15 07:42:02 PDT
Comment on attachment 426881 [details] [diff] [review]
patch for 1.9.1

This patch optionally subracts 4 from SP at the point of a call, but does not compensate with an add.  If the call occurs in a loop, the stack will grow by 4 bytes each time through the loop, and every other call would be mis-aligned.

The ARM backend manages the stack by calculating the max space needed for outgoing args in "max_out_args", and does a single subtraction at procedure entry. (see the code in Assembler::genPrologue() in NativeARM.cpp).  We should ensure this value is properly aligned.  Then then SP will be aligned throughout the procedure as long as SP is aligned when called.

Therefore, at the bottom of asm_call, just before we adjust max_out_args, I think the fix should be:

    if ((stkd & 4) != 0)
        stkd += 4;
Comment 12 Mike Hommey [:glandium] 2010-03-15 08:08:07 PDT
(In reply to comment #11)
> The ARM backend manages the stack by calculating the max space needed for
> outgoing args in "max_out_args",

It doesn't on 1.9.1.

> and does a single subtraction at procedure
> entry. (see the code in Assembler::genPrologue() in NativeARM.cpp).  We should
> ensure this value is properly aligned.  Then then SP will be aligned throughout
> the procedure as long as SP is aligned when called.
> 
> Therefore, at the bottom of asm_call, just before we adjust max_out_args, I
> think the fix should be:
> 
>     if ((stkd & 4) != 0)
>         stkd += 4;

therefore, this can't work on 1.9.1
Comment 13 Edwin Smith 2010-03-15 08:46:29 PDT
Good point, i was looking at nanojit-central.

Still, looking at mozilla-1.9.1, there's no existing code to restore SP after the call (before the point we emit BL, in asm_call()) if stack arguments were pushed.  With the fix, SP would stay 8-aligned, but wouldn't we still have the unlimited growth if the call occurs in a loop?
Comment 14 Mike Hommey [:glandium] 2010-03-15 09:51:59 PDT
I hadn't taken a look at the generated assembly, but seeing it, I'm actually wondering how my patch manages to work...

I'll try to find some better way to fix this for 1.9.1, which might be to "just" backport the max_out_args stuff...
Comment 15 Mike Hommey [:glandium] 2010-03-15 09:53:18 PDT
I'd need to write a trace test that would trigger these problems, would you have hints as how to possibly trigger this with a simple js ?
Comment 16 Edwin Smith 2010-03-15 10:24:02 PDT
If you have a snip of js that causes the unaligned call, then writing a simple loop around it (in js) should hopefully generate a trace that loops back on itself.
Comment 17 Edwin Smith 2010-03-15 10:28:11 PDT
(In reply to comment #13)

> Still, looking at mozilla-1.9.1, there's no existing code to restore SP after
> the call (before the point we emit BL, in asm_call()) if stack arguments were
> pushed.  With the fix, SP would stay 8-aligned, but wouldn't we still have the
> unlimited growth if the call occurs in a loop?

For that matter, wouldn't the growth *already* be unlimited, if you call any function that takes stack args from inside a loop?  If we have that problem, then oh-my-gosh.  but if not, what prevents it?  whatever prevents it would also prevent unlimited growth with your fix.
Comment 18 Mike Hommey [:glandium] 2010-03-15 12:02:49 PDT
(In reply to comment #16)
> If you have a snip of js that causes the unaligned call, then writing a simple
> loop around it (in js) should hopefully generate a trace that loops back on
> itself.

I unfortunately only have a testcase using xptcinvoke, which is why I asked if you had an idea.
Comment 19 Mike Hommey [:glandium] 2010-03-15 12:07:40 PDT
(In reply to comment #17)
> For that matter, wouldn't the growth *already* be unlimited, if you call any
> function that takes stack args from inside a loop?  If we have that problem,
> then oh-my-gosh.  but if not, what prevents it?  whatever prevents it would
> also prevent unlimited growth with your fix.

The arguments are never put on stack using SP. They are put with r11 which is a copy of SP when entering the generated code.
Comment 20 Mike Hommey [:glandium] 2010-03-15 17:03:56 PDT
I could confirm the behaviour with a very simple testcase, and even without the patch, calling (non fast) native functions grows the stack at each iteration. The patch makes it grow faster. I'm also afraid of what this means for data access on the stack.

The simple testcase is:
for (i = 0; i < 10; i++) {
  version();
}
run through the js shell.
Comment 21 Mike Hommey [:glandium] 2010-03-15 17:05:02 PDT
(In reply to comment #19)
> The arguments are never put on stack using SP. They are put with r11 which is a
> copy of SP when entering the generated code.

The arguments *are* put on stack using SP, but other data access on stack in the generated code is done through r11.
Comment 22 Mike Hommey [:glandium] 2010-03-15 17:50:32 PDT
(In reply to comment #14)
> I hadn't taken a look at the generated assembly, but seeing it, I'm actually
> wondering how my patch manages to work...

Actually, it *does* the right thing, for 1.9.1, because it doesn't fill arguments on stack the same way 1.9.2 and trunk do. On trunk, afaics (haven't verified the generated code), the stack is pre-aligned for the arguments, and the arguments are being saved at the appropriate position on the stack. On 1.9.1, the arguments are pushed on the stack from the last one to the first one (in the generate code). So, for the overall to be correctly aligned, the stack needs to be adjusted for the pushes to end up at the correct position. Which is why the patch works as it is. The only problem with it now, is that nothing restores the stack pointer once the function call has returned.
What could probably be done is to force to restore the stack pointer after the function call, possibly from what is saved in r11.

As a side note, it looks like the JIT is never going to call functions with more than 5 arguments in tracemonkey, and never with 64 bits arguments, which makes the compiler overly complex for tm, and as such, makes it slower. For instance, a function call could just remove 4 from sp, push the 5th arg, call the function and add 8 to sp on return in case there is a 5th arg, and be done with it ; setting arguments on stack could be a matter of setting the good registers and would remove a lot of logic. It is nice to have a general purpose jit compiler, but is it not overkill for tracemonkey ? (but then, I don't really know how much the compiler itself weighs on js performance)
Comment 23 Mike Hommey [:glandium] 2010-03-15 18:07:45 PDT
(In reply to comment #6)
> Looking at the asm_call and asm_arg code in mozilla1.9.1, mozilla1.9.2 and
> mozilla-central, I'd say all are affected.

Re-reading the code, I'd say this actually doesn't affect 1.9.2 and trunk. I'll check more thoroughly tomorrow.
Comment 24 Mike Hommey [:glandium] 2010-03-16 00:48:25 PDT
Confirming, this only affects 1.9.1.
Comment 25 Mike Hommey [:glandium] 2010-03-16 01:10:26 PDT
Created attachment 432754 [details] [diff] [review]
Patch v2 (different approach)

This time, using a max_out_args approach, like what is done in 1.9.2 and trunk. This fixes both issues the original code had, that is, the ever growing stack and the unaligned stack. I haven't run the full test suite with this patch, yet.
Comment 26 Edwin Smith 2010-03-16 06:42:54 PDT
Comment on attachment 432754 [details] [diff] [review]
Patch v2 (different approach)

Looks right to me, logically and compared to the nanojit-central tip.  However, asking Jacob for a second review given the criticality of the fix, plus its easy to make a mistake in the asm_arg() code paths.
Comment 27 Jacob Bramley [:jbramley] 2010-03-16 09:36:03 PDT
Hmm. I don't have all the context (and don't have time to read all the history today), but here are a few high-level observations:

 * This doesn't apply to the tip. Indeed, ::nInit hasn't been on line 81 for a long time! I tried both "mozilla-central" and "tracemonkey". (I don't know the repository URL for mozilla1.9.1, though I'll have a look tomorrow.)

 * You replaced "str ip, [sp], #-4" in a few places with "str ip, [sp, #stkd]" (where stkd stores some offset into the stack). That's fine, but isn't stkd positive? That will grow the stack _upwards_, which will create a horrible mess. I must have missed something, else you'll would have seen a crash (unless you were very lucky).

 * If I'm wrong, and stkd is negative, then isn't the "if (stkd > max_out_args)" stuff in asm_call a bit backwards?

I would tend to lean towards using the SP in the way we used to if possible; that's what it's for, after all. I want to see the code in its proper context before giving it r+; as Edwin said, it's very easy to get this stuff wrong and it's also quite possible that you won't notice because existing test cases seem to work!
Comment 28 Edwin Smith 2010-03-16 09:50:58 PDT
The patch is an attempt to fix up the assembly out outgoing stack args in mozilla-1.9.1, to match the way that it's done in nanojit-central, in order to keep sp aligned.  I haven't tried but it should apply to mozilla-1.9.1/js/src/nanojit.
Comment 29 Edwin Smith 2010-03-16 09:52:47 PDT
>  * You replaced "str ip, [sp], #-4" in a few places with "str ip, [sp, #stkd]"
> (where stkd stores some offset into the stack). That's fine, but isn't stkd
> positive? That will grow the stack _upwards_, which will create a horrible
> mess. I must have missed something, else you'll would have seen a crash (unless
> you were very lucky).

the space required for out args is max_out_args, and this is included in the calculation of the frame size in genPrologue().  so, stores of out-args should be at positive offsets from sp, and sp should never move except in the prolog/epilog.
Comment 30 Mike Hommey [:glandium] 2010-03-16 09:54:21 PDT
(In reply to comment #27)
> Hmm. I don't have all the context (and don't have time to read all the history
> today), but here are a few high-level observations:
> 
>  * This doesn't apply to the tip. Indeed, ::nInit hasn't been on line 81 for a
> long time! I tried both "mozilla-central" and "tracemonkey". (I don't know the
> repository URL for mozilla1.9.1, though I'll have a look tomorrow.)
> 
>  * You replaced "str ip, [sp], #-4" in a few places with "str ip, [sp, #stkd]"
> (where stkd stores some offset into the stack). That's fine, but isn't stkd
> positive? That will grow the stack _upwards_, which will create a horrible
> mess. I must have missed something, else you'll would have seen a crash (unless
> you were very lucky).

Actually, as said in comment 22, tracemonkey is never going to call with more than 5 arguments, apparently, which means, at most 1 argument on stack, and obviously, having the stack filled upwards or downwards in this case doesn't change anything.
But now that I take a closer attention to the order in which everything is treated, it looks like the old code was actually not pushing arguments in the right order on stack!

FWIW, 1.9.1 is available on http://hg.mozilla.org/releases/mozilla-1.9.1/
Comment 31 Mike Hommey [:glandium] 2010-03-16 09:56:30 PDT
(In reply to comment #30)
> But now that I take a closer attention to the order in which everything is
> treated, it looks like the old code was actually not pushing arguments in the
> right order on stack!

Ah no, it was in the right order, because the execution is in reverse order compared to the compiler flow. So everything should be fine as it is, if I haven't lost track, but it is easy to lose it.
Comment 32 Mike Hommey [:glandium] 2010-03-16 09:58:26 PDT
(In reply to comment #25)
> Created an attachment (id=432754) [details]
> Patch v2 (different approach)
> 
> This time, using a max_out_args approach, like what is done in 1.9.2 and trunk.
> This fixes both issues the original code had, that is, the ever growing stack
> and the unaligned stack. I haven't run the full test suite with this patch,
> yet.

xpcshell-tests all pass.
Comment 33 Vladimir Vukicevic [:vlad] [:vladv] 2010-03-16 13:15:24 PDT
(In reply to comment #30)
> Actually, as said in comment 22, tracemonkey is never going to call with more
> than 5 arguments

Not sure if you meant "on 1.9.1 only" or not, but tm tip uses more than 5 args in some cases.
Comment 34 Jacob Bramley [:jbramley] 2010-03-17 00:37:50 PDT
Ah, Ok. I think I was half asleep yesterday; I couldn't work out how you allocated extra stack space from the start, but I now see that stackNeeded is used to calculate the initial stack allocation. Pretty obvious.

The ordering is correct, I believe. It looks wrong because you're now incrementing rather than decrementing, but everything is flipped because you've moved it from the backwards-generated code to the easy-to-think-about normal code. If the arguments were in the wrong order, everything would surely fall apart before long.
Comment 35 Jacob Bramley [:jbramley] 2010-03-17 00:39:11 PDT
(In reply to comment #33)
> (In reply to comment #30)
> > Actually, as said in comment 22, tracemonkey is never going to call with more
> > than 5 arguments
> 
> Not sure if you meant "on 1.9.1 only" or not, but tm tip uses more than 5 args
> in some cases.

Even if we didn't, I'd rather not specialize this as it may be required in the future, and trying to fix it then would be a nightmare.
Comment 36 Daniel Veditz [:dveditz] 2010-03-29 10:45:50 PDT
Comment on attachment 432754 [details] [diff] [review]
Patch v2 (different approach)

We're uncomfortable taking this on the 1.9.1 stable branch when it hasn't been fixed on the trunk and 1.9.2 branches, where there's a lot more testing. Dropping approval request for now but will consider it again later.
Comment 37 Mike Hommey [:glandium] 2010-03-29 10:49:53 PDT
(In reply to comment #36)
> (From update of attachment 432754 [details] [diff] [review])
> We're uncomfortable taking this on the 1.9.1 stable branch when it hasn't been
> fixed on the trunk and 1.9.2 branches, where there's a lot more testing.
> Dropping approval request for now but will consider it again later.

In fact, 1.9.2 and trunk are not affected.
Comment 38 Mike Beltzner [:beltzner, not reading bugmail] 2010-04-09 13:22:14 PDT
Comment on attachment 432754 [details] [diff] [review]
Patch v2 (different approach)

a=beltzner for 1.9.1.10
Comment 39 Reed Loden [:reed] (use needinfo?) 2010-04-12 22:45:08 PDT
http://hg.mozilla.org/releases/mozilla-1.9.1/rev/c9adbaf997c9

Note You need to log in before you can comment on or make changes to this bug.