Slim down JIT codegen for DOM bindings a bit

NEW
Unassigned

Status

()

Core
JavaScript Engine
5 years ago
3 years ago

People

(Reporter: bz, Unassigned)

Tracking

(Depends on: 1 bug, Blocks: 1 bug)

Firefox Tracking Flags

(Not tracked)

Details

Attachments

(2 attachments)

Created attachment 688095 [details]
ASM as output by the JIT profiler

I ran this testcase:

  var count = 1000000;
  var xhr = new XMLHttpRequest;
  for (var i = 0; i < count; ++i) {
    xhr.readyState;
  }

through the JIT profiler and looked at the resulting asm.  There are various things in there that are bit suboptimal.  I'll comment on those when I attach the annotated asm.
Created attachment 688098 [details]
Annotated asm

So I see the following things that could use work:

1)  Before we ever get to the GetDOMProperty there is a useless load of whatever
    object pointer we did the shape guard on (the proto?) into %rax.  What's
    that about?
2)  We end up loading JSRuntime* into registers twice: once when saving
    rt->ionTop, the other when getting rt->ionJSContext.  Not sure how to best
    deal with that, given that the ionTop thing is somewhere inside frame setup
    goop.
3)  We put all of our call args in scratch registers, then move them into the
    argument registers.  We should just put them in the right registers to
    start with.
4)  We unbox, and guard on, the completely unused return value.  Not sure how
    important this is to fix since presumably real code would rarely not use a
    return value.

I'll file a separate bug on #3, since I understand that one best...
Depends on: 817943
I looked at the asm for a method call too, and it has two separate call instructions... not sure whether that's expected?
(In reply to Boris Zbarsky (:bz) from comment #1)
> Created attachment 688098 [details]
> Annotated asm
> 
> So I see the following things that could use work:
> 
> 1)  Before we ever get to the GetDOMProperty there is a useless load of
> whatever
>     object pointer we did the shape guard on (the proto?) into %rax.  What's
>     that about?

The guard is produced by IonBuilder::TestCommonPropFunc, and put as a dependency of  MGetDOMProperty such as it would be a dependency of the call.  The bug might come from the fact that the MIR dependency is not lower to a not-used LUse, during the lowering phase and the register allocator might be lost and generate weird things.

> [GetDOMProperty]
> subq       $0x8, %rsp                   #bz Make stack space for jsval retval
> movq       %rsp, %rcx                   #bz Put pointer to retval in %rcx
> push       %rdi                         #bz Pushed "xhr" object on stack.
> movq       0x20(%rdi), %rbx             #bz Read first fixed slot from "xhr"
> shlq       $1, %rbx                     #bz Getting private on 64-bit shifts
> movq       %rsp, %rdi                   #bz Put pointer to xhr object in %rdi
> movabsq    $0xffffffffffffffff, %rax    #bz Stick -1 in %rax
> pushl      $0xc0                        #bz Push 0xc0 = 192? Why?

This is the frame descriptor, we need it to walk the stack.

> push       %rax                         #bz Push -1?  Why?

This is the emulated return address, -1 is just the placeholder generated which will be patched at the link phase.  This fake return address is used to index the safepoint which is used for marking live objects which are in the current frame.

> movabsq    $0x107dc5000, %r11           #bz Put JSRuntime* in %r11
> movq       %rsp, 0x34a28(%r11)          #bz save stack pointer in rt->ionTop

This is used as a start point for stack iteration from the current IonActivation.  rt->ionTop is used as an easy thread-safe way to hold the last stack pointer.

> movabsq    $0x1, %r11                   #bz Stick 1 in %r11
> push       %r11                         #bz Push 1
> movabsq    $0x0, %r11                   #bz Stick 0 in %r11
> push       %r11                         #bz Push 0

These 2 are used to identify the kind of exit frame.  In this case an ION_FRAME_DOMGETTER.

> movq       %rsp, %rax                   #bz Save stack pointer in %rax
> andq       $0xfffffff0, %rsp            #bz align stack?  32-bit?
> push       %rax                         #bz Push pre-alignment %rsp on stack

This is made by setupUnalignedABICall, we need to align the stack such as the call frame would be aligned, to respect some compiler restrictions.  In practice, we might be able to get rid of them in most cases as we statically know our frame size.  But this would imply that we keep it aligned for every-call which might involve some tricky manipulation if funapply and also in the underflow case.

> 2)  We end up loading JSRuntime* into registers twice: once when saving
>     rt->ionTop, the other when getting rt->ionJSContext.  Not sure how to
> best
>     deal with that, given that the ionTop thing is somewhere inside frame
> setup
>     goop.

> movabsq    $0x107dc5000, %rax           #bz Put JSRuntime* in %rax again, BUG
> movq       0x34a30(%rax), %rax          #bz Put rt->ionJSContext in %rax

This is the first argument of the function.  This will also appear with callVM.  We should update linkExitFrame to accept a register containing a JSRuntime*.  I am not sure but we have to be careful in VMwrapper on x86 as we might run out of register quickly.

> subq       $0x8, %rsp                   #bz Re-align stack to 16 bytes


> 3)  We put all of our call args in scratch registers, then move them into the
>     argument registers.  We should just put them in the right registers to
>     start with.
> 
> I'll file a separate bug on #3, since I understand that one best...

> movq       %rcx, %rcx                   #bz Call setup arg 4: jsval*, BUG
> movq       %rbx, %rdx                   #bz Call setup arg 3: nsXMLHttpRequest*
> movq       %rdi, %rsi                   #bz Call setup arg 2: JSHandleObject
> movq       %rax, %rdi                   #bz Call setup arg 1: JScontext*

Hum … apparently the CallTempReg* are not set correctly.  This should have been 4 no-op plus a knonw bug from the MoveGroup which is still producing these no-op.  This should be an easy fix.

> movabsq    $0x10280b8f0, %rax           #bz Target address into %rax
> call       *%rax                        #bz Do the call
> addq       $0x8, %rsp                   #bz Remove stack space we made above
> pop        %rsp                         #bz pop pre-alignment %rsp

> movq       0x28(%rsp), %rcx             #bz Read return value into %rcx
> addq       $0x30, %rsp                  #bz Pop everything off the stack



> 4)  We unbox, and guard on, the completely unused return value.  Not sure how
>     important this is to fix since presumably real code would rarely not use
> a
>     return value.

We have to because TI does not trust DOM functions to always return the same value type.
Nicolas, thank you for the analysis!  Are you willing to file a bug on #1 above, since you seem to understand it at least somewhat?

> We should update linkExitFrame to accept a register containing a JSRuntime*.

Do you want to file the bug, or should I?

> Hum … apparently the CallTempReg* are not set correctly. 

Let's do this part in bug 817943.

> We have to because TI does not trust DOM functions to always return the same value type.

Well, in the unused return value case we don't care whether it returned the same value type...

But this raises an interesting point in general which I should have thought of last night.  Filed bug 818050.
Depends on: 818050
Summary: Slim down DOM codegen for DOM bindings a bit → Slim down JIT codegen for DOM bindings a bit
Blocks: 580070
(In reply to Nicolas B. Pierron [:pierron] [:nbp] from comment #3)
> > movabsq    $0xffffffffffffffff, %rax    #bz Stick -1 in %rax
> > push       %rax                         #bz Push -1?  Why?
> > movabsq    $0x1, %r11                   #bz Stick 1 in %r11
> > push       %r11                         #bz Push 1
> > movabsq    $0x0, %r11                   #bz Stick 0 in %r11
> > push       %r11                         #bz Push 0

Yuck.  We can do better than this.  And we do, for pushing the frame descriptor; why don't we use |push imm| in these instances, too?
> Yuck.  We can do better than this.

Filed bug 818138.
Depends on: 818138
Depends on: 818199
Do we need a separate bug on this part:

> movabsq    $0xffffffffffffffff, %rax    #bz Stick -1 in %rax
> pushl      $0xc0                        #bz Push 0xc0 = 192? Why?
> push       %rax                         #bz Push -1?  Why?

to look more like this:

  pushl      $0xc0
  pushl      $-1

?
Or perhaps it would have to be:

  pushl $0xc0
  pushl $0xffffffff

Note that on 32-bit that snippet looks like so:

  movl       $0xffffffff, %edi
  pushl      $0x460
  push       %edi

which is no better, imo...
(In reply to Boris Zbarsky (:bz) from comment #8)
> Or perhaps it would have to be:
> 
>   pushl $0xc0
>   pushl $0xffffffff
> 
> Note that on 32-bit that snippet looks like so:
> 
>   movl       $0xffffffff, %edi
>   pushl      $0x460
>   push       %edi
> 
> which is no better, imo...

No better than what?

We could educate 32-bit (and 64-bit, really) about the |push imm8/imm16| variants, which also do sign-extension and are shorter.
Oh, I see.  We later patch the 0xffffffff with the actual address, so we can't treat it as a 32-bit immediate.  Alright.
> No better than what?

No better than the 64-bit codegen.

So for 32-bit, we might be be able to do a pushl anyway, since pointers would fit in 32 bits there.  But that might complicate the later patching...
Ah, OK.  Probably easier to just keep the mov $imm, reg form for both architectures, then.
(Assignee)

Updated

3 years ago
Assignee: general → nobody
You need to log in before you can comment on or make changes to this bug.