Open Bug 1915272 Opened 1 year ago Updated 2 months ago

Support MParameter in FoldLoadsWithUnbox

Categories

(Core :: JavaScript Engine: JIT, enhancement, P3)

enhancement

Tracking

()

ASSIGNED

People

(Reporter: iain, Assigned: debadree333)

References

(Blocks 2 open bugs)

Details

(Whiteboard: [sp3])

Attachments

(1 obsolete file)

The FoldLoadsWithUnbox optimization combines a load from memory with an unbox to generate slightly faster assembly. We could consider expanding it to support MParameter as an input, because parameters are passed in memory (on the stack).

Whiteboard: [sp3]

Hello iain,

I was just trying to understand this, iiuc for every parameter of the function we generate this MParameter node? and we would have to add an additional case here in https://searchfox.org/mozilla-central/source/js/src/jit/IonAnalysis.cpp#4968 and create a corresponding instruction to do both like MLoadElementAndUnbox and lower it somewhat similarly?

Thank you!!

Flags: needinfo?(iireland)

Iain is on parental leave so I'll steal this one: yes that's correct. We would handle MParameter like MLoadFixedSlot et al and add a new MIR instruction that generates more efficient code by folding the load + unbox.

Would be good to compare the codegen before/after for some types (int32, double, object are the interesting cases).

Flags: needinfo?(iireland)

Oh hey! (so sorry for pinging iain didnt know) so basically i looked at a script like this:

function functionWithParams(i, j) {
  const arr = [];
  for (let start = i; start < j; start++) {
    arr.push(start + i);
  }
  return arr;
}

for (let i = 0; i < 1e5; i++) {
  functionWithParams(i, i + 10);
}

i looked at the ion graph and the state of the first block after FoldLoadsWithUnbox looks something like this:

resumepoint 3 3 2 1 0 3 3
0 parameter THIS_SLOT 
1 parameter 0
2 parameter 1
3 constant undefined
4 start
5 checkoverrecursed
6 constant shape at 103165763b80
7 newarrayobject constant6:Shape
8 unbox parameter2 to Int32 (fallible) 
9 guardshape newarrayobject7:Object
memory 4
10 constant object 10316573e0e0 (Array) 
11 guardshape constant10:Object
memory 4
12 slots guardshape11:Object memory 4
57 loaddynamicslotandunbox slots12:Slots (slot 6) 14 unbox parameter1 to Int32 (fallible)
16 constant function push at 103165761430
17 guardspecificfunction loaddynamicslotandunbox57:Object constant16:Object
18 constantproto constant10:Object

so we would ideally want it to look something like where instruction number 8 too look like loadparameterandunbox like we see in instruction 17 for example! right?

Thank you so much for responding!

Flags: needinfo?(jdemooij)

That's correct! For the LoadParameterAndUnbox MIR/LIR instructions, you should look at Parameter and LoadFixedSlot - it will probably have elements from both of those.

Flags: needinfo?(jdemooij)

Hello jandem (sorry for the repeated needinfo's 😅 ) if we look at the implementation of visitLoadFixedSlotAndUnbox (for ex): we load the address based on the base address (which is the object ptr i believe) and the offset as seen here: https://searchfox.org/mozilla-central/source/js/src/jit/CodeGenerator.cpp#17286 , now for the case of LoadParameterAndUnbox we look at its lowering implementation we only pass around the offset value https://searchfox.org/mozilla-central/source/js/src/jit/Lowering.cpp#76 so what i am not able to figure out is what will be the base address in this case? I i tried to look at the implementation of LArgument and LAllocation i think theres a clue here: https://searchfox.org/mozilla-central/source/js/src/jit/LIR.h#92 but couldnt figure :-(

Thank you so much for your time!!

Flags: needinfo?(jdemooij)

We can use the frame pointer register as base, similar to the code here (ignore the BaseRegForAddress::SP case):

https://searchfox.org/mozilla-central/rev/04a47c08504e6357a3164163dd19a47754521204/js/src/jit/shared/CodeGenerator-shared-inl.h#260,264-266

Note that a.toArgument()->index() is a bit confusing: it's the offset of the argument, not the argument's index. That's also clear from the Lowering.cpp code you linked to in comment 5.

Flags: needinfo?(jdemooij)
Assignee: nobody → debadree333
Status: NEW → ASSIGNED

For the record, the motivating testcase for this is in bug 1911160. It looks something like this:

C.costBetween = function(from, to) {
  if (from >= this.costMatrix.length || from == to)
    return 0;
  return this.costMatrix[from][to - from - 1];
}

In particular, disassembling the code we generate for loading from and to, we get something like this (taken from this comment):

    MoveGroup [arg:8 -> r8, x]
           mov r8, qword [rbp + 0x28]
    Unbox (r8)
           mov r11, r8
           shr r11, 0x2f
           cmp r11d, 0x1fff1
           jnz 0x48d794
           mov edi, r8d

Conversely, this.costMatrix generates:

    LoadFixedSlotAndUnbox (rax)
           mov r11, -0x2000000000000
           xor r11, qword [rax + 0x18]
           mov rcx, r11
           shr r11, 0x2f
           jnz 0x48d77f

The latter lets us avoid the cmp by arranging for the shr to set the right flags.

I don't know whether foldLoadsWithUnbox is the best way to accomplish this, or whether we could do the whole optimization during lowering/codegen.

Oh thanks for the explanation iain i will try investigating more and ask any questions i find!

Attachment #9423859 - Attachment is obsolete: true

Worth repriotizing/reassigning/just-fix-it this?

Bumping down in priority; I think the gap we were seeing on bug 1911160 had more to do with bug 1956655.

Priority: P2 → P3
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: