Closed Bug 1292136 Opened 3 years ago Closed 3 years ago

Eliminate Unbox:Object opcode that follows a LoadUnboxedPointerV opcode

Categories

(Core :: JavaScript Engine: JIT, defect)

defect
Not set

Tracking

()

RESOLVED FIXED
mozilla51
Tracking Status
firefox51 --- fixed

People

(Reporter: sandervv, Assigned: sandervv, Mentored)

Details

Attachments

(1 file, 2 obsolete files)

While working on improving object group heuristics, we found some sub-optimal Ion code. Consider the small test case below that is reduced from the Bullet benchmark compiled with Cheerp:

function __ZN28btHashedOverlappingPairCache26processAllOverlappingPairsEP17btOverlapCallbackP12btDispatcher(Lthis,Lcallback,Ldispatcher){
    var tmp0=0,tmp1=null,Lvtable=null,Lcall4=0;
    tmp1=Lthis.a1.a3;
    Lvtable=Lcallback.a0;
    Lvtable=Lvtable.a3;
    Lcall4=Lvtable(Lcallback,tmp1[tmp0]);
}

(function() {
    var global = 0;

    for (var i = 0; i < 1000; i++) {
        var Lthis = {
            a1: {
                a3: [1],
            },
        };
        var Lcallback1 = {
            a0: {
                a3: null,
                a6: function(Lcallback, tmp) { global += tmp },
            },
        };
        var Lcallback2 = {
            a0: {
                a3: function(Lcallback, tmp) { global += tmp + tmp },
                a4: { },
            },
            a1: { },
        };
        Lcallback1.a0.a3 = Lcallback1.a0.a6;
        var Ldispatcher = null;
        for (var j = 0; j < 100; j++) {
            var Lcallback = j < 50 ? Lcallback1 : Lcallback2;
            __ZN28btHashedOverlappingPairCache26processAllOverlappingPairsEP17btOverlapCallbackP12btDispatcher(Lthis, Lcallback, Ldispatcher);
        }
    }

    print(global);
})();

After inlining the function defined at line 1 into the inner loop body, Ion generates the following code for the inner loop body:

BB #10 [00240,35,12] [inlined /home/smvv/work/leaningtech/tests/loadunboxedpointerv-unboxobject.js:1] -> #12 -> #15 :: 84700 hits
[LoadUnboxedPointerT]
movq       0x10(%rax), %rcx
[LoadUnboxedPointerT]
movq       0x10(%rcx), %rdi
[LoadUnboxedPointerT]
movq       0x10(%rbx), %rcx
[LoadUnboxedPointerV]
movq       0x10(%rcx), %r8
testq      %r8, %r8
jne        .Lfrom2391
movabsq    $0xfffc000000000000, %r8
jmp        .Lfrom2406
.set .Llabel2406, .
.set .Lfrom2391, .Llabel2406
movabsq    $0xfffe000000000000, %r11
orq        %r11, %r8
.set .Llabel2419, .
.set .Lfrom2406, .Llabel2419
[Unbox:Object]
movq       %r8, %r11
shrq       $47, %r11
cmpl       $0x1fffc, %r11d
jne        .Lfrom2439
movabsq    $0x7fffffffffff, %r11
andq       %r11, %r8
[Elements]
movq       0x18(%rdi), %rcx
[InitializedLength]
movl       -0xc(%rcx), %r9d
[BoundsCheck]
testl      %r9d, %r9d
jbe        .Lfrom2469
[LoadElementT]
movl       0x0(%rcx), %r9d
[FunctionDispatch]
movabsq    $0x7fe89a66e820, %r11
cmpq       %r11, 0x0(%r8)
je         .Lfrom2491
jmp        .Lfrom2496

Ion generates a LoadUnboxedPointerV opcode followed by a Unbox:Object opcode. First boxing the value followed by unboxing that value is sub-optimal. 

The attached patch is necessary in order to create an unboxed representation of objects that could potentially become boxed again in the future due to added properties.
Comment on attachment 8777801 [details] [diff] [review]
disable-heuristic-in-tryconverttounboxedlayout.patch

Review of attachment 8777801 [details] [diff] [review]:
-----------------------------------------------------------------

Brian, do you recall exactly what might be the side-effect of using unboxed object if we have only one object?
Attachment #8777801 - Flags: feedback?(bhackett1024)
To give an idea how many times this opcode pair is hit in Bullet:
LoadUnboxedPointerV followed by Unbox 	821,782,125

That estimate is a summation of the opcode pair count multiplied by the hit count listed in the basic block header that the pairs belong to.

Multiplying that number by 10 instructions (that could be removed when these opcodes are optimized), there are 8,210,000,000 instructions executed that are not needed.
Comment on attachment 8777801 [details] [diff] [review]
disable-heuristic-in-tryconverttounboxedlayout.patch

Review of attachment 8777801 [details] [diff] [review]:
-----------------------------------------------------------------

The only negative effect that removing this heuristic could have is that we might end up converting unboxed objects back to natives later on, which will end up in worse code being generated when the unboxed object is accessed.  It might be a good thing in general to remove this heuristic, but the heuristic doesn't have anything to do with the codegen problem in this bug.  AFAICT the LoadUnboxedPointerV being followed by an Unbox here only serves to filter out null values, so it should be possible to generate much better code here given the same type/unboxed information.  If I had to guess the MIR which was lowered into the LoadUnboxedPointerV has MIRType_Value, when it would be better off having MIRType_ObjectOrNull.
Attachment #8777801 - Flags: feedback?(bhackett1024) → feedback-
Prior to this patch:

BB #10 [00242,35,12] [inlined loadunboxedpointerv-unbox.js:1] -> #12 -> #15 :: 9982000 hits
[LoadFixedSlotT]
movabsq    $0x7fffffffffff, %rdi
andq       0x20(%rax), %rdi
[LoadFixedSlotT]
movabsq    $0x7fffffffffff, %rcx
andq       0x20(%rbx), %rcx
[LoadFixedSlotV]
movq       0x20(%rcx), %r8
[Unbox:Object]
movq       %r8, %r11
shrq       $47, %r11
cmpl       $0x1fffc, %r11d
jne        .Lfrom2623
movabsq    $0x7fffffffffff, %r11
andq       %r11, %r8
[Elements]
movq       0x18(%rdi), %rcx
[InitializedLength]
movl       -0xc(%rcx), %r9d
[BoundsCheck]
testl      %r9d, %r9d
jbe        .Lfrom2653
[LoadElementT]
movl       0x0(%rcx), %r9d
[FunctionDispatch]
movabsq    $0x7ff95846e7f0, %r11
cmpq       %r11, 0x0(%r8)
je         .Lfrom2675
jmp        .Lfrom2680

With this patch applied:

BB #10 [00242,35,12] [inlined loadunboxedpointerv-unbox.js:1] -> #12 -> #15 :: 9981600 hits
[LoadUnboxedPointerT]
movq       0x10(%rax), %rcx
[LoadUnboxedPointerT]
movq       0x10(%rcx), %rdi
[LoadUnboxedPointerT]
movq       0x10(%rbx), %rcx
[LoadUnboxedPointerT]
movq       0x10(%rcx), %r8
testq      %r8, %r8
je         .Lfrom2367
[Elements]
movq       0x18(%rdi), %rcx
[InitializedLength]
movl       -0xc(%rcx), %r9d
[BoundsCheck]
testl      %r9d, %r9d
jbe        .Lfrom2384
[LoadElementT]
movl       0x0(%rcx), %r9d
[FunctionDispatch]
movabsq    $0x7f008d26e7f0, %r11
cmpq       %r11, 0x0(%r8)
je         .Lfrom2406
jmp        .Lfrom2411

Note that there are three loads prior to the patch, while there are four loads with the patch applied. The last two loads are used for the function pointer. I assume that the difference in the first one/two loads is caused by some code that is merging Lthis.a1.a3 into Lthis.a3, but that is not done for unboxed objects. Collapsing Lthis.a1.a3 is possible because Lthis.a1 is never used. What do you think?

This patch reduces Cheerp's bullet test case runtime with 1.2 seconds.

Try job results: https://treeherder.mozilla.org/#/jobs?repo=try&revision=aa12fde6c314
I do not see failures that are related to this patch (there are leaks and flaky failures).
Assignee: nobody → sandervv
Attachment #8777801 - Attachment is obsolete: true
Attachment #8789332 - Flags: review?(nicolas.b.pierron)
Do we need the modification of the heuristic to have this improvement?

From what I understand, if we remove the heuristics about skipping on cases where the object count is <= 1, then we might actually convert any new object with TryConvertToUnboxedLayout, even within the interpreter, which definitely sounds as a valid concern.   How negative is this?

My guess would be for us to ensure that the object is used and not mutated before we attempt to optimize it in such manners.  Would it be possible to add a new shape header / object property / type object info to record the number of accesses since we mutated the shape?

In such cases, we would have a valid heuristics which implies that one kind of object is effectively manipulated before we attempt at optimizing its memory representation.

(In reply to Sander Mathijs van Veen from comment #4)
> Note that there are three loads prior to the patch, while there are four
> loads with the patch applied. The last two loads are used for the function
> pointer. I assume that the difference in the first one/two loads is caused
> by some code that is merging Lthis.a1.a3 into Lthis.a3, but that is not done
> for unboxed objects. Collapsing Lthis.a1.a3 is possible because Lthis.a1 is
> never used. What do you think?

I would bet on Scalar Replacement, which kicks in and removes Lthis allocation, and replace LThis.a1 property by a scalar.  Normally, we should as well be able to remove a1's allocation as well as a3 allocation.  This issue might be caused by Scalar Replacement traversals, which is ordering the visit in the wrong order compared to the initialization.
Flags: needinfo?(sandervv)
Flags: needinfo?(bhackett1024)
Removed the |objectCount <= 1| part from the patch. Only the first part that fixes the LoadUnboxedPointerV / Unbox opcode pair remains.
Attachment #8789332 - Attachment is obsolete: true
Attachment #8789332 - Flags: review?(nicolas.b.pierron)
Flags: needinfo?(sandervv)
Attachment #8790233 - Flags: review?(nicolas.b.pierron)
Comment on attachment 8790233 [details] [diff] [review]
bug1292136-loadunboxedpointerv-with-unbox-opcode.patch

Review of attachment 8790233 [details] [diff] [review]:
-----------------------------------------------------------------

Thanks! :)
Attachment #8790233 - Flags: review?(nicolas.b.pierron) → review+
Flags: needinfo?(bhackett1024)
Keywords: checkin-needed
Pushed by ryanvm@gmail.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/876a4319c262
Eliminate Unbox:Object opcode that follows a LoadUnboxedPointerV opcode. r=nbp
Keywords: checkin-needed
https://hg.mozilla.org/mozilla-central/rev/876a4319c262
Status: NEW → RESOLVED
Closed: 3 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla51
You need to log in before you can comment on or make changes to this bug.