Closed
Bug 1292136
Opened 3 years ago
Closed 3 years ago
Eliminate Unbox:Object opcode that follows a LoadUnboxedPointerV opcode
Categories
(Core :: JavaScript Engine: JIT, defect)
Core
JavaScript Engine: JIT
Not set
Tracking
()
RESOLVED
FIXED
mozilla51
Tracking | Status | |
---|---|---|
firefox51 | --- | fixed |
People
(Reporter: sandervv, Assigned: sandervv, Mentored)
Details
Attachments
(1 file, 2 obsolete files)
1.23 KB,
patch
|
nbp
:
review+
|
Details | Diff | Splinter Review |
While working on improving object group heuristics, we found some sub-optimal Ion code. Consider the small test case below that is reduced from the Bullet benchmark compiled with Cheerp: function __ZN28btHashedOverlappingPairCache26processAllOverlappingPairsEP17btOverlapCallbackP12btDispatcher(Lthis,Lcallback,Ldispatcher){ var tmp0=0,tmp1=null,Lvtable=null,Lcall4=0; tmp1=Lthis.a1.a3; Lvtable=Lcallback.a0; Lvtable=Lvtable.a3; Lcall4=Lvtable(Lcallback,tmp1[tmp0]); } (function() { var global = 0; for (var i = 0; i < 1000; i++) { var Lthis = { a1: { a3: [1], }, }; var Lcallback1 = { a0: { a3: null, a6: function(Lcallback, tmp) { global += tmp }, }, }; var Lcallback2 = { a0: { a3: function(Lcallback, tmp) { global += tmp + tmp }, a4: { }, }, a1: { }, }; Lcallback1.a0.a3 = Lcallback1.a0.a6; var Ldispatcher = null; for (var j = 0; j < 100; j++) { var Lcallback = j < 50 ? Lcallback1 : Lcallback2; __ZN28btHashedOverlappingPairCache26processAllOverlappingPairsEP17btOverlapCallbackP12btDispatcher(Lthis, Lcallback, Ldispatcher); } } print(global); })(); After inlining the function defined at line 1 into the inner loop body, Ion generates the following code for the inner loop body: BB #10 [00240,35,12] [inlined /home/smvv/work/leaningtech/tests/loadunboxedpointerv-unboxobject.js:1] -> #12 -> #15 :: 84700 hits [LoadUnboxedPointerT] movq 0x10(%rax), %rcx [LoadUnboxedPointerT] movq 0x10(%rcx), %rdi [LoadUnboxedPointerT] movq 0x10(%rbx), %rcx [LoadUnboxedPointerV] movq 0x10(%rcx), %r8 testq %r8, %r8 jne .Lfrom2391 movabsq $0xfffc000000000000, %r8 jmp .Lfrom2406 .set .Llabel2406, . .set .Lfrom2391, .Llabel2406 movabsq $0xfffe000000000000, %r11 orq %r11, %r8 .set .Llabel2419, . .set .Lfrom2406, .Llabel2419 [Unbox:Object] movq %r8, %r11 shrq $47, %r11 cmpl $0x1fffc, %r11d jne .Lfrom2439 movabsq $0x7fffffffffff, %r11 andq %r11, %r8 [Elements] movq 0x18(%rdi), %rcx [InitializedLength] movl -0xc(%rcx), %r9d [BoundsCheck] testl %r9d, %r9d jbe .Lfrom2469 [LoadElementT] movl 0x0(%rcx), %r9d [FunctionDispatch] movabsq $0x7fe89a66e820, %r11 cmpq %r11, 0x0(%r8) je .Lfrom2491 jmp .Lfrom2496 Ion generates a LoadUnboxedPointerV opcode followed by a Unbox:Object opcode. First boxing the value followed by unboxing that value is sub-optimal. The attached patch is necessary in order to create an unboxed representation of objects that could potentially become boxed again in the future due to added properties.
Comment 1•3 years ago
|
||
Comment on attachment 8777801 [details] [diff] [review] disable-heuristic-in-tryconverttounboxedlayout.patch Review of attachment 8777801 [details] [diff] [review]: ----------------------------------------------------------------- Brian, do you recall exactly what might be the side-effect of using unboxed object if we have only one object?
Attachment #8777801 -
Flags: feedback?(bhackett1024)
Assignee | ||
Comment 2•3 years ago
|
||
To give an idea how many times this opcode pair is hit in Bullet: LoadUnboxedPointerV followed by Unbox 821,782,125 That estimate is a summation of the opcode pair count multiplied by the hit count listed in the basic block header that the pairs belong to. Multiplying that number by 10 instructions (that could be removed when these opcodes are optimized), there are 8,210,000,000 instructions executed that are not needed.
Comment 3•3 years ago
|
||
Comment on attachment 8777801 [details] [diff] [review] disable-heuristic-in-tryconverttounboxedlayout.patch Review of attachment 8777801 [details] [diff] [review]: ----------------------------------------------------------------- The only negative effect that removing this heuristic could have is that we might end up converting unboxed objects back to natives later on, which will end up in worse code being generated when the unboxed object is accessed. It might be a good thing in general to remove this heuristic, but the heuristic doesn't have anything to do with the codegen problem in this bug. AFAICT the LoadUnboxedPointerV being followed by an Unbox here only serves to filter out null values, so it should be possible to generate much better code here given the same type/unboxed information. If I had to guess the MIR which was lowered into the LoadUnboxedPointerV has MIRType_Value, when it would be better off having MIRType_ObjectOrNull.
Attachment #8777801 -
Flags: feedback?(bhackett1024) → feedback-
Assignee | ||
Comment 4•3 years ago
|
||
Prior to this patch: BB #10 [00242,35,12] [inlined loadunboxedpointerv-unbox.js:1] -> #12 -> #15 :: 9982000 hits [LoadFixedSlotT] movabsq $0x7fffffffffff, %rdi andq 0x20(%rax), %rdi [LoadFixedSlotT] movabsq $0x7fffffffffff, %rcx andq 0x20(%rbx), %rcx [LoadFixedSlotV] movq 0x20(%rcx), %r8 [Unbox:Object] movq %r8, %r11 shrq $47, %r11 cmpl $0x1fffc, %r11d jne .Lfrom2623 movabsq $0x7fffffffffff, %r11 andq %r11, %r8 [Elements] movq 0x18(%rdi), %rcx [InitializedLength] movl -0xc(%rcx), %r9d [BoundsCheck] testl %r9d, %r9d jbe .Lfrom2653 [LoadElementT] movl 0x0(%rcx), %r9d [FunctionDispatch] movabsq $0x7ff95846e7f0, %r11 cmpq %r11, 0x0(%r8) je .Lfrom2675 jmp .Lfrom2680 With this patch applied: BB #10 [00242,35,12] [inlined loadunboxedpointerv-unbox.js:1] -> #12 -> #15 :: 9981600 hits [LoadUnboxedPointerT] movq 0x10(%rax), %rcx [LoadUnboxedPointerT] movq 0x10(%rcx), %rdi [LoadUnboxedPointerT] movq 0x10(%rbx), %rcx [LoadUnboxedPointerT] movq 0x10(%rcx), %r8 testq %r8, %r8 je .Lfrom2367 [Elements] movq 0x18(%rdi), %rcx [InitializedLength] movl -0xc(%rcx), %r9d [BoundsCheck] testl %r9d, %r9d jbe .Lfrom2384 [LoadElementT] movl 0x0(%rcx), %r9d [FunctionDispatch] movabsq $0x7f008d26e7f0, %r11 cmpq %r11, 0x0(%r8) je .Lfrom2406 jmp .Lfrom2411 Note that there are three loads prior to the patch, while there are four loads with the patch applied. The last two loads are used for the function pointer. I assume that the difference in the first one/two loads is caused by some code that is merging Lthis.a1.a3 into Lthis.a3, but that is not done for unboxed objects. Collapsing Lthis.a1.a3 is possible because Lthis.a1 is never used. What do you think? This patch reduces Cheerp's bullet test case runtime with 1.2 seconds. Try job results: https://treeherder.mozilla.org/#/jobs?repo=try&revision=aa12fde6c314 I do not see failures that are related to this patch (there are leaks and flaky failures).
Assignee: nobody → sandervv
Attachment #8777801 -
Attachment is obsolete: true
Attachment #8789332 -
Flags: review?(nicolas.b.pierron)
Comment 5•3 years ago
|
||
Do we need the modification of the heuristic to have this improvement? From what I understand, if we remove the heuristics about skipping on cases where the object count is <= 1, then we might actually convert any new object with TryConvertToUnboxedLayout, even within the interpreter, which definitely sounds as a valid concern. How negative is this? My guess would be for us to ensure that the object is used and not mutated before we attempt to optimize it in such manners. Would it be possible to add a new shape header / object property / type object info to record the number of accesses since we mutated the shape? In such cases, we would have a valid heuristics which implies that one kind of object is effectively manipulated before we attempt at optimizing its memory representation. (In reply to Sander Mathijs van Veen from comment #4) > Note that there are three loads prior to the patch, while there are four > loads with the patch applied. The last two loads are used for the function > pointer. I assume that the difference in the first one/two loads is caused > by some code that is merging Lthis.a1.a3 into Lthis.a3, but that is not done > for unboxed objects. Collapsing Lthis.a1.a3 is possible because Lthis.a1 is > never used. What do you think? I would bet on Scalar Replacement, which kicks in and removes Lthis allocation, and replace LThis.a1 property by a scalar. Normally, we should as well be able to remove a1's allocation as well as a3 allocation. This issue might be caused by Scalar Replacement traversals, which is ordering the visit in the wrong order compared to the initialization.
Flags: needinfo?(sandervv)
Flags: needinfo?(bhackett1024)
Assignee | ||
Comment 6•3 years ago
|
||
Removed the |objectCount <= 1| part from the patch. Only the first part that fixes the LoadUnboxedPointerV / Unbox opcode pair remains.
Attachment #8789332 -
Attachment is obsolete: true
Attachment #8789332 -
Flags: review?(nicolas.b.pierron)
Flags: needinfo?(sandervv)
Attachment #8790233 -
Flags: review?(nicolas.b.pierron)
Comment 7•3 years ago
|
||
Comment on attachment 8790233 [details] [diff] [review] bug1292136-loadunboxedpointerv-with-unbox-opcode.patch Review of attachment 8790233 [details] [diff] [review]: ----------------------------------------------------------------- Thanks! :)
Attachment #8790233 -
Flags: review?(nicolas.b.pierron) → review+
Updated•3 years ago
|
Flags: needinfo?(bhackett1024)
Assignee | ||
Updated•3 years ago
|
Keywords: checkin-needed
Pushed by ryanvm@gmail.com: https://hg.mozilla.org/integration/mozilla-inbound/rev/876a4319c262 Eliminate Unbox:Object opcode that follows a LoadUnboxedPointerV opcode. r=nbp
Keywords: checkin-needed
Comment 9•3 years ago
|
||
bugherder |
https://hg.mozilla.org/mozilla-central/rev/876a4319c262
Status: NEW → RESOLVED
Closed: 3 years ago
status-firefox51:
--- → fixed
Resolution: --- → FIXED
Target Milestone: --- → mozilla51
You need to log in
before you can comment on or make changes to this bug.
Description
•