Slightly modified paper.js demo (http://paperjs.org/examples/spiral-raster/ ) is 2.1x slower in Nightly (and spends lots of time around GC)
Categories
(Core :: JavaScript Engine, task, P3)
Tracking
()
People
(Reporter: mayankleoboy1, Unassigned)
References
(Blocks 1 open bug, )
Details
Attachments
(1 file)
2.39 KB,
text/plain
|
Details |
Go to http://paperjs.org/examples/spiral-raster/
On the top-right, click on "source'
EITHER replace all the code in the demo with the attached code OR
search for: "length: count / 100" and replace by : "length: count / 1000"
Click on "Run" in the top-right.
Nightly: https://share.firefox.dev/4dF7xhp (13s)
Chrome: https://share.firefox.dev/4g3buhs (6s)
Reporter | ||
Comment 1•3 months ago
|
||
Reporter | ||
Comment 2•3 months ago
|
||
STR2: Replace "angle: count * 5" -- > "angle: count * 50"
Nightly: https://share.firefox.dev/3XnchCq
Comment 3•3 months ago
|
||
If I run this through samply with a local browser build (making both of the changes listed above), I see ~15% in minor GC, but by far the hot spot is this code in _add:
for (var i = index + amount, l = segments.length; i < l; i++)
segments[i]._index = i;
Running with IONPERF=ir, the hot part is simply storing to a dynamic slot:
InterruptCheck
86 mov r11, 0x79dc99735aac
70 cmp dword [r11], 0x0
116 jnz 0x21a504
CompareAndBranch
60 cmp eax, ebx
7 jge 0x21a103
LoadElementAndUnbox
110 mov r11, -0x2000000000000
234 xor r11, qword [rdx + rax * 8]
184 mov rcx, r11
134 shr r11, 0x2f
158 jnz 0x21a532
GuardShape
147 mov r11, 0x25dc7b9f67a0
180 cmp qword [rcx], r11
439 jnz 0x21a53c
Slots
6628 mov rcx, qword [rcx + 0x8]
StoreDynamicSlotT
1305 mov r11, 0x79dc99749a10
145 test dword [r11], 0x1
208 jz 0x21a0f2
1 mov r11, qword [rcx]
304 shr r11, 0x2f
4 cmp r11d, 0x1fff6
jb 0x21a0f2
push rdx
lea rdx, qword [rcx]
call 0xffffffffffec2d24
pop rdx
175 mov dword [rcx], eax
1353 mov dword [rcx + 0x4], -0x78000
AddI
100 add eax, 0x1
Goto
82 jmp 0x21a06f
The single hottest instruction is loading the slots array. The next hottest code is the pre-barrier check and the store itself.
This code all looks pretty good.
Updated•3 months ago
|
Comment 4•20 days ago
|
||
Something something figuring out if we could allocate a larger object to avoid this load ending up in the dynamic slots.
I believe the optimization in bug 1823811 doesn't apply here because the index property is set outside of a constructor function.
Comment 5•20 days ago
|
||
It's not really the point here, but staring at that code makes me think that the prebarrier check could be hoisted out of the loop. It only works because the loop contains no calls or nested loops, but here the only time the prebarrier check results will change is at the loop head via the InterruptCheck.
(Then again, I don't understand why so many samples hit in the prebarrier test in the first place. The load should be from L1 and the branch should be 100% predictable and not taken. Hardware is hard.)
Comment 6•19 days ago
|
||
(In reply to Steve Fink [:sfink] [:s:] from comment #5)
It's not really the point here, but staring at that code makes me think that the prebarrier check could be hoisted out of the loop. It only works because the loop contains no calls or nested loops, but here the only time the prebarrier check results will change is at the loop head via the InterruptCheck.
(Then again, I don't understand why so many samples hit in the prebarrier test in the first place. The load should be from L1 and the branch should be 100% predictable and not taken. Hardware is hard.)
Interesting idea. It's a bit awkward, because in general prebarrier checks are implicit, rather than being exposed as MIR nodes, so it's not hoistable unless we split it into separate nodes. I can imagine an optimization that looks for loops without calls and hoists the prebarrier check outside the loop, but then what do you do when the prebarrier is actually needed? I think you might have to generate two separate copies of the loop and switch between them, which is maybe a bit much.
Description
•