Opening a large GDoc is 50% slower in Nightly and spends a lot of time in js::jit::CodePosition::operator>=(js::jit::CodePosition)
Categories
(Core :: JavaScript Engine, task, P1)
Tracking
()
Tracking | Status | |
---|---|---|
firefox135 | --- | fixed |
People
(Reporter: mayankleoboy1, Assigned: jandem)
References
(Blocks 3 open bugs)
Details
(Keywords: perf-alert)
Attachments
(2 files)
Open the SP3 Planning GDoc
Nightly: https://share.firefox.dev/4ilxsgK
Chrome: https://share.firefox.dev/4iw9srA / https://share.firefox.dev/3ZL85hf
Nightly(1Jul2024): https://share.firefox.dev/3Vsao6o
Maybe something to improve? Feel free to dupe to other bugs
Assignee | ||
Comment 1•2 months ago
|
||
That's a good find. In addInitialRange
we have this call: existing->tryToMoveDefAndUsesInto(merged)
. In this case tryToMoveDefAndUsesInto
ends up moving all uses (we even assert that after the call) so we can probably do something more efficient than tryToMoveDefAndUsesInto
. This might have quadratic behavior actually since it's inserting into a sorted list.
NI myself to take a look when I'm back on Friday.
Assignee | ||
Comment 2•2 months ago
|
||
Btw the root cause here is that for JS code, the vast majority of virtual register 'uses' are KEEPALIVE
uses for snapshots (> 85% on Sp3, more in this GDocs case).
I'll post some patches to optimize tryToMoveDefAndUsesInto
, but we could also move keepalive uses to a separate linked list so that we can skip these entirely in a lot of cases. I'll post a patch for that in another bug but I'm not sure about landing that yet.
Assignee | ||
Comment 3•2 months ago
|
||
In addInitialRange
when merging live ranges, we can concatenate the linked lists
of use positions instead of moving each use separately to the other list.
Updated•2 months ago
|
Assignee | ||
Comment 4•2 months ago
|
||
This could be optimized more in the future but this at least avoids iterating over
all uses that come after the other
live range.
Reporter | ||
Comment 5•2 months ago
|
||
Profile with the try build: https://share.firefox.dev/3ZD7Rr1 . The profile doesnt appear to have changed much.
Assignee | ||
Comment 6•2 months ago
|
||
(In reply to Mayank Bansal from comment #5)
Profile with the try build: https://share.firefox.dev/3ZD7Rr1 . The profile doesnt appear to have changed much.
It's hard to say without symbols, but these patches should reduce the time under js::jit::VirtualRegister::addInitialRange
in your original profile (see the Flame Graph tab in the profiler). What I saw in local profiles was like 350 ms to 20 ms or so just for that part.
There's still slowness elsewhere from having so many 'uses'. I wrote a larger patch that I'll post in a different bug (see comment 2) but I'm not sure about landing that one atm.
Updated•2 months ago
|
https://hg.mozilla.org/mozilla-central/rev/e047f166b078
https://hg.mozilla.org/mozilla-central/rev/178fbe156e60
Reporter | ||
Comment 9•2 months ago
|
||
Profile with hte latest Nightly: https://share.firefox.dev/3Dz847d
Comment 10•2 months ago
|
||
(In reply to Pulsebot from comment #7)
Pushed by jdemooij@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/e047f166b078
part 1 - Optimize moving uses when merging live ranges in addInitialRange.
r=jseward
https://hg.mozilla.org/integration/autoland/rev/178fbe156e60
part 2 - Optimize LiveRange::tryToMoveDefAndUsesInto a bit. r=jseward
Perfherder has detected a browsertime performance change from push 178fbe156e60b6e213bff8e921995cc619ab6984.
Improvements:
Ratio | Test | Platform | Options | Absolute values (old vs new) | Performance Profiles |
---|---|---|---|---|---|
3% | google-docs ContentfulSpeedIndex | linux1804-64-shippable-qr | cold fission webrender | 2,092.82 -> 2,029.38 | Before/After |
3% | google-docs SpeedIndex | linux1804-64-shippable-qr | cold fission webrender | 1,600.52 -> 1,558.27 | Before/After |
Details of the alert can be found in the alert summary, including links to graphs and comparisons for each of the affected tests.
If you need the profiling jobs you can trigger them yourself from treeherder job view or ask a sheriff to do that for you.
You can run these tests on try with ./mach try perf --alert 43166
For more information on performance sheriffing please see our FAQ.
Reporter | ||
Comment 11•1 month ago
|
||
FWIW, the alert has been updated to show a 10% improvement on cpuTime on Windows.
Description
•