Closed Bug 1936209 Opened 2 months ago Closed 2 months ago

Opening a large GDoc is 50% slower in Nightly and spends a lot of time in js::jit::CodePosition::operator>=(js::jit::CodePosition)

Categories

(Core :: JavaScript Engine, task, P1)

task

Tracking

()

RESOLVED FIXED
135 Branch
Tracking Status
firefox135 --- fixed

People

(Reporter: mayankleoboy1, Assigned: jandem)

References

(Blocks 3 open bugs)

Details

(Keywords: perf-alert)

Attachments

(2 files)

Open the SP3 Planning GDoc

Nightly: https://share.firefox.dev/4ilxsgK
Chrome: https://share.firefox.dev/4iw9srA / https://share.firefox.dev/3ZL85hf
Nightly(1Jul2024): https://share.firefox.dev/3Vsao6o

Maybe something to improve? Feel free to dupe to other bugs

That's a good find. In addInitialRange we have this call: existing->tryToMoveDefAndUsesInto(merged). In this case tryToMoveDefAndUsesInto ends up moving all uses (we even assert that after the call) so we can probably do something more efficient than tryToMoveDefAndUsesInto. This might have quadratic behavior actually since it's inserting into a sorted list.

NI myself to take a look when I'm back on Friday.

Flags: needinfo?(jdemooij)

Btw the root cause here is that for JS code, the vast majority of virtual register 'uses' are KEEPALIVE uses for snapshots (> 85% on Sp3, more in this GDocs case).

I'll post some patches to optimize tryToMoveDefAndUsesInto, but we could also move keepalive uses to a separate linked list so that we can skip these entirely in a lot of cases. I'll post a patch for that in another bug but I'm not sure about landing that yet.

Flags: needinfo?(jdemooij)

In addInitialRange when merging live ranges, we can concatenate the linked lists
of use positions instead of moving each use separately to the other list.

Assignee: nobody → jdemooij
Status: NEW → ASSIGNED

This could be optimized more in the future but this at least avoids iterating over
all uses that come after the other live range.

Profile with the try build: https://share.firefox.dev/3ZD7Rr1 . The profile doesnt appear to have changed much.

(In reply to Mayank Bansal from comment #5)

Profile with the try build: https://share.firefox.dev/3ZD7Rr1 . The profile doesnt appear to have changed much.

It's hard to say without symbols, but these patches should reduce the time under js::jit::VirtualRegister::addInitialRange in your original profile (see the Flame Graph tab in the profiler). What I saw in local profiles was like 350 ms to 20 ms or so just for that part.

There's still slowness elsewhere from having so many 'uses'. I wrote a larger patch that I'll post in a different bug (see comment 2) but I'm not sure about landing that one atm.

Severity: -- → N/A
Priority: -- → P1
Pushed by jdemooij@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/e047f166b078 part 1 - Optimize moving uses when merging live ranges in addInitialRange. r=jseward https://hg.mozilla.org/integration/autoland/rev/178fbe156e60 part 2 - Optimize LiveRange::tryToMoveDefAndUsesInto a bit. r=jseward
Status: ASSIGNED → RESOLVED
Closed: 2 months ago
Resolution: --- → FIXED
Target Milestone: --- → 135 Branch

Profile with hte latest Nightly: https://share.firefox.dev/3Dz847d

See Also: → 1229065
Blocks: 1938512

(In reply to Pulsebot from comment #7)

Pushed by jdemooij@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/e047f166b078
part 1 - Optimize moving uses when merging live ranges in addInitialRange.
r=jseward
https://hg.mozilla.org/integration/autoland/rev/178fbe156e60
part 2 - Optimize LiveRange::tryToMoveDefAndUsesInto a bit. r=jseward

Perfherder has detected a browsertime performance change from push 178fbe156e60b6e213bff8e921995cc619ab6984.

Improvements:

Ratio Test Platform Options Absolute values (old vs new) Performance Profiles
3% google-docs ContentfulSpeedIndex linux1804-64-shippable-qr cold fission webrender 2,092.82 -> 2,029.38 Before/After
3% google-docs SpeedIndex linux1804-64-shippable-qr cold fission webrender 1,600.52 -> 1,558.27 Before/After

Details of the alert can be found in the alert summary, including links to graphs and comparisons for each of the affected tests.

If you need the profiling jobs you can trigger them yourself from treeherder job view or ask a sheriff to do that for you.

You can run these tests on try with ./mach try perf --alert 43166

For more information on performance sheriffing please see our FAQ.

Keywords: perf-alert
See Also: 1229065

FWIW, the alert has been updated to show a 10% improvement on cpuTime on Windows.

You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: