Consider adding a 4th tier using Ion optimization levels

RESOLVED FIXED in Firefox 67

Status

()

enhancement
P3
normal
RESOLVED FIXED
2 years ago
4 days ago

People

(Reporter: jandem, Assigned: jandem)

Tracking

(Depends on 1 bug, Blocks 3 bugs)

unspecified
mozilla68
Points:
---
Dependency tree / graph

Firefox Tracking Flags

(firefox67 fixed, firefox68 fixed)

Details

Attachments

(8 attachments, 1 obsolete attachment)

47 bytes, text/x-phabricator-request
Details | Review
47 bytes, text/x-phabricator-request
Details | Review
47 bytes, text/x-phabricator-request
Details | Review
47 bytes, text/x-phabricator-request
Details | Review
47 bytes, text/x-phabricator-request
Details | Review
47 bytes, text/x-phabricator-request
Details | Review
47 bytes, text/x-phabricator-request
Details | Review
47 bytes, text/x-phabricator-request
Details | Review
(Assignee)

Description

2 years ago
Posted patch Hackish prototype (obsolete) — Splinter Review
Scripts are currently Ion-compiled when their use count reaches ~1000. Ion does pretty aggressive inlining of scripts, this is necessary for really hot code (like the first 5 Octane tests). On workloads like Speedometer, this inlining is not helping as much and compilation time is an issue.

I'm attaching a hackish prototype to use the optimization level infrastructure Hannes added (I had to fix some regressions and bugs but it mostly worked). With this patch we have 2 Ion tiers: we still compile scripts with use count 1000 but we only inline 1-2 levels deep and don't inline large functions. We also include use count checks and when a script's use count reaches 100,000 we recompile and use the heuristics we have now.

It works surprisingly well. Octane is mostly unaffected or becomes a bit faster, Sunspider and Kraken regress by a few %, and Speedometer wins at least a few points on my machine (not sure how much because it's pretty noisy).

I'm not sure if this is worth pursuing now as this stuff is hard to get right and I think there's lower-hanging fruit still.

Comment 1

2 years ago
Similar to bug 1080776?
(Assignee)

Comment 2

2 years ago
(In reply to Guilherme Lima from comment #1)
> Similar to bug 1080776?

It's similar but a bit different: the approach there is to add a tier between Baseline and current Ion (at warmup count 100) whereas the patch here "delays" full-Ion. In general I don't think Ion-compiling with warmup count 100 (that bug) is a good idea.
Comment on attachment 8888286 [details] [diff] [review]
Hackish prototype

Review of attachment 8888286 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/jit/IonBuilder.cpp
@@ +5421,4 @@
>          MRecompileCheck* check =
>              MRecompileCheck::New(alloc(), target->nonLazyScript(),
>                                   optimizationInfo().inliningRecompileThreshold(),
> +                                 MRecompileCheck::RecompileCheckType::Inlining);

The problem is that the more we inline the more we would be likely to invalidate the compilation because of inlining issues.

I suggest we use a larger threshold factor than 4, or with an exponentiation by the inlining depth, such that we do not invalidate the code too frequently.
Attachment #8888286 - Flags: feedback+
When Apple added their LLVM-tier JIT, they found that the performance improvements from a 4th tier mostly came from better inlining information.

Although this design would be valuable by itself, it would be even better to have tier-4 Ion able to read type information from tier-3 Ion's ICs.
Priority: -- → P3
(Assignee)

Updated

11 months ago
Depends on: 1467496
(Assignee)

Updated

7 months ago
Blocks: 1490847
(Assignee)

Comment 5

a month ago

I've been playing with this a bit and it's still a significant win on Speedometer (> 5%), Google Docs (5-10%) etc. SunSpider, Kraken, Octane are either unaffected or show an improvement.

This only tunes inlining options - we could probably get more speedups by making more IonBuilder code or expensive optimization passes run only at the highest optimization level.

Assignee: nobody → jdemooij
Status: NEW → ASSIGNED

Just as a thought, I have been wondering whether it would be possible to use AFL-style branch tuples (http://lcamtuf.coredump.cx/afl/technical_details.txt) in the Baseline ICs to generate an implicit trace.

If we had something like that, then we would be able to make separate trace trees by type/shape, receiving the benefits of inlining even in polymorphic code.

I believe the answer is yes, but it's quite a lot of work.

Depends on: 1536612
(Assignee)

Updated

a month ago
Attachment #8888286 - Attachment is obsolete: true
(Assignee)

Comment 7

a month ago

We used to have a different threshold for small functions but now they're both
set to 1000 so there's no need to special-case small functions.

(Assignee)

Comment 8

a month ago

Storing this also in IonOptimizationLevels.h/cpp is more complicated than
necessary.

Depends on D24153

(Assignee)

Comment 9

a month ago

There's a lot of complexity around setting/unsetting the eagerCompilation flag.
It's simpler to determine this based on the warm-up threshold being 0.

The patch also fixes some jit-tests where this patch would result in a change in
behavior.

Depends on D24154

(Assignee)

Comment 10

a month ago

Also adds a javascript.options.ion.full.threshold browser pref and similar shell
flags.

This doesn't rename the existing prefs yet.

Depends on D24155

(Assignee)

Comment 11

a month ago

We want this to be more than 100 for the full-optimizations tier. Making this
relative also works better for tests that set a small Ion warm-up threshold.

Also disables OSR in some tests depending on the old behavior.

Depends on D24156

(Assignee)

Comment 12

a month ago

The old code would assert because we needed too many scratch registers, but it
was dead code until this patch.

Depends on D24157

(Assignee)

Comment 13

a month ago

Ion can do aggressive inlining, but inlining a lot of code has a negative
effect on compilation time and memory usage. It also means we spend more time
in the slower Baseline code while compiling the Ion code off-thread or after an
invalidation.

To address this, Ion now consists of two tiers:

  • Normal: the first tier (warm-up threshold of 1,000) only inlines small
    functions one level deep. This tier also has recompile checks to
    recompile the script when it becomes very hot.

  • Full: the second tier (warm-up threshold of 100,000) is only used for very
    hot code so we can afford inlining a lot more code.

This improves Speedometer and GDocs by more than 5%.

Depends on D24158

Attachment #9052206 - Attachment description: Bug 1382650 part 3 - Clean up eager compilation behavior. r?nbp! → Bug 1382650 part 3 - Clean up Ion eager compilation code. r?nbp!
(Assignee)

Updated

26 days ago
Keywords: leave-open

Comment 14

26 days ago
Pushed by jdemooij@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/ca8d272b176f
part 1 - Remove separate Ion warmup threshold for small functions, as it's equivalent to the normal one. r=nbp
https://hg.mozilla.org/integration/autoland/rev/30247ab61679
part 2 - Store Ion warmup threshold only in JitOptions. r=nbp
https://hg.mozilla.org/integration/autoland/rev/7ee65d07d83f
part 3 - Clean up Ion eager compilation code. r=nbp

Comment 16

25 days ago
Pushed by jdemooij@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/a812f6daf98e
part 4 - Split Ion warmup threshold JitOption in 'normal' and 'full' options. r=nbp
https://hg.mozilla.org/integration/autoland/rev/0614a9178eef
part 5 - Calculate OSR warm-up threshold difference based on the warm-up threshold instead of hard-coding 100. r=nbp
https://hg.mozilla.org/integration/autoland/rev/6960595971e5
part 6 - Fix ARM64 implementation of branch32(AbsoluteAddress, Imm32). r=nbp
(Assignee)

Updated

25 days ago
Keywords: leave-open

Comment 18

25 days ago
Pushed by jdemooij@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/391cc6344efe
part 7 - Use a separate Ion optimization level for very hot code. r=nbp

Comment 19

24 days ago
bugherder
Status: ASSIGNED → RESOLVED
Last Resolved: 24 days ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla68
(Assignee)

Comment 20

24 days ago

Unfortunately bug 1536874 made changes to the tp6 tests right before this landed, so I don't know if perf alerts will work well.

That said, I did a Try push comparing Speedometer and tp6 with and without part 7 [0] on Linux64, OS X, Win64:

  • Speedometer: 5% faster
  • Instagram: 20% faster on Linux64, 13% on Win64. Not sure what's going on.
  • GDocs: 7-8% faster
  • GSlides: 6-7% faster
  • Facebook: 4% faster
  • YouTube: 4% faster

And some smaller improvements across the board. Improvements might be bigger on low-end devices.

[0] https://treeherder.mozilla.org/perf.html#/compare?originalProject=try&originalRevision=699add20be66c755397e327ec2474d851aeae1a9&newProject=try&newRevision=8e46faa6645ec0f40fa0c4b04353e8bc8ea0e89c&framework=10

(Assignee)

Comment 21

20 days ago

We might uplift these patches to beta. Having a JitOption makes it easier to
turn this off if needed.

Comment 22

19 days ago
Pushed by jdemooij@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/3134740d831c
part 8 - Add a JitOption to disable use of Ion optimization levels. r=nbp
(Assignee)

Comment 24

17 days ago

Comment on attachment 9052204 [details]
Bug 1382650 part 1 - Remove separate Ion warmup threshold for small functions, as it's equivalent to the normal one. r?nbp!

Beta/Release Uplift Approval Request

  • Feature/Bug causing the regression: None
  • User impact if declined: After some discussion we'd like to uplift these patches to FF67 to fix issues with Ion compilation slowing down page loads.
  • Is this code covered by automated tests?: Yes
  • Has the fix been verified in Nightly?: Yes
  • Needs manual test from QE?: No
  • If yes, steps to reproduce:
  • List of other uplifts needed: None
  • Risk to taking this patch: Medium
  • Why is the change risky/not risky? (and alternatives if risky): There's some risk but this has been on Nightly for a week without any issues. We think the perf improvements are worth it.
  • String changes made/needed: None
Attachment #9052204 - Flags: approval-mozilla-beta?
(Assignee)

Updated

17 days ago
Attachment #9052205 - Flags: approval-mozilla-beta?
Attachment #9052206 - Flags: approval-mozilla-beta?
Attachment #9052207 - Flags: approval-mozilla-beta?
Attachment #9052208 - Flags: approval-mozilla-beta?
Attachment #9052209 - Flags: approval-mozilla-beta?
Attachment #9052210 - Flags: approval-mozilla-beta?
Attachment #9055095 - Flags: approval-mozilla-beta?

Comment on attachment 9052204 [details]
Bug 1382650 part 1 - Remove separate Ion warmup threshold for small functions, as it's equivalent to the normal one. r?nbp!

These patches have been on Nightly for 10 days with no stability or functional regressions reported. This and the significant performance improvements on multiple major sites mean that I am taking the risk of taking these patches for 67 beta 9 as we are still in the middle of the beta cycle. If the uplifts to beta degrades our stability, we can back these patches out.

Attachment #9052204 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
Attachment #9052205 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
Attachment #9052206 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
Attachment #9052207 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
Attachment #9052208 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
Attachment #9052209 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
Attachment #9052210 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
Attachment #9055095 - Flags: approval-mozilla-beta? → approval-mozilla-beta+

== Change summary for alert #20341 (as of Mon, 08 Apr 2019 08:41:03 GMT) ==

Improvements:

2% tp5o linux64-shippable opt e10s stylo 130.87 -> 127.77

For up to date results, see: https://treeherder.mozilla.org/perf.html#/alerts?id=20341

Regressions: 1545379
(Assignee)

Updated

4 days ago
No longer regressions: 1545379
You need to log in before you can comment on or make changes to this bug.