Closed Bug 1962418 Opened 11 months ago Closed 1 month ago

Add JetStream3 to our PGO training set

Categories

(Firefox Build System :: General, task, P3)

task

Tracking

(firefox150 fixed)

RESOLVED FIXED
150 Branch
Tracking Status
firefox150 --- fixed

People

(Reporter: rhunt, Assigned: rhunt)

References

(Depends on 1 open bug, Blocks 4 open bugs)

Details

(Keywords: perf-alert)

Attachments

(1 file, 1 obsolete file)

I was looking at our PGO training set [1] and noticed that there is not any wasm content in it. Unless I'm missing it somewhere, it'd be really nice to add some.

Probably the best way to get this content would be to run the latest version of JetStream3, which is one of the better wasm benchmarks. We got some support for JetStream3 in bug 1943365, but it looks like the actual code is not in-tree, so I'm not sure how easy it is to get running with the PGO setup.

[1] https://searchfox.org/mozilla-central/rev/04a2c5317c0af560ed1689304498416c9c6c485a/build/pgo/index.html#8

See Also: → 1862309, 1867581
See Also: → 1954124, 1954202
Priority: -- → P3

We would need to add it in-tree like we did for sp3 (https://phabricator.services.mozilla.com/D194040). The benchmark perf tests should also be changed to use the in-tree version instead.

Isn't js3 still in active development though? It seems like it would be better to wait for it to stabilize, but perf engineers may be able to answer that.

Jetstream 3 is still in development, but it's already arguably the best wasm benchmark available, so it might be worth adding early. My understanding, validated on matrix, is that once we've got a vendored copy, mach vendor makes it easy for us to pull in changes from upstream.

Another benefit is that it would make it very clear which version of JS3 we're running in CI, which is good because there's been some confusion about that (see eg bug 1954124).

At the very least, it would be interesting to see how much our JS3 score on the wasm subtests is improved by adding JS3 to the PGO training set. (But it's an unrealistic best-case scenario for adding wasm to PGO training, because we'd be testing directly on the training data, so it's probably an upper bound for wasm perf improvements.)

See Also: → 1973303
Assignee: nobody → rhunt
Status: NEW → ASSIGNED
Duplicate of this bug: 1973303
Attachment #9498497 - Attachment description: WIP: Bug 1962418 - WIP: Vendor and use JS3 in pgo builds. → Bug 1962418 - Vendor JetStream3 benchmark into the tree. r?kshampur

This mostly follows the example set by Speedometer3's integration into
our PGO system. I did verify that it needs to run in a dedicated
web server, like how Speedometer does, or else it hangs.

Depends on: 1976925

From investigating bug 1976925, I've realized that the JetStream3 folder is pretty large at 167.5 MiB. I tried to see if there was anything we can exclude from it, but it looks like it's all just actually in the benchmark files themselves.

The biggest single one is the TFJS wasm benchmark files at around 90MiB. I'm assuming that's from model weights. There was some discussion of removing that benchmark, maybe it'd be good to do that before landing this.

I'm holding off on trying to get these patches landed until I can figure out if we're removing the TFJS benchmark or not.

In the meantime, here's a perf compare link with these patches. It looks like there's medium confidence this is a 2% win on JS3 on Win11/Linux.

Here's the most significant subtest changes [2]. Notable ones:

  • Dart-wasm: 20% better
  • bigint-noble-ed25519: 20% better
  • doxbee promise: 24% better
  • first-inspector-code-load: 20% better
  • WSL-Stdlib 10% worse

[1] https://perf.compare/compare-results?baseRev=2e23192a9a0ff0f048f5ae427640900cfd4851a5&baseRepo=try&newRev=859d70631edba5a7704c7c4e58846f2e17e34511&newRepo=try&framework=13
[2] https://treeherder.mozilla.org/perfherder/comparesubtest?originalProject=try&newProject=try&newRevision=859d70631edba5a7704c7c4e58846f2e17e34511&originalSignature=5296731&newSignature=5296731&framework=13&application=firefox&originalRevision=2e23192a9a0ff0f048f5ae427640900cfd4851a5&page=1

Blocks: 1971612

Update on my end.

This has been blocked for me because of the size of the TFJS tests. That subtest has now been deprecated and so we don't need to vendor it. However there were still other large files, so I worked to update the JS3 harness so that we could compress resources in the repository and have them automatically decompressed at runtime. This lets us shrink the size of a bunch of files upstream.

However, the size of the repo has gotten worse since I started this and I'm not sure if compression is going to help enough.

Some examples:

64.25 MiB       ./transformersjs/build/models/Xenova/distilbert-base-uncased-finetuned-sst-2-english/onnx/model_uint8.onnx
29.30 MiB       ./transformersjs/build/models/Xenova/whisper-tiny.en/onnx/decoder_model_merged_quantized.onnx
31.41 MiB       ./babylonjs/dist/bundle.es5.min.js.map
29.72 MiB       ./babylonjs/dist/bundle.es6.min.js.map
13.65 MiB       ./TypeScript/dist/bundle.js.map
11.09 MiB       ./babylonjs/dist/bundle.es5.min.js
10.62 MiB       ./transformersjs/build/onnxruntime-web/ort-wasm-simd-threaded.wasm
9.66 MiB        ./transformersjs/build/models/Xenova/whisper-tiny.en/onnx/encoder_model_quantized.onnx
...

These can all be compressed now, but I'm not sure if that will be enough. I need to measure the final size once compression is applied.

The other path we have would be to not vendor subtests that have large files. We could skip vendoring the transformersjs ones, for example. We wouldn't get PGO opts for those tests, but that's probably fine. The bigger issue is that we wouldn't get CI results for those subtests.

But maybe we could continue having CI clone the upstream repo, and only use the slim vendored version for PGO? That would definitely simplify things here.

Could we have a fetched toolchain for this otherwise?

(In reply to Ryan VanderMeulen [:RyanVM] from comment #9)

Could we have a fetched toolchain for this otherwise?

What would that mean? All of this is a bit outside my normal purview.

Adding a job to our CI which fetches and archives the upstream repo as an artifact that can then be fetched and used by other jobs later. The people in the Firefox CI Matrix channel could probably help you explore that option more if that's something you wanted to investigate.

Blocks: 1860066

Ah, that makes sense.

So I think there are three options here:

  1. Vendor in all of JS3 and use that for performance testing and for PGO builds
  2. Vendor in a smaller subset that's needed for good PGO results, have performance testing continue to fetch JS3 from the upstream GH repo
  3. Add a fetched toolchain (as RyanVM suggested) for JS3 that we use in performance tests and PGO builds

(1) Looks like it won't work due to the JS3 repo getting very large with ML models and other things that we shouldn't vendor.
(2) Seems viable to me and would be the easiest. But it means our performance testing and PGO builds need to be kept manually in sync. I'm not sure if that's an issue though?
(3) This would get us all of JS3 (not a subset) for PGO builds, but would also have the most moving pieces. I think the PGO server would need to have optional support for JS3 if the user hasn't downloaded it (not sure if that's an issue), and I'm not sure how perf testing/raptor would work with this.

I have a preference for (2) as it seems easier, but I'm not sure if the potential separate version for PGO builds vs performance testing is an issue. I don't think it'd be too hard to update both whenever we're updating our JS3 version. Kash, do you have thoughts here?

Flags: needinfo?(kshampur)

These can all be compressed now, but I'm not sure if that will be enough. I need to measure the final size once compression is applied.

curious on if you ended up checking this? (1) is of course ideal for both perftesting and pgo using the same source of truth

for (3) we could in theory modify js3 perftest to use the same hypothetical toolchain, and then pgo and perftesting would not risk diverging if the revision changes. But yes I agree this has the most moving pieces and might end up being a bit hacky on the perftesting side to set this up for just one benchmark.

regarding (1) and size I see back in comment 6 you said it was ~160M. For reference it seems like Speedometer 3 is shy of ~250M
But as of present I see that jetstream is ~685M... do we see it continuing to grow? I don't recall any discussion happening back when sp3 was vendored in regarding concerns about it's size (or maybe those discussions happened somewhere already). So i am just curious, is it that big of a deal to vendor in the entire thing?

it does sound like (2) might be the best approach (and probably fastest to do, after option 1). And manually syncing the revisions shouldn't be too big of an issue as long as whoever is updating it remembers to update the files. But how much smaller would this subset be (if you've checked already)?

Flags: needinfo?(kshampur)

I did a quick experiment here and it looks like I can get the vendored copy down to 125MiB by compressing most of the benchmark files and excluding benchmarks that are disabled by default (the distilbert benchmark has a very large model file, but is now disabled). I have not tested it completely, so there's a chance I messed something up. This is probably the best-case scenario for how much we can shrink the vendored copy.

do we see it continuing to grow?

There is currently a workload freeze for version 3, and so no new benchmarks should be added to it. Future versions could definitely add new benchmarks or enable some that I was able to exclude.

I don't know the threshold for when something is too big here. 125MiB still seems like a lot to me, but probably not that much in the grand scheme of things if we have Speedometer at 250MiB.

It looks like there is some precedent for using fetched toolchains for talos and PDF.js benchmarks in bug 1855674.

See Also: → 1855674

125M nice!

in talos PDF case, don't know if it would have made sense to vendor it in, whereas it sort of makes sense for js3 due to PGO. but yea, that seems like a good example for if you go the fetch route. And an addendum to my previous remark regarding option (3) of adding a fetch... similar to option 2, we could just make sure to manually update the perftest and fetch revisions whenever an update happens (and avoid doing hacky stuff on the perftest side like i mentioned)

Something else i noticed, https://searchfox.org/firefox-main/rev/3728e0c87fc4fa87ddf0c7a8183f2dd2329be96a/build/pgo/index.html#13-40 I am not sure what js-input is but sunspider is already part of jetstream3, so could this sunspider be removed? (and maybe the entire list is worth revisiting but that's another matter entirely)
But if js3 vendored size ends up being an issue (and i am not sure who makes that call) perhaps cleaning up some of the existing PGO list could compensate there

Attachment #9498497 - Attachment is obsolete: true

New blocker unlocked. It appears that by adding JS3 (which contains wasm) to our PGO corpus, we're now getting crashes in the final build when running wasm code [1]. The before try run doesn't exhibit these issues.

The wm/wm-b/godot/godot-b/js3 tests are all crashing now on Win64/Linux64 (but not Android/ARM64, so maybe x64 specific issue). The crashes are all the same crashing stack/signature in our wasm baseline compiler [2]. Something in the js::wasm::BaseCompiler::emitRemainderI32() and its inlined callees (10 levels) is now going wrong. I've scanned through the code and don't see anything obvious wrong or UB.

I would try to reproduce this in pernosco, but the self-service page doesn't seem to be working for me right now.

[1] https://treeherder.mozilla.org/jobs?repo=try&revision=4b5e74ae604793703a4169aebbb2661f1ffd215d&selectedTaskRun=QNiKfyzcTNu3Yg7GVdC3_g.0
[2] https://tests.firefox.dev/crash-viewer.html?url=https%3A%2F%2Ffirefox-ci-tc.services.mozilla.com%2Fapi%2Fqueue%2Fv1%2Ftask%2FDcv1ipfBRryaoYBCpKjYKQ%2Fruns%2F0%2Fartifacts%2Fpublic%2Ftest_info%2F386a703a-df7c-b465-6d75-a87c3f094b7c.json

I confirm that this looks a lot like a compiler bug to me. More precisely the inlined code for js::jit::X86Encoding::BaseAssembler::subl_rr(js::jit::X86Encoding::RegisterID, js::jit::X86Encoding::RegisterID) seems broken after potentially an invalid reordering of instructions. I'll file a dedicated bug with more details blocking this one. I have suggested adding a MOZ_NEVER_INLINE on this function to :rhunt to see if that can unblock the situation while we study the compiler bug.

(In reply to Ryan Hunt [:rhunt] from comment #19)

It looks like adding MOZ_NEVER_INLINE to subl_rr fixed the issue for wasm-misc and godot [1]. JS3 is still crashing, but in a new but similar way [2]. Now it's in subq_rr.

Interesting. This looks like a direct transposition of the bug we were hitting in js::wasm::BaseCompiler::emitRemainderI32() to js::wasm::BaseCompiler::emitRemainderI64(). Which kind of makes sense because the code for these two are very similar, but it is useful information. Thanks!

I still need to confirm this with Yannis, but my local experiment points at this being a compiler bug being fixed upstream. Keep in touch.

With the landing of the backport patches in bug 2007081, the problem described by comment 17 should be fixed. Congrats :sergesanspaille!

Okay, with the backport patches we no longer have the miscompile. I've re-ran CI to get the latest perf results with this patch [1]

I'm seeing a 3.45% improvement to JetStream3 with PGO running. You can see which subtests benefit here [2]. I don't see any difference in SP3, but that's not surprising. wasm-godot seems to also improve by 9%.

[1] https://perf.compare/compare-results?baseRev=29a268fe5f3093cdb5a308b73fe33ff56893aec0&baseRepo=try&newRev=6865e7f573b63b49f417a41df4f1e3a085078669&newRepo=try&framework=13
[2] https://perf.compare/subtests-compare-results?baseRev=29a268fe5f3093cdb5a308b73fe33ff56893aec0&baseRepo=try&newRev=6865e7f573b63b49f417a41df4f1e3a085078669&newRepo=try&framework=13&baseParentSignature=5296731&newParentSignature=5296731&test_version=student-t

Pushed by rhunt@eqrion.net: https://github.com/mozilla-firefox/firefox/commit/76de161425f0 https://hg.mozilla.org/integration/autoland/rev/6ef23704f980 Add JetStream3 to our PGO data set. r=firefox-build-system-reviewers,glandium,perftest-reviewers,releng-reviewers,bhearsum,kshampur,ahal
Status: ASSIGNED → RESOLVED
Closed: 1 month ago
Resolution: --- → FIXED
Target Milestone: --- → 150 Branch
Regressions: 2019465
Regressions: 2020342
Regressions: 2020943

(In reply to Pulsebot from comment #24)

Pushed by rhunt@eqrion.net:
https://github.com/mozilla-firefox/firefox/commit/76de161425f0
https://hg.mozilla.org/integration/autoland/rev/6ef23704f980
Add JetStream3 to our PGO data set.
r=firefox-build-system-reviewers,glandium,perftest-reviewers,releng-
reviewers,bhearsum,kshampur,ahal

Perfherder has detected a devtools performance change from push 6ef23704f980e99e85b2df6f7a6f166c2e3d0d00.

No action is required from the author; this comment is provided for informational purposes only.

Improvement Test Platform Options Absolute values [old vs new]
4% damp console.log-in-loop-content-process-set (doc) linux1804-64-shippable-qr e10s fission stylo webrender 707.42 ms -> 675.78 ms
4% damp console.log-in-loop-content-process-map (doc) linux1804-64-shippable-qr e10s fission stylo webrender 865.88 ms -> 830.33 ms
3% damp console.log-in-loop-content-process-infinity (doc) linux1804-64-shippable-qr e10s fission stylo webrender 72.77 ms -> 70.29 ms
3% damp console.log-in-loop-content-process-bigint (doc) linux1804-64-shippable-qr e10s fission stylo webrender 82.53 ms -> 79.79 ms
3% damp console.log-in-loop-content-process-null (doc) linux1804-64-shippable-qr e10s fission stylo webrender 74.85 ms -> 72.48 ms
3% damp console.log-in-loop-content-process-bool (doc) linux1804-64-shippable-qr e10s fission stylo webrender 72.71 ms -> 70.78 ms

Need Help or Information?

If you have any questions, please reach out to afinder@mozilla.com. Alternatively, you can find help on Slack by joining #perf-help, and on Matrix you can find help by joining #perftest.

Details of the alert can be found in the alert summary, including links to graphs and comparisons for each of the affected tests.

Keywords: perf-alert

(In reply to Pulsebot from comment #24)

Pushed by rhunt@eqrion.net:
https://github.com/mozilla-firefox/firefox/commit/76de161425f0
https://hg.mozilla.org/integration/autoland/rev/6ef23704f980
Add JetStream3 to our PGO data set.
r=firefox-build-system-reviewers,glandium,perftest-reviewers,releng-
reviewers,bhearsum,kshampur,ahal

Perfherder has detected a talos performance change from push 6ef23704f980e99e85b2df6f7a6f166c2e3d0d00.

No action is required from the author; this comment is provided for informational purposes only.

Improvement Test Platform Options Absolute values [old vs new]
11% pdfpaint issue14982.pdf (doc) linux1804-64-shippable-qr e10s fission stylo webrender-sw 780.09 ms -> 697.15 ms
7% pdfpaint issue6737.pdf (doc) linux1804-64-shippable-qr e10s fission stylo webrender-sw 1,836.29 ms -> 1,715.03 ms
5% tsvgx (doc) linux1804-64-shippable-qr e10s fission stylo webrender 214.27 -> 203.45

Need Help or Information?

If you have any questions, please reach out to bacasandrei@mozilla.com. Alternatively, you can find help on Slack by joining #perf-help, and on Matrix you can find help by joining #perftest.

Details of the alert can be found in the alert summary, including links to graphs and comparisons for each of the affected tests.

You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: