1962418 - Add JetStream3 to our PGO training set

Assignee

Description

•

11 months ago

I was looking at our PGO training set [1] and noticed that there is not any wasm content in it. Unless I'm missing it somewhere, it'd be really nice to add some.

Probably the best way to get this content would be to run the latest version of JetStream3, which is one of the better wasm benchmarks. We got some support for JetStream3 in bug 1943365, but it looks like the actual code is not in-tree, so I'm not sure how easy it is to get running with the PGO setup.

[1] https://searchfox.org/mozilla-central/rev/04a2c5317c0af560ed1689304498416c9c6c485a/build/pgo/index.html#8

Ryan Hunt [:rhunt]

Assignee

Updated

•

11 months ago

Updated

•

11 months ago

Updated

•

11 months ago

Priority: -- → P3

Greg Mierzwinski [:sparky]

Comment 1

•

11 months ago

We would need to add it in-tree like we did for sp3 (https://phabricator.services.mozilla.com/D194040). The benchmark perf tests should also be changed to use the in-tree version instead.

Isn't js3 still in active development though? It seems like it would be better to wait for it to stabilize, but perf engineers may be able to answer that.

Iain Ireland [:iain]

Comment 2

•

11 months ago

Jetstream 3 is still in development, but it's already arguably the best wasm benchmark available, so it might be worth adding early. My understanding, validated on matrix, is that once we've got a vendored copy, mach vendor makes it easy for us to pull in changes from upstream.

Another benefit is that it would make it very clear which version of JS3 we're running in CI, which is good because there's been some confusion about that (see eg bug 1954124).

At the very least, it would be interesting to see how much our JS3 score on the wasm subtests is improved by adding JS3 to the PGO training set. (But it's an unrealistic best-case scenario for adding wasm to PGO training, because we'd be testing directly on the training data, so it's probably an upper bound for wasm perf improvements.)

Ryan Hunt [:rhunt]

Assignee

Updated

•

10 months ago

Blocks: wasm-jetstream3

Kash Shampur [:kshampur] ⌚EST

Updated

•

9 months ago

Comment 3

•

9 months ago

Attached file Bug 1962418 - Vendor JetStream3 benchmark into the tree. r?kshampur (obsolete) — Details

Phabricator Automation

Updated

•

9 months ago

Assignee: nobody → rhunt

Status: NEW → ASSIGNED

Kash Shampur [:kshampur] ⌚EST

Updated

•

9 months ago

Duplicate of this bug: 1973303

Phabricator Automation

Updated

•

9 months ago

Attachment #9498497 - Attachment description: WIP: Bug 1962418 - WIP: Vendor and use JS3 in pgo builds. → Bug 1962418 - Vendor JetStream3 benchmark into the tree. r?kshampur

Ryan Hunt [:rhunt]

Assignee

Comment 5

•

9 months ago

Attached file Bug 1962418 - Add JetStream3 to our PGO data set. r?#firefox-build-system-reviewers — Details

This mostly follows the example set by Speedometer3's integration into
our PGO system. I did verify that it needs to run in a dedicated
web server, like how Speedometer does, or else it hangs.

Ryan Hunt [:rhunt]

Assignee

Updated

•

9 months ago

Depends on: 1976925

Ryan Hunt [:rhunt]

Assignee

Comment 6

•

9 months ago

From investigating bug 1976925, I've realized that the JetStream3 folder is pretty large at 167.5 MiB. I tried to see if there was anything we can exclude from it, but it looks like it's all just actually in the benchmark files themselves.

The biggest single one is the TFJS wasm benchmark files at around 90MiB. I'm assuming that's from model weights. There was some discussion of removing that benchmark, maybe it'd be good to do that before landing this.

Ryan Hunt [:rhunt]

Assignee

Comment 7

•

8 months ago

I'm holding off on trying to get these patches landed until I can figure out if we're removing the TFJS benchmark or not.

In the meantime, here's a perf compare link with these patches. It looks like there's medium confidence this is a 2% win on JS3 on Win11/Linux.

Here's the most significant subtest changes [2]. Notable ones:

Dart-wasm: 20% better
bigint-noble-ed25519: 20% better
doxbee promise: 24% better
first-inspector-code-load: 20% better
WSL-Stdlib 10% worse

[1] https://perf.compare/compare-results?baseRev=2e23192a9a0ff0f048f5ae427640900cfd4851a5&baseRepo=try&newRev=859d70631edba5a7704c7c4e58846f2e17e34511&newRepo=try&framework=13
[2] https://treeherder.mozilla.org/perfherder/comparesubtest?originalProject=try&newProject=try&newRevision=859d70631edba5a7704c7c4e58846f2e17e34511&originalSignature=5296731&newSignature=5296731&framework=13&application=firefox&originalRevision=2e23192a9a0ff0f048f5ae427640900cfd4851a5&page=1

Ryan Hunt [:rhunt]

Assignee

Updated

•

8 months ago

Blocks: 1971612

Ryan Hunt [:rhunt]

Assignee

Comment 8

•

4 months ago

Update on my end.

This has been blocked for me because of the size of the TFJS tests. That subtest has now been deprecated and so we don't need to vendor it. However there were still other large files, so I worked to update the JS3 harness so that we could compress resources in the repository and have them automatically decompressed at runtime. This lets us shrink the size of a bunch of files upstream.

However, the size of the repo has gotten worse since I started this and I'm not sure if compression is going to help enough.

Some examples:

64.25 MiB       ./transformersjs/build/models/Xenova/distilbert-base-uncased-finetuned-sst-2-english/onnx/model_uint8.onnx
29.30 MiB       ./transformersjs/build/models/Xenova/whisper-tiny.en/onnx/decoder_model_merged_quantized.onnx
31.41 MiB       ./babylonjs/dist/bundle.es5.min.js.map
29.72 MiB       ./babylonjs/dist/bundle.es6.min.js.map
13.65 MiB       ./TypeScript/dist/bundle.js.map
11.09 MiB       ./babylonjs/dist/bundle.es5.min.js
10.62 MiB       ./transformersjs/build/onnxruntime-web/ort-wasm-simd-threaded.wasm
9.66 MiB        ./transformersjs/build/models/Xenova/whisper-tiny.en/onnx/encoder_model_quantized.onnx
...

These can all be compressed now, but I'm not sure if that will be enough. I need to measure the final size once compression is applied.

The other path we have would be to not vendor subtests that have large files. We could skip vendoring the transformersjs ones, for example. We wouldn't get PGO opts for those tests, but that's probably fine. The bigger issue is that we wouldn't get CI results for those subtests.

But maybe we could continue having CI clone the upstream repo, and only use the slim vendored version for PGO? That would definitely simplify things here.

Ryan VanderMeulen [:RyanVM]

Comment 9

•

4 months ago

Could we have a fetched toolchain for this otherwise?

Ryan Hunt [:rhunt]

Assignee

Comment 10

•

4 months ago

(In reply to Ryan VanderMeulen [:RyanVM] from comment #9)

Could we have a fetched toolchain for this otherwise?

What would that mean? All of this is a bit outside my normal purview.

Ryan VanderMeulen [:RyanVM]

Comment 11

•

4 months ago

Adding a job to our CI which fetches and archives the upstream repo as an artifact that can then be fetched and used by other jobs later. The people in the Firefox CI Matrix channel could probably help you explore that option more if that's something you wanted to investigate.

Ryan Hunt [:rhunt]

Assignee

Updated

•

4 months ago

Blocks: 1860066

Ryan Hunt [:rhunt]

Assignee

Comment 12

•

4 months ago

Ah, that makes sense.

So I think there are three options here:

Vendor in all of JS3 and use that for performance testing and for PGO builds
Vendor in a smaller subset that's needed for good PGO results, have performance testing continue to fetch JS3 from the upstream GH repo
Add a fetched toolchain (as RyanVM suggested) for JS3 that we use in performance tests and PGO builds

(1) Looks like it won't work due to the JS3 repo getting very large with ML models and other things that we shouldn't vendor.
(2) Seems viable to me and would be the easiest. But it means our performance testing and PGO builds need to be kept manually in sync. I'm not sure if that's an issue though?
(3) This would get us all of JS3 (not a subset) for PGO builds, but would also have the most moving pieces. I think the PGO server would need to have optional support for JS3 if the user hasn't downloaded it (not sure if that's an issue), and I'm not sure how perf testing/raptor would work with this.

I have a preference for (2) as it seems easier, but I'm not sure if the potential separate version for PGO builds vs performance testing is an issue. I don't think it'd be too hard to update both whenever we're updating our JS3 version. Kash, do you have thoughts here?

Flags: needinfo?(kshampur)

Kash Shampur [:kshampur] ⌚EST

Comment 13

•

4 months ago

•

Edited

These can all be compressed now, but I'm not sure if that will be enough. I need to measure the final size once compression is applied.

curious on if you ended up checking this? (1) is of course ideal for both perftesting and pgo using the same source of truth

for (3) we could in theory modify js3 perftest to use the same hypothetical toolchain, and then pgo and perftesting would not risk diverging if the revision changes. But yes I agree this has the most moving pieces and might end up being a bit hacky on the perftesting side to set this up for just one benchmark.

regarding (1) and size I see back in comment 6 you said it was ~160M. For reference it seems like Speedometer 3 is shy of ~250M
But as of present I see that jetstream is ~685M... do we see it continuing to grow? I don't recall any discussion happening back when sp3 was vendored in regarding concerns about it's size (or maybe those discussions happened somewhere already). So i am just curious, is it that big of a deal to vendor in the entire thing?

it does sound like (2) might be the best approach (and probably fastest to do, after option 1). And manually syncing the revisions shouldn't be too big of an issue as long as whoever is updating it remembers to update the files. But how much smaller would this subset be (if you've checked already)?

Flags: needinfo?(kshampur)

Ryan Hunt [:rhunt]

Assignee

Comment 14

•

4 months ago

I did a quick experiment here and it looks like I can get the vendored copy down to 125MiB by compressing most of the benchmark files and excluding benchmarks that are disabled by default (the distilbert benchmark has a very large model file, but is now disabled). I have not tested it completely, so there's a chance I messed something up. This is probably the best-case scenario for how much we can shrink the vendored copy.

do we see it continuing to grow?

There is currently a workload freeze for version 3, and so no new benchmarks should be added to it. Future versions could definitely add new benchmarks or enable some that I was able to exclude.

I don't know the threshold for when something is too big here. 125MiB still seems like a lot to me, but probably not that much in the grand scheme of things if we have Speedometer at 250MiB.

Ryan Hunt [:rhunt]

Assignee

Comment 15

•

4 months ago

It looks like there is some precedent for using fetched toolchains for talos and PDF.js benchmarks in bug 1855674.

Comment 16

•

4 months ago

•

Edited

125M nice!

in talos PDF case, don't know if it would have made sense to vendor it in, whereas it sort of makes sense for js3 due to PGO. but yea, that seems like a good example for if you go the fetch route. And an addendum to my previous remark regarding option (3) of adding a fetch... similar to option 2, we could just make sure to manually update the perftest and fetch revisions whenever an update happens (and avoid doing hacky stuff on the perftest side like i mentioned)

Something else i noticed, https://searchfox.org/firefox-main/rev/3728e0c87fc4fa87ddf0c7a8183f2dd2329be96a/build/pgo/index.html#13-40 I am not sure what js-input is but sunspider is already part of jetstream3, so could this sunspider be removed? (and maybe the entire list is worth revisiting but that's another matter entirely)
But if js3 vendored size ends up being an issue (and i am not sure who makes that call) perhaps cleaning up some of the existing PGO list could compensate there

Phabricator Automation

Updated

•

4 months ago

Attachment #9498497 - Attachment is obsolete: true

Ryan Hunt [:rhunt]

Assignee

Comment 17

•

3 months ago

New blocker unlocked. It appears that by adding JS3 (which contains wasm) to our PGO corpus, we're now getting crashes in the final build when running wasm code [1]. The before try run doesn't exhibit these issues.

The wm/wm-b/godot/godot-b/js3 tests are all crashing now on Win64/Linux64 (but not Android/ARM64, so maybe x64 specific issue). The crashes are all the same crashing stack/signature in our wasm baseline compiler [2]. Something in the js::wasm::BaseCompiler::emitRemainderI32() and its inlined callees (10 levels) is now going wrong. I've scanned through the code and don't see anything obvious wrong or UB.

I would try to reproduce this in pernosco, but the self-service page doesn't seem to be working for me right now.

[1] https://treeherder.mozilla.org/jobs?repo=try&revision=4b5e74ae604793703a4169aebbb2661f1ffd215d&selectedTaskRun=QNiKfyzcTNu3Yg7GVdC3_g.0
[2] https://tests.firefox.dev/crash-viewer.html?url=https%3A%2F%2Ffirefox-ci-tc.services.mozilla.com%2Fapi%2Fqueue%2Fv1%2Ftask%2FDcv1ipfBRryaoYBCpKjYKQ%2Fruns%2F0%2Fartifacts%2Fpublic%2Ftest_info%2F386a703a-df7c-b465-6d75-a87c3f094b7c.json

Yannis Juglaret [:yannis] (Away until April 13)

Comment 18

•

3 months ago

I confirm that this looks a lot like a compiler bug to me. More precisely the inlined code for js::jit::X86Encoding::BaseAssembler::subl_rr(js::jit::X86Encoding::RegisterID, js::jit::X86Encoding::RegisterID) seems broken after potentially an invalid reordering of instructions. I'll file a dedicated bug with more details blocking this one. I have suggested adding a MOZ_NEVER_INLINE on this function to :rhunt to see if that can unblock the situation while we study the compiler bug.

Ryan Hunt [:rhunt]

Assignee

Comment 19

•

3 months ago

It looks like adding MOZ_NEVER_INLINE to subl_rr fixed the issue for wasm-misc and godot [1]. JS3 is still crashing, but in a new but similar way [2]. Now it's in subq_rr.

[1] https://treeherder.mozilla.org/jobs?repo=try&revision=78895f20b6ae4a51109baba484b6abb8d1568327&selectedTaskRun=eb9hC10LTh68gYvjFfUbYw.0
[2] https://tests.firefox.dev/crash-viewer.html?url=https%3A%2F%2Ffirefox-ci-tc.services.mozilla.com%2Fapi%2Fqueue%2Fv1%2Ftask%2FZu_5_r8XRQmyQTgQtsoopw%2Fruns%2F0%2Fartifacts%2Fpublic%2Ftest_info%2F4cdc49e1-7fdc-8c91-a5db-e16648717067.json

Yannis Juglaret [:yannis] (Away until April 13)

Updated

•

3 months ago

Depends on: 2007081

Yannis Juglaret [:yannis] (Away until April 13)

Comment 20

•

3 months ago

•

Edited

(In reply to Ryan Hunt [:rhunt] from comment #19)

It looks like adding MOZ_NEVER_INLINE to subl_rr fixed the issue for wasm-misc and godot [1]. JS3 is still crashing, but in a new but similar way [2]. Now it's in subq_rr.

Interesting. This looks like a direct transposition of the bug we were hitting in js::wasm::BaseCompiler::emitRemainderI32() to js::wasm::BaseCompiler::emitRemainderI64(). Which kind of makes sense because the code for these two are very similar, but it is useful information. Thanks!

[:sergesanspaille]

Comment 21

•

3 months ago

I still need to confirm this with Yannis, but my local experiment points at this being a compiler bug being fixed upstream. Keep in touch.

Yannis Juglaret [:yannis] (Away until April 13)

Comment 22

•

3 months ago

•

Edited

With the landing of the backport patches in bug 2007081, the problem described by comment 17 should be fixed. Congrats :sergesanspaille!

Ryan Hunt [:rhunt]

Assignee

Comment 23

•

3 months ago

Okay, with the backport patches we no longer have the miscompile. I've re-ran CI to get the latest perf results with this patch [1]

I'm seeing a 3.45% improvement to JetStream3 with PGO running. You can see which subtests benefit here [2]. I don't see any difference in SP3, but that's not surprising. wasm-godot seems to also improve by 9%.

[1] https://perf.compare/compare-results?baseRev=29a268fe5f3093cdb5a308b73fe33ff56893aec0&baseRepo=try&newRev=6865e7f573b63b49f417a41df4f1e3a085078669&newRepo=try&framework=13
[2] https://perf.compare/subtests-compare-results?baseRev=29a268fe5f3093cdb5a308b73fe33ff56893aec0&baseRepo=try&newRev=6865e7f573b63b49f417a41df4f1e3a085078669&newRepo=try&framework=13&baseParentSignature=5296731&newParentSignature=5296731&test_version=student-t

Pulsebot

Comment 24

•

1 month ago

Pushed by rhunt@eqrion.net: https://github.com/mozilla-firefox/firefox/commit/76de161425f0 https://hg.mozilla.org/integration/autoland/rev/6ef23704f980 Add JetStream3 to our PGO data set. r=firefox-build-system-reviewers,glandium,perftest-reviewers,releng-reviewers,bhearsum,kshampur,ahal

agoloman

Comment 25

•

1 month ago

bugherder

https://hg.mozilla.org/mozilla-central/rev/6ef23704f980

Status: ASSIGNED → RESOLVED

Closed: 1 month ago

status-firefox150: --- → fixed

Resolution: --- → FIXED

Target Milestone: --- → 150 Branch

Iulian Moraru

Comment 26

•

1 month ago

bugherder

https://hg.mozilla.org/mozilla-central/rev/6ef23704f980

Cristian Tuns

Updated

•

1 month ago

Regressions: 2019465

Mayank Bansal

Comment 27

•

1 month ago

•

Edited

Looks to be a 5.5% improvement overall on Linux
With an unusual 47% improvement on cdjs

Treeherder Bug Filer

Updated

•

1 month ago

Regressions: 2020342

Treeherder Bug Filer

Updated

•

1 month ago

Regressions: 2020943

Alex Finder

Comment 28

•

1 month ago

(In reply to Pulsebot from comment #24)

Pushed by rhunt@eqrion.net:
https://github.com/mozilla-firefox/firefox/commit/76de161425f0
https://hg.mozilla.org/integration/autoland/rev/6ef23704f980
Add JetStream3 to our PGO data set.
r=firefox-build-system-reviewers,glandium,perftest-reviewers,releng-
reviewers,bhearsum,kshampur,ahal

Perfherder has detected a devtools performance change from push 6ef23704f980e99e85b2df6f7a6f166c2e3d0d00.

No action is required from the author; this comment is provided for informational purposes only.

Improvement	Test	Platform	Options	Absolute values [old vs new]
4%	damp console.log-in-loop-content-process-set (doc)	linux1804-64-shippable-qr	e10s fission stylo webrender	707.42 ms -> 675.78 ms
4%	damp console.log-in-loop-content-process-map (doc)	linux1804-64-shippable-qr	e10s fission stylo webrender	865.88 ms -> 830.33 ms
3%	damp console.log-in-loop-content-process-infinity (doc)	linux1804-64-shippable-qr	e10s fission stylo webrender	72.77 ms -> 70.29 ms
3%	damp console.log-in-loop-content-process-bigint (doc)	linux1804-64-shippable-qr	e10s fission stylo webrender	82.53 ms -> 79.79 ms
3%	damp console.log-in-loop-content-process-null (doc)	linux1804-64-shippable-qr	e10s fission stylo webrender	74.85 ms -> 72.48 ms
3%	damp console.log-in-loop-content-process-bool (doc)	linux1804-64-shippable-qr	e10s fission stylo webrender	72.71 ms -> 70.78 ms

Need Help or Information?

If you have any questions, please reach out to afinder@mozilla.com. Alternatively, you can find help on Slack by joining #perf-help, and on Matrix you can find help by joining #perftest.

Details of the alert can be found in the alert summary, including links to graphs and comparisons for each of the affected tests.

Alex Finder

Updated

•

1 month ago

Keywords: perf-alert

Acasandrei Beatrice (needinfo me)

Comment 29

•

28 days ago

(In reply to Pulsebot from comment #24)

Pushed by rhunt@eqrion.net:
https://github.com/mozilla-firefox/firefox/commit/76de161425f0
https://hg.mozilla.org/integration/autoland/rev/6ef23704f980
Add JetStream3 to our PGO data set.
r=firefox-build-system-reviewers,glandium,perftest-reviewers,releng-
reviewers,bhearsum,kshampur,ahal

Perfherder has detected a talos performance change from push 6ef23704f980e99e85b2df6f7a6f166c2e3d0d00.

No action is required from the author; this comment is provided for informational purposes only.

Improvement	Test	Platform	Options	Absolute values [old vs new]
11%	pdfpaint issue14982.pdf (doc)	linux1804-64-shippable-qr	e10s fission stylo webrender-sw	780.09 ms -> 697.15 ms
7%	pdfpaint issue6737.pdf (doc)	linux1804-64-shippable-qr	e10s fission stylo webrender-sw	1,836.29 ms -> 1,715.03 ms
5%	tsvgx (doc)	linux1804-64-shippable-qr	e10s fission stylo webrender	214.27 -> 203.45

Need Help or Information?

If you have any questions, please reach out to bacasandrei@mozilla.com. Alternatively, you can find help on Slack by joining #perf-help, and on Matrix you can find help by joining #perftest.

Details of the alert can be found in the alert summary, including links to graphs and comparisons for each of the affected tests.

Bug 1962418 - Vendor JetStream3 benchmark into the tree. r?kshampur 9 months ago Ryan Hunt [:rhunt] 48 bytes, text/x-phabricator-request		Details \| Review
Bug 1962418 - Add JetStream3 to our PGO data set. r?#firefox-build-system-reviewers 9 months ago Ryan Hunt [:rhunt] 48 bytes, text/x-phabricator-request		Details \| Review