Open Bug 2009686 Opened 2 days ago Updated 20 hours ago

JetStream 3 async-fs/sync-fs regression after benchmark update

Categories

(Core :: JavaScript Engine, defect, P3)

Tracking Status
firefox-esr140 --- unaffected
firefox147 --- unaffected
firefox148 --- affected
firefox149 --- affected

People

(Reporter: jandem, Unassigned)

References

(Depends on 1 open bug, Blocks 1 open bug)

Details

(Keywords: perf, perf-alert)

Bug 2006003 updated JetStream 3 in CI to the latest version. Our score for the async-fs/sync-fs tests regressed by about 85%. Chrome's score also regressed, but by much less (45-50%).

The commit log has some changes that mention these tests:

https://github.com/WebKit/JetStream/commit/6215add877abe650c4f62fa4c99ef39443ab7aed
https://github.com/WebKit/JetStream/commit/12519fb2fc252dd60755bf4090aedaca0ebc136f

Perfherder has detected a browsertime performance change from push c8d1169474d8353ce311c53dd9c0640c8bae90cd.

Please acknowledge, and begin investigating this alert within 3 business days, or the patch(es) may be backed out in accordance with our regression policy. Our guide to handling regression bugs has information about how you can proceed with this investigation.

If you have any questions or need any help with the investigation, please reach out to fbilt@mozilla.com. Alternatively, you can find help on Slack by joining #perf-help, and on Matrix you can find help by joining #perftest.

Regressions:

| Ratio | Test | Platform | Options | Absolute values (old vs new) | Performance Profiles |
|---|---|---|---|---|---|
| 986% | jetstream3 sync-fs-Average | linux1804-64-shippable-qr | fission webrender | 9.61 -> 104.35 | Before/After |
| 900% | jetstream3 sync-fs-Worst | linux1804-64-shippable-qr | fission webrender | 12.65 -> 126.43 | Before/After |
| 776% | jetstream3 sync-fs-Worst | macosx1500-aarch64-shippable | fission webrender | 3.42 -> 29.99 | |
| 732% | jetstream3 sync-fs-Average | macosx1500-aarch64-shippable | fission webrender | 2.35 -> 19.58 | |
| 655% | jetstream3 sync-fs-Average | android-hw-a55-14-0-aarch64-shippable | fission webrender | 9.23 -> 69.65 | |
| 641% | jetstream3 sync-fs-Average | windows11-64-24h2-shippable | fission webrender | 3.73 -> 27.66 | Before/After |
| 640% | jetstream3 sync-fs-Average | macosx1470-64-shippable | fission webrender | 7.50 -> 55.52 | |
| 637% | jetstream3 sync-fs-Average | android-hw-a55-14-0-aarch64-shippable | webrender | 9.41 -> 69.39 | |
| 589% | jetstream3 sync-fs-First | linux1804-64-shippable-qr | fission webrender | 16.33 -> 112.55 | Before/After |
| 588% | jetstream3 sync-fs-Worst | android-hw-a55-14-0-aarch64-shippable | fission webrender | 15.12 -> 103.93 | |
| ... | ... | ... | ... | ... | ... |
| 4% | jetstream3 score | linux1804-64-shippable-qr | fission webrender | 65.11 -> 62.22 | Before/After |
| 3% | jetstream3 | android-hw-a55-14-0-aarch64-shippable | fission webrender | 69.34 -> 67.18 | |
| 3% | jetstream3 score | android-hw-a55-14-0-aarch64-shippable | fission webrender | 69.34 -> 67.18 | |
| 3% | jetstream3 | android-hw-a55-14-0-aarch64-shippable | webrender | 69.17 -> 67.15 | |
| 3% | jetstream3 score | android-hw-a55-14-0-aarch64-shippable | webrender | 69.17 -> 67.15 | |

Improvements:

| Ratio | Test | Platform | Options | Absolute values (old vs new) | Performance Profiles |
|---|---|---|---|---|---|
| 84% | jetstream3 transformersjs-bert-wasm-First | android-hw-a55-14-0-aarch64-shippable | webrender | 4,592.63 -> 740.93 | |
| 83% | jetstream3 transformersjs-bert-wasm-First | android-hw-a55-14-0-aarch64-shippable | fission webrender | 4,224.50 -> 738.39 | |
| 81% | jetstream3 transformersjs-bert-wasm-Geometric | android-hw-a55-14-0-aarch64-shippable | fission webrender | 12.86 -> 23.23 | |
| 79% | jetstream3 transformersjs-bert-wasm-Geometric | android-hw-a55-14-0-aarch64-shippable | webrender | 12.95 -> 23.23 | |
| 54% | jetstream3 transformersjs-bert-wasm-First | windows11-64-24h2-shippable | fission webrender | 848.99 -> 390.41 | Before/After |
| ... | ... | ... | ... | ... | ... |
| 5% | jetstream3 doxbee-async-Average | macosx1500-aarch64-shippable | fission webrender | 34.63 -> 32.88 | |
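
For readers checking the numbers: the Ratio column in both tables appears to be the magnitude of the change relative to the old value. The sketch below works under that assumption; the helper name `percent_change` is hypothetical and is not part of Perfherder. Perfherder presumably computes ratios from unrounded means, so results derived from the rounded values shown here can differ by a point.

```python
def percent_change(old: float, new: float) -> float:
    """Magnitude of the change from old to new, as a percentage of old.

    Assumption: this mirrors how the Ratio column is derived; it is not
    taken from Perfherder's source.
    """
    if old == 0:
        raise ValueError("old value must be non-zero")
    return abs(new - old) / old * 100.0


# (label, old, new) triples copied from the alert tables.
rows = [
    ("sync-fs-Average linux1804-64-shippable-qr", 9.61, 104.35),
    ("score linux1804-64-shippable-qr", 65.11, 62.22),
    ("transformersjs-bert-wasm-First android webrender", 4592.63, 740.93),
]

for label, old, new in rows:
    print(f"{label}: {percent_change(old, new):.0f}%")
```

Under this assumption the three rows come out at 986%, 4%, and 84%, matching the tables. Other rows are slightly off (12.65 -> 126.43 gives about 899% rather than 900%), which is consistent with the displayed values having been rounded. Note also that a rising value counts as a regression for the sync-fs subtests but as an improvement for transformersjs-bert-wasm-Geometric, so the direction of "better" depends on the subtest.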

Details of the alert can be found in the alert summary, including links to graphs and comparisons for each of the affected tests.

Type: task → defect
Regressions: 2006003
Regressed by: 2006003
No longer regressions: 2006003

Set release status flags based on info from the regressing bug 2006003

Since the regression here is caused by the newest version of the benchmark, is this really a regression? We do have new optimization work now, but there is no fix to the benchmark, nor would we back out the new version of the benchmark (i.e. from bug 2006003).

Should we remove the regression keyword?

Severity: -- → S3
Priority: -- → P3

FWIW, I did take a quick look at these changes at the time and gave them a thumbs up. My understanding is that these benchmarks are intended to measure generator performance, but they were not doing a very good job of doing so. Now they're better at measuring generator performance, which makes us look worse.

Keywords: regression
No longer regressed by: 2006003