Open Bug 1956880 Opened 1 year ago Updated 10 months ago

183.56% offscreencanvas_webcodecs_worker_2d_av1 offscreencanvas_webcodecs_worker_2d_av1 Mean time across 100 frames: (Windows) regression on Fri March 21 2025

Categories

(Core :: Audio/Video: Playback, defect, P2)

Tracking Status
firefox-esr128 --- unaffected
firefox136 --- unaffected
firefox137 --- unaffected
firefox138 --- disabled
firefox139 --- disabled

People

(Reporter: intermittent-bug-filer, Assigned: alwu, NeedInfo)

References

(Blocks 2 open bugs, Regression)

Details

(Keywords: regression)

Perfherder has detected a talos performance regression from push accd220a80e2f31c8cfb9b9f0e7989788d543ac8. As you are the author of one of the patches included in that push, we need your help to address this regression.

Regressions:

| Ratio | Test | Platform | Options | Absolute values (old vs new) |
|---|---|---|---|---|
| 184% | offscreencanvas_webcodecs_worker_2d_av1 offscreencanvas_webcodecs_worker_2d_av1 Mean time across 100 frames: | windows11-64-shippable-qr | e10s fission stylo webrender | 7.95 -> 22.56 |

Details of the alert can be found in the alert summary, including links to graphs and comparisons for each of the affected tests. Please follow our guide to handling regression bugs and let us know your plans within 3 business days, or the patch(es) may be backed out in accordance with our regression policy.

If you need profiling jobs, you can trigger them yourself from the Treeherder job view or ask a sheriff to do that for you.

You can run all of these tests on try with ./mach try perf --alert 44488

The following documentation link provides more information about this command.

For more information on performance sheriffing please see our FAQ.

If you have any questions, please do not hesitate to reach out to bacasandrei@mozilla.com.

Flags: needinfo?(alwu)

Updated summary:

Regressions:

| Ratio | Test | Platform | Options | Absolute values (old vs new) |
|---|---|---|---|---|
| 184% | offscreencanvas_webcodecs_worker_2d_av1 offscreencanvas_webcodecs_worker_2d_av1 Mean time across 100 frames: | windows11-64-shippable-qr | e10s fission stylo webrender | 7.95 -> 22.56 |
| 178% | offscreencanvas_webcodecs_main_2d_av1 offscreencanvas_webcodecs_main_2d_av1 Mean time across 100 frames: | windows11-64-shippable-qr | e10s fission stylo webrender | 8.01 -> 22.28 |

Improvements:

| Ratio | Test | Platform | Options | Absolute values (old vs new) |
|---|---|---|---|---|
| 39% | offscreencanvas_webcodecs_worker_webgl_av1 offscreencanvas_webcodecs_worker_webgl_av1 Mean time across 100 frames: | windows11-64-shippable-qr | e10s fission stylo webrender | 2.06 -> 1.25 |
| 39% | offscreencanvas_webcodecs_main_webgl_av1 offscreencanvas_webcodecs_main_av1_webgl Mean time across 100 frames: | windows11-64-shippable-qr | e10s fission stylo webrender | 2.08 -> 1.27 |
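
For readers checking the numbers, the Ratio column appears to be the change relative to the old value, shown as a magnitude. The sketch below is not Perfherder's code; `percent_change` and the short row labels are hypothetical names, and the inputs are the rounded absolute values from the tables in this comment. With rounded inputs the first row comes out near 183.8%, slightly off from the 183.56% in the bug title, presumably because Perfherder works from unrounded data.

```python
def percent_change(old: float, new: float) -> float:
    """Relative change from old to new, as a percentage of old."""
    if old == 0:
        raise ValueError("old value must be non-zero")
    return (new - old) / old * 100.0


# (old, new) mean frame times copied from the alert tables.
# The keys are shortened labels chosen for this sketch only.
rows = {
    "worker_2d_av1": (7.95, 22.56),
    "main_2d_av1": (8.01, 22.28),
    "worker_webgl_av1": (2.06, 1.25),
    "main_webgl_av1": (2.08, 1.27),
}

for name, (old, new) in rows.items():
    change = percent_change(old, new)
    # For a "mean time" metric, a positive change means slower frames.
    kind = "regression" if change > 0 else "improvement"
    print(f"{name}: {abs(change):.0f}% {kind}")
```

Because the metric is a time, the sign convention here treats a positive change as a regression and a negative one as an improvement.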

Set release status flags based on info from the regressing bug 1936128

Bug 1936128 enables the feature only on Nightly, so this won't affect Beta 138 or Release 138.

Assignee: nobody → alwu
Severity: -- → S3
Flags: needinfo?(alwu)
Priority: -- → P2

NI'ing myself for further investigation.

Flags: needinfo?(alwu)

Summary Table (averaged per group)

| Decoder/Config | Avg. of Averages | Avg. of Medians | Avg. Stddev | Avg. Stddev (%) | Avg. Stddev Sans First |
|---|---|---|---|---|---|
| WMF with copy | 26.12 | 25.30 | 4.12 | 15.52% | 3.68 |
| WMF without copy | 24.20 | 23.97 | 1.60 | 6.74% | 1.57 |
| FFmpeg with copy | 23.97 | 23.92 | 1.46 | 6.26% | 1.33 |
| FFmpeg without copy | 25.83 | 24.91 | 2.27 | 9.10% | 2.27 |

I did some local testing of the WMF and FFmpeg decoders, with and without video copying, and the results don't differ much.
If the regression were caused by using FFmpeg, my tests should show a significant difference as well, so I'm confused.

This graph shows AV1 and VP9 performance over the past 30 days. I would expect AV1 and VP9 performance to be similar, but surprisingly they weren't. One possible explanation is that we somehow weren't using hardware decoding for AV1 before. I will run another test to check whether that's true.


  • WMF decoder with copy
| Average | Median | Stddev | Stddev (%) | Stddev Sans First | Values |
|---|---|---|---|---|---|
| 23.74 | 23.09 | 1.35 | 5.8% | 1.56 | 23.7, 22.8, 26.1, 23.1, 23.1 |
| 22.87 | 23.14 | 1.62 | 7.0% | 1.85 | 23.3, 23.1, 22.0, 20.8, 25.1 |
| 26.83 | 23.09 | 8.98 | 38.9% | 10.11 | 23.2, 23.0, 23.1, 22.0, 42.9 |
| 34.61 | 36.42 | 7.31 | 20.1% | 3.34 | 22.6, 33.4, 36.4, 40.4, 40.2 |
| 22.53 | 22.77 | 1.32 | 5.8% | 1.52 | 22.8, 22.1, 23.6, 23.7, 20.5 |
  • WMF decoder without copy
| Average | Median | Stddev | Stddev (%) | Stddev Sans First | Values |
|---|---|---|---|---|---|
| 22.66 | 22.51 | 0.46 | 2.0% | 0.47 | 22.3, 23.2, 23.1, 22.5, 22.2 |
| 24.50 | 24.54 | 0.72 | 2.9% | 0.74 | 23.9, 23.6, 25.4, 25.0, 24.5 |
| 23.53 | 23.59 | 1.52 | 6.4% | 1.55 | 24.8, 23.6, 25.2, 21.7, 22.4 |
| 26.24 | 25.00 | 3.47 | 13.9% | 2.02 | 31.6, 27.7, 23.6, 25.0, 23.2 |
| 24.07 | 24.23 | 1.81 | 7.5% | 2.09 | 24.3, 24.2, 26.6, 23.7, 21.5 |
  • FFmpeg with copy
| Average | Median | Stddev | Stddev (%) | Stddev Sans First | Values |
|---|---|---|---|---|---|
| 23.95 | 24.34 | 1.27 | 5.2% | 0.85 | 22.1, 25.5, 23.4, 24.4, 24.3 |
| 23.82 | 22.42 | 2.53 | 11.3% | 2.68 | 22.0, 27.4, 21.7, 22.4, 25.5 |
| 24.26 | 23.86 | 0.77 | 3.2% | 0.75 | 25.0, 25.2, 23.6, 23.7, 23.9 |
| 23.58 | 24.10 | 1.37 | 5.7% | 1.31 | 22.2, 25.3, 24.2, 22.1, 24.1 |
| 23.24 | 22.86 | 1.36 | 5.9% | 1.05 | 25.0, 22.5, 21.6, 22.9, 24.1 |
  • FFmpeg without copy
| Average | Median | Stddev | Stddev (%) | Stddev Sans First | Values |
|---|---|---|---|---|---|
| 23.92 | 23.67 | 1.49 | 6.3% | 0.81 | 26.3, 24.2, 23.1, 22.3, 23.7 |
| 26.15 | 25.02 | 3.71 | 14.8% | 3.63 | 22.6, 24.0, 32.2, 26.9, 25.0 |
| 23.97 | 23.84 | 1.47 | 6.2% | 1.49 | 22.7, 25.4, 25.5, 22.4, 23.8 |
| 26.76 | 26.75 | 2.88 | 10.8% | 3.33 | 26.8, 26.8, 31.2, 25.9, 23.2 |
| 24.37 | 24.29 | 1.79 | 7.4% | 2.06 | 24.6, 27.2, 24.3, 23.2, 22.5 |
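
The per-run columns can be reproduced approximately from the listed values. This is a minimal sketch under assumptions, not the script that generated the tables: `summarize` is a hypothetical helper, the standard deviation is assumed to be the sample (n-1) form, and Stddev (%) is assumed to be the standard deviation divided by the median, since that is what matches the rows above (for example 8.98 / 23.09 ≈ 38.9%). The listed values are rounded to one decimal, so results can differ from the tables in the last digit.

```python
import statistics


def summarize(values):
    """Summary statistics for one run of per-iteration frame times."""
    median = statistics.median(values)
    stddev = statistics.stdev(values)  # sample standard deviation (n - 1)
    return {
        "average": statistics.mean(values),
        "median": median,
        "stddev": stddev,
        # Assumption: the percentage is taken relative to the median.
        "stddev_pct": stddev / median * 100.0,
        # Drops the first iteration, which may include warm-up cost.
        "stddev_sans_first": statistics.stdev(values[1:]),
    }


# First "WMF decoder without copy" run, copied from the Values column.
run = [22.3, 23.2, 23.1, 22.5, 22.2]
for key, value in summarize(run).items():
    print(f"{key}: {value:.2f}")
```

Comparing "stddev" with "stddev_sans_first" for a run is a quick way to see whether a slow first iteration is what drives the variance.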
Flags: needinfo?(alwu)
| Decoder/Config | Avg. of Averages | Avg. of Medians | Avg. Stddev | Avg. Stddev (%) | Avg. Stddev Sans First |
|---|---|---|---|---|---|
| FFmpeg SW | 9.77 | 9.41 | 0.61 | 6.52% | 0.67 |

Okay, when using FFmpeg SW, the result is closer to the perf test on CI. Since the test draws to a canvas, I suppose capturing the HW buffer costs more time? If that is true, it could explain why SW is much faster than HW here.

Andrew, could you share your thoughts here as well? This is not a real regression; it's just due to the difference between SW and HW buffers. Maybe we should have measurements for both HW and SW? Thanks!


  • FFmpeg SW
| Average | Median | Stddev | Stddev (%) | Stddev Sans First | Values |
|---|---|---|---|---|---|
| 9.59 | 9.64 | 0.33 | 3.4% | 0.35 | 9.8, 9.1, 9.6, 9.4, 10.0 |
| 10.07 | 9.37 | 0.76 | 8.1% | 0.82 | 9.6, 9.4, 9.7, 11.2, 10.4 |
| 9.78 | 9.50 | 0.67 | 7.1% | 0.68 | 9.2, 9.6, 9.7, 9.5, 10.9 |
| 9.68 | 9.35 | 0.43 | 4.6% | 0.50 | 9.7, 9.3, 9.2, 9.9, 10.3 |
| 9.73 | 9.20 | 0.86 | 9.4% | 1.00 | 9.8, 8.9, 9.6, 11.1, 9.2 |
Flags: needinfo?(aosmond)

Set release status flags based on info from the regressing bug 1936128

It has been over 7 days with no activity on this performance regression.

:alwu, since you are the author of the regressor, bug 1936128, which triggered this performance alert, could you please provide a progress update?

If this regression is something that fixes a bug, changes the baseline of the regression metrics, or otherwise will not be fixed, please consider closing it as WONTFIX. See this documentation for more information on how to handle regressions.

For additional information/help, please needinfo the performance sheriff who filed this alert (they can be found in comment #0), or reach out in #perftest, or #perfsheriffs on Element.

For more information, please visit BugBot documentation.

Flags: needinfo?(alwu)

Per comment 6, this is not a real regression, but we'd like to improve our test.

Flags: needinfo?(alwu)