Closed Bug 1945041
Opened 12 days ago, closed 4 days ago

49.96 - 2.11% google-slides powerUsage_gpu / google-slides PerceptualSpeedIndex + 6 more (Windows) regression on Wed January 29 2025

Categories: Core :: Graphics: WebRender, defect, P1

Tracking: RESOLVED FIXED, 137 Branch

Tracking Status
firefox-esr115 --- unaffected
firefox-esr128 --- unaffected
firefox134 --- unaffected
firefox135 --- unaffected
firefox136 + fixed
firefox137 --- fixed

People: Reporter: intermittent-bug-filer, Assigned: ahale

References: Blocks 1 open bug, Regression

Details: Keywords: perf, perf-alert, regression

Attachments: 1 file

Perfherder has detected a browsertime performance regression from push ba2f20565277d384d2b31b0f33556bad4f94b3e3. As you authored one of the patches included in that push, we need your help to address this regression.

Regressions:

Ratio | Test | Platform | Options | Absolute values (old vs new)
50% | google-slides powerUsage_gpu | windows11-64-shippable-qr | cold fission webrender | 1,185.29 -> 1,777.50
37% | google-slides powerUsage_gpu | windows11-64-shippable-qr | fission warm webrender | 729.75 -> 1,003.21
13% | google-slides LastVisualChange | windows11-64-shippable-qr | cold fission webrender | 1,971.21 -> 2,218.20
11% | google-slides LastVisualChange | windows11-64-shippable-qr | fission warm webrender | 1,560.65 -> 1,730.00
4% | google-slides SpeedIndex | windows11-64-shippable-qr | fission warm webrender | 428.30 -> 443.52
3% | google-slides PerceptualSpeedIndex | windows11-64-shippable-qr | fission warm webrender | 450.69 -> 465.31
2% | google-slides SpeedIndex | windows11-64-shippable-qr | cold fission webrender | 824.10 -> 842.18
2% | google-slides PerceptualSpeedIndex | windows11-64-shippable-qr | cold fission webrender | 831.90 -> 849.48
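The ratios in this table look like the relative change from the old to the new absolute value. As a rough sanity check, the Python sketch below recomputes that change for each row, assuming the ratio is simply (new - old) / old applied to the displayed values; Perfherder may compute it from per-run data, so small differences are possible. The function name `pct_change` and the row labels are illustrative shorthand, not Perfherder identifiers. The largest and smallest results reproduce the "49.96 - 2.11%" range in the bug title.

```python
def pct_change(old, new):
    """Relative change of new vs old, in percent (positive = larger/worse here)."""
    return (new - old) / old * 100.0


# (label, old, new) taken from the alert table; labels are shorthand for this sketch.
regressions = [
    ("powerUsage_gpu cold", 1185.29, 1777.50),
    ("powerUsage_gpu warm", 729.75, 1003.21),
    ("LastVisualChange cold", 1971.21, 2218.20),
    ("LastVisualChange warm", 1560.65, 1730.00),
    ("SpeedIndex warm", 428.30, 443.52),
    ("PerceptualSpeedIndex warm", 450.69, 465.31),
    ("SpeedIndex cold", 824.10, 842.18),
    ("PerceptualSpeedIndex cold", 831.90, 849.48),
]

# Dicts keep insertion order, so this follows the table's row order.
changes = {label: pct_change(old, new) for label, old, new in regressions}

for label, pct in changes.items():
    print(f"{label:26s} {pct:6.2f}%")

# Largest and smallest change across all rows.
print(f"range: {max(changes.values()):.2f} - {min(changes.values()):.2f}%")
# prints: range: 49.96 - 2.11%
```

Rounded to whole percentages, these values match the Ratio column (50%, 37%, 13%, 11%, 4%, 3%, 2%, 2%).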

Details of the alert can be found in the alert summary, including links to graphs and comparisons for each of the affected tests. Please follow our guide to handling regression bugs and let us know your plans within 3 business days, or the patch(es) may be backed out in accordance with our regression policy.

If you need the profiling jobs, you can trigger them yourself from the Treeherder job view or ask a sheriff to do that for you.

You can run all of these tests on try with ./mach try perf --alert 43550


For more information on performance sheriffing please see our FAQ.

If you have any questions, please do not hesitate to reach out to bacasandrei@mozilla.com.

Flags: needinfo?(ahale)

Hey Glenn, would you be able to set Priority and Severity? Thanks!

Flags: needinfo?(gwatson)

Set release status flags based on info from the regressing bug 1941838

Severity: -- → S2
Flags: needinfo?(gwatson)
Priority: -- → P1

I am investigating; this is not an expected result. The GPU should be rendering fewer pixels after the patch and should therefore use less power, yet the result here is a 50% increase in power usage on the Google Slides case. I could speculate that the texture sizes are less optimal for the GPU, so we might be seeing partial-clear versus full-clear performance differences on textures and the like. However, I can also see an increase in memory usage in the GPU process, which may hint at an increased pixel count rather than a decreased one.

The bug is marked as tracked for firefox136 (beta). However, the bug still isn't assigned.

:bhood, could you please find an assignee for this tracked bug? Given that it is a regression and we know the cause, we could also simply back out the regressor. If you disagree with the tracking decision, please talk with the release managers.

For more information, please visit BugBot documentation.

Flags: needinfo?(bhood)

Ashley is investigating, according to comment 3.

Assignee: nobody → ahale
Flags: needinfo?(bhood)

In the immediate term, I'm doing two things for this issue:

  • As an experiment, I'm going to restore the previous logic as closely as possible while keeping the comments that explain this code. This should make things no worse than before, but it won't explain what's going wrong.
  • I'll continue debugging what changed in the computed rect sizes, using ad hoc debugging prints or profiler markers in local testing.

Pushed by ahale@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/a6036ce2deb2
restore the previous surface size limiting logic and add a profiler marker
r=gfx-reviewers,lsalzman

Status: NEW → RESOLVED
Closed: 4 days ago
Resolution: --- → FIXED
Target Milestone: --- → 137 Branch

(In reply to Pulsebot from comment #8)

Pushed by ahale@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/a6036ce2deb2
restore the previous surface size limiting logic and add a profiler marker
r=gfx-reviewers,lsalzman

With this, the numbers are back to pre-regression.

:ahale, could you please add a beta uplift request on this when you're ready?

Comment on attachment 9464259 [details]
Bug 1945041 - restore the previous surface size limiting logic and add a profiler marker r?gw,#gfx-reviewers

Beta/Release Uplift Approval Request

  • User impact if declined/Reason for urgency: The patch addresses a significant regression in power usage on Google Slides when the page thumbnails are updated.
  • Is this code covered by automated tests?: Yes
  • Has the fix been verified in Nightly?: Yes
  • Needs manual test from QE?: No
  • If yes, steps to reproduce:
  • List of other uplifts needed: None
  • Risk to taking this patch: Low
  • Why is the change risky/not risky? (and alternatives if risky): Behavior is restored to pre-regression behavior. Code is restructured but the math is identical.
  • String changes made/needed:
  • Is Android affected?: Unknown
Flags: needinfo?(ahale)
Attachment #9464259 - Flags: approval-mozilla-beta?
Blocks: 1947382

(In reply to Pulsebot from comment #8)

Pushed by ahale@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/a6036ce2deb2
restore the previous surface size limiting logic and add a profiler marker
r=gfx-reviewers,lsalzman

Perfherder has detected a browsertime performance change from push a6036ce2deb2aa753b00ce4182199f93a9522dbd.

Improvements:

Ratio | Test | Platform | Options | Absolute values (old vs new)
34% | google-slides powerUsage_gpu | windows11-64-shippable-qr | bytecode-cached cold fission webrender | 1,697.95 -> 1,126.57
33% | google-slides powerUsage_gpu | windows11-64-shippable-qr | bytecode-cached fission warm webrender | 975.76 -> 652.02
32% | google-slides powerUsage_gpu | windows11-64-shippable-qr | cold fission webrender | 1,766.53 -> 1,196.25
25% | google-slides powerUsage_gpu | windows11-64-shippable-qr | fission warm webrender | 1,026.05 -> 773.32
12% | google-slides LastVisualChange | windows11-64-shippable-qr | cold fission webrender | 2,224.04 -> 1,964.54
... | ... | ... | ... | ...
2% | google-slides PerceptualSpeedIndex | windows11-64-shippable-qr | bytecode-cached cold fission webrender | 851.61 -> 834.09
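Because each improvement ratio is measured against the old (here, regressed) value, a full revert of a roughly +50% regression shows up as only about a 33% improvement: if x grew to x * (1 + r), returning to x is a drop of r / (1 + r). The Python sketch below illustrates this under the same assumption as before, that the ratio is computed from the displayed absolute values. The helper names `pct_decrease` and `revert_decrease` and the row labels are hypothetical; the elided rows of the table are left out.

```python
def pct_decrease(old, new):
    """Relative decrease of new vs old, in percent (positive = smaller/better here)."""
    return (old - new) / old * 100.0


def revert_decrease(increase_pct):
    """Decrease, in percent, that exactly undoes an increase of increase_pct.

    If x grew to x * (1 + r), going back to x is a drop of r / (1 + r).
    """
    r = increase_pct / 100.0
    return r / (1.0 + r) * 100.0


# Visible improvement rows from the alert (label, old, new); labels are shorthand.
improvements = [
    ("powerUsage_gpu bytecode-cached cold", 1697.95, 1126.57),
    ("powerUsage_gpu bytecode-cached warm", 975.76, 652.02),
    ("powerUsage_gpu cold", 1766.53, 1196.25),
    ("powerUsage_gpu warm", 1026.05, 773.32),
    ("LastVisualChange cold", 2224.04, 1964.54),
    ("PerceptualSpeedIndex bytecode-cached cold", 851.61, 834.09),
]

for label, old, new in improvements:
    print(f"{label:42s} {pct_decrease(old, new):5.1f}%")

# How a complete revert of the two largest regressions would be reported.
for increase in (49.96, 37.47):
    print(f"+{increase}% undone -> -{revert_decrease(increase):.1f}%")
```

For comparison, a complete revert of the +49.96% and +37.47% powerUsage_gpu regressions would be reported as drops of about 33.3% and 27.3%.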

Details of the alert can be found in the alert summary, including links to graphs and comparisons for each of the affected tests.

If you need the profiling jobs, you can trigger them yourself from the Treeherder job view or ask a sheriff to do that for you.

You can run these tests on try with ./mach try perf --alert 43785

For more information on performance sheriffing please see our FAQ.

Comment on attachment 9464259 [details]
Bug 1945041 - restore the previous surface size limiting logic and add a profiler marker r?gw,#gfx-reviewers

Approved for 136.0b5

Attachment #9464259 - Flags: approval-mozilla-beta? → approval-mozilla-beta+