Closed Bug 1754870 Opened 4 years ago Closed 3 years ago

[sw-wr] Stepping through Google slide deck causes parent process to hang when SWGL is enabled

Categories

(Core :: Graphics: WebRender, defect)

x86
Windows
defect

Tracking

()

VERIFIED FIXED
99 Branch
Tracking Status
firefox-esr91 --- unaffected
firefox97 --- unaffected
firefox98 --- unaffected
firefox99 + verified

People

(Reporter: cpeterson, Assigned: gw)

References

(Blocks 1 open bug, Regression)

Details

(Keywords: regression)

Attachments

(1 file)

[Tracking Requested - why for this release]:

@gw, I believe this bug is a regression from your fix for WR bounding rects bug 1749380. I bisected this regression to this push:

https://hg.mozilla.org/integration/autoland/pushloghtml?fromchange=8e9c1ffdeed84e460a9ba7f9e83ca24319a15c87&tochange=b7f88e5c537bb7d64ba9d4cb58b052650d510638

Steps to reproduce:

  1. Enable SW-WR (gfx.webrender.software = true) and restart Firefox.
  2. Load this Google slide deck: https://docs.google.com/presentation/d/1smDasBJzQqXH941iDKUuqfrLQFSCR7eNgnyCrf3JJyg/edit . It requires Mozilla LDAP access.
  3. Try stepping through the slides one by one using the arrow keys or Page Up/Down keys.
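Step 1 can also be done without about:config. Firefox reads a `user.js` file in the profile directory at startup and applies its `user_pref()` lines. The sketch below appends prefs to that file. It assumes you already know your profile directory path, and the helper names `write_user_prefs` and `format_pref_value` are hypothetical:

```python
import json
from pathlib import Path

def format_pref_value(value):
    # user.js uses JavaScript literals: lowercase booleans, bare ints, quoted strings.
    # bool is checked first because bool is a subclass of int in Python.
    if isinstance(value, bool):
        return "true" if value else "false"
    if isinstance(value, int):
        return str(value)
    return json.dumps(str(value))

def write_user_prefs(profile_dir, prefs):
    """Append one user_pref() line per entry to <profile_dir>/user.js (hypothetical helper)."""
    path = Path(profile_dir) / "user.js"
    lines = [f'user_pref("{name}", {format_pref_value(value)});' for name, value in prefs.items()]
    with path.open("a", encoding="utf-8") as f:
        f.write("\n".join(lines) + "\n")
    return path
```

For example, `write_user_prefs(profile_dir, {"gfx.webrender.software": True})` followed by a restart matches step 1. Values in `user.js` are reapplied on every startup, so delete the line afterwards to return to the default renderer.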

Result

Stepping through the slides is slow but takes less than one second... until you reach slide 11 (Equity x Design). Then stepping through slides 12+ hangs the tab or parent process for multiple seconds.

I can reproduce this bug with both 32- and 64-bit Firefox, but the problem seems more reproducible in 32-bit builds.

I also saw rect flashing problems with SW-WR in Google Docs today, but I don't know for sure that was caused by this same bug.

Flags: needinfo?(gwatson)

I'll see if I can repro this today.

If you see it happen again locally, would it be possible to break in to the process from gdb and get a callstack?

Did the build you were trying have this follow up patch (https://bugzilla.mozilla.org/show_bug.cgi?id=1754336) applied?

Flags: needinfo?(gwatson) → needinfo?(cpeterson)
Assignee: nobody → gwatson

(In reply to Glenn Watson [:gw] from comment #1)

If you see it happen again locally, would it be possible to break in to the process from gdb and get a callstack?

I'm on Windows and don't have a debugger.

Did the build you were trying have this follow up patch (https://bugzilla.mozilla.org/show_bug.cgi?id=1754336) applied?

I don't know. I can reproduce this problem in today's Nightly build (build ID 20220210065747). The fix for bug 1754336 was merged to mozilla-central early this morning UTC, but I don't know if it was before or after the Nightly build. I will retest in the next Nightly build.

Flags: needinfo?(cpeterson)

CC'ing Lee also, in case he has time to try and repro this and see if it's related to WR itself or something in SWGL exposed by the WR change; otherwise I'll take a look once I get a solution for https://bugzilla.mozilla.org/show_bug.cgi?id=1754809

I tried to reproduce this and haven't been able to so far (tried on both a Linux and a Windows machine).

Has Regression Range: --- → yes

(In reply to Chris Peterson [:cpeterson] from comment #2)

Did the build you were trying have this follow up patch (https://bugzilla.mozilla.org/show_bug.cgi?id=1754336) applied?

I tested today's Nightly build (build ID 20220211094209) that would include the fix for bug 1754336 and I can still reproduce the hangs. They seem to affect the slides with the black backgrounds (starting at slide 11) the most.

Here is a Firefox profile of the hangs as I step through the slides:

https://share.firefox.dev/3GILipx

It looks like all the time is spent inside the component transfer SWGL fragment shader. It's possible that there is some kind of scaling issue where, with my changes enabled, the render target for that filter is much bigger than before.
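The following sketch shows why a bigger render target would surface as time spent in a component transfer shader. It is not SWGL's shader code. It applies an `feComponentTransfer`-style linear mapping (`slope * c + intercept`) to a hypothetical non-premultiplied RGBA8 buffer, transferring RGB only and leaving alpha alone for simplicity. The work is one evaluation per channel per pixel, so it grows with the target's area:

```python
def linear_transfer(c, slope, intercept):
    # Map a channel value in [0, 1] through slope * c + intercept, clamped to [0, 1].
    return min(1.0, max(0.0, slope * c + intercept))

def apply_component_transfer(pixels, width, height, slope=1.0, intercept=0.0):
    """Apply a linear transfer to the RGB channels of an RGBA8 buffer.

    Returns the new buffer and the number of per-channel evaluations performed.
    """
    if len(pixels) != width * height * 4:
        raise ValueError("buffer size does not match width * height * 4")
    out = bytearray(pixels)
    evaluations = 0
    for i in range(0, len(out), 4):
        for ch in range(3):  # R, G, B; alpha is left untouched in this sketch
            c = out[i + ch] / 255.0
            out[i + ch] = round(linear_transfer(c, slope, intercept) * 255.0)
            evaluations += 1
    return out, evaluations
```

The `evaluations` count is the relevant output. Doubling both dimensions of the target quadruples the work, which is the kind of blow-up an oversized filter target would cause under a software rasterizer.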

I'll check on Monday if there are any major differences in target sizes. If you have time, could you try to take a screenshot of one of those slides with gfx.webrender.debug.render-targets enabled? That will let me see the size of the off-screen targets. Another thing for me to check on Monday: enabling gfx.webrender.debug.picture-caching will let me see if everything is red (invalidating and redrawing all the time, instead of using cached green tiles).

See Also: → 1754945

Fixed by backout. I will be re-landing this patch series next week, with changes that fix this issue. Please re-open if it occurs again.

Status: NEW → RESOLVED
Closed: 3 years ago
Resolution: --- → FIXED

(In reply to Glenn Watson [:gw] from comment #6)

could you try to take a screenshot of one of those slides with gfx.webrender.debug.render-targets enabled? That will let me see the size of the off-screen targets. Another thing for me to check on Monday: enabling gfx.webrender.debug.picture-caching will let me see if everything is red (invalidating and redrawing all the time, instead of using cached green tiles).

Here is a screenshot of a slow slide with gfx.webrender.software and gfx.webrender.debug.render-targets enabled.

Thanks Chris, that confirms what I hoped. With my fixes applied, there are 4x 2048x2048 targets allocated, which is vastly less than what you have in the screenshot there - so performance and memory usage should be much improved.
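For a rough sense of scale, the sketch below sizes the four 2048x2048 targets. It assumes each target is an 8-bit RGBA color target at 4 bytes per pixel. That format is an assumption, since WebRender's actual target formats may differ (alpha-only targets, for instance, need fewer bytes):

```python
BYTES_PER_PIXEL_RGBA8 = 4  # assumption: 8-bit RGBA color targets

def target_bytes(width, height, bytes_per_pixel=BYTES_PER_PIXEL_RGBA8):
    # Raw pixel storage for one render target, ignoring allocator overhead.
    return width * height * bytes_per_pixel

def total_mib(count, width, height, bytes_per_pixel=BYTES_PER_PIXEL_RGBA8):
    # Total storage for `count` identical targets, in mebibytes (2**20 bytes).
    return count * target_bytes(width, height, bytes_per_pixel) / 2**20

print(total_mib(4, 2048, 2048))  # 64.0
```

Under that assumption, each target is 16 MiB and the four together are 64 MiB. Because SWGL fills these pixels on the CPU, the per-frame work scales with the same area.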

Target Milestone: --- → 99 Branch
Flags: qe-verify+

I managed to reproduce this issue on Nightly 99.0a1 (20220210213101) on macOS 11 following the STR from the Description. On Firefox 99.0b6 (20220320185956) and Nightly 100.0a1 (20220321214243), there seems to be a major improvement compared to the affected builds, but there is still a noticeable delay. I'm leaving a screen recording from Nightly 100.0a1 (20220321214243). Should I treat this as the expected behaviour of the fix or re-open the bug? Thank you.

Screen Recording here.

Flags: needinfo?(gwatson)

I think it's probably expected behavior given the page complexity and swgl, but Lee or Chris might know better.

Flags: needinfo?(lsalzman)
Flags: needinfo?(gwatson)
Flags: needinfo?(cpeterson)

(In reply to Glenn Watson [:gw] from comment #11)

I think it's probably expected behavior given the page complexity and swgl, but Lee or Chris might know better.

I retested the STR from comment 0 and I think the current performance is acceptable. It's much better compared to the affected builds, so I don't think any more investigation is needed for this particular bug or Google slide deck.

Flags: needinfo?(cpeterson)

Thank you for letting me know! In this case I shall mark this as verified based on your comments.

Status: RESOLVED → VERIFIED
Flags: needinfo?(lsalzman)
