Closed Bug 1721098 Opened 3 years ago Closed 3 years ago

Pink (software-wr) / black (WebRender) blocks on Twitch with multiple displays (NVIDIA GeForce GTX 1070)

Categories

(Core :: Graphics: WebRender, defect)

Firefox 90
Desktop
Windows 10
defect

Tracking

()

VERIFIED FIXED
92 Branch
Tracking Status
firefox-esr78 --- unaffected
firefox90 --- wontfix
firefox91 + verified
firefox92 --- verified

People

(Reporter: yoasif, Assigned: sotaro)

References

(Regression)

Details

(Keywords: regression)

Attachments

(6 files, 2 obsolete files)

Attached image image.png

From https://www.reddit.com/r/firefox/comments/omvc37/flashing_colored_boxes_when_not_using_hardware/h5ogdq1/

Could be reliably triggered by having multiple Twitch windows open across 2 monitors and quickly rolling over chat elements, or by loading some other resource-intensive webpage, like a wiki with lots of animated elements, and quickly scrolling.

With hardware acceleration on, the blocks were black instead.

Force disabling WR and using "Basic" compositor resolves the issue.

Has STR: --- → yes
Keywords: regression
See Also: → 1697559

:yoasif, could you try to find a regression range using for example mozregression?

See Also: → 1721099

I've managed to reproduce the issue on Windows 10 with an NVIDIA GTX 1050 video card, and then bisected to find the change that introduced this behavior:
2021-07-19T18:31:17.747000: DEBUG : Found commit message:
Bug 1709493 - Don't call compositor begin_frame/end_frame unless actually rendering. r=sotaro,gfx-reviewers,gw

If we're not actually rendering a frame, calling begin_frame/end_frame on the compositor without
adding surfaces can cause us to render a blank frame. Avoid calling begin_frame/end_frame as well
in this situation so that we don't accidentally do this.

Differential Revision: https://phabricator.services.mozilla.com/D114316

2021-07-19T18:31:17.747000: DEBUG : Did not find a branch, checking all integration branches
2021-07-19T18:31:17.747000: INFO : The bisection is done.
2021-07-19T18:31:17.747000: INFO : Stopped

Confirmed, and regression window provided.

Regressed by: 1709493
Hardware: Unspecified → Desktop
Has Regression Range: --- → yes

Sotaro, is there some invalidation/clearing that needs to happen here that is not being done properly in this case? The purple color is suspicious, as a couple of shaders, such as brush_mix_blend, use it to signal that they have been passed invalid data.

Flags: needinfo?(sotaro.ikeda.g)

I could reproduce the black boxes with multiple displays with WebRender (Software) on Win10 PC. I am going to look into the problem. I did not see the problem with one display.

I also got the same regression range with mozregression.


13:17.37 INFO: No more integration revisions, bisection finished.
13:17.37 INFO: Last good revision: 00b9154f4315be73495d44b18af4d4b1a1a5b5ec
13:17.38 INFO: First bad revision: ec86658fae7ff91d35d4f36d6c120d9e5f444093
13:17.38 INFO: Pushlog:
https://hg.mozilla.org/integration/autoland/pushloghtml?fromchange=00b9154f4315be73495d44b18af4d4b1a1a5b5ec&tochange=ec86658fae7ff91d35d4f36d6c120d9e5f444093

I did not see pink blocks with WebRender (Software). Instead, I saw black blocks with the latest Nightly and with Firefox 90.

Flags: needinfo?(sotaro.ikeda.g)
Flags: needinfo?(sotaro.ikeda.g)

I tested with WebRender (Software) on 3 Win10 PCs, and the problem happened on all of them. But when I enabled hardware WebRender, I only saw the rendering problem of Bug 1638709, on one PC with an NVIDIA GPU.

With D120451 applied, I did not see the problem.

The self.reset_overlaps() call in end_frame() seemed to trigger the problem. SwCompositor::flush_composites() uses overlap info, and flush_composites() can be called outside of a begin_frame()/end_frame() pair.

Flags: needinfo?(sotaro.ikeda.g)

:lsalzman, can you comment on comment 11?

Flags: needinfo?(lsalzman)

(In reply to Sotaro Ikeda [:sotaro] from comment #12)

:lsalzman, can you comment on comment 11?

This shouldn't matter unless add_surface() is getting called. If there are no surfaces to render, then flush_composites() will do nothing and just return.

The only way I could see something like this being an issue is if add_surface() was getting called outside of an enclosing begin_frame()/end_frame() pair?

Flags: needinfo?(lsalzman)

Given multiple reports (see the dupes) and the appearance of the issue on popular URLs, assigning S2.

Severity: -- → S2
See Also: 1697559, 1721099

I was experiencing the same issues with Twitch chat causing black boxes to appear.

However, the first place I was experiencing this was in the developer tools. If I went to a site with lots of elements on it, opened the developer tools, and then moused up and down over elements in the Inspector tab, it would cause the same issues within the developer tools window.

Just confirming that setting gfx.webrender.force-disabled to true fixed both cases for me.

The only way I could see something like this being an issue is if add_surface() was getting called outside of an enclosing begin_frame()/end_frame() pair?

With Attachment 9233042 applied, add_surface() was always called between begin_frame() and end_frame() when I tested, and the problem still happened.

Attachment #9233040 - Attachment is obsolete: true

Add logging to check whether add_surface() is called outside of an enclosing begin_frame()/end_frame() pair.
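
For illustration, a check of this kind could look roughly like the sketch below. It is a toy model, not the attached patch: TracingCompositor, its in_frame field, note() and trace_example() are hypothetical names, and the real SwCompositor methods take arguments that are omitted here. It only records which calls arrive while no frame is open.

```rust
/// Hypothetical stand-in for a compositor. This is NOT the real SwCompositor;
/// it only tracks whether a frame is currently open.
#[derive(Default)]
struct TracingCompositor {
    /// True between begin_frame() and end_frame().
    in_frame: bool,
    /// Names of calls that were observed while no frame was open.
    out_of_frame_calls: Vec<&'static str>,
}

impl TracingCompositor {
    fn begin_frame(&mut self) {
        self.in_frame = true;
    }

    fn end_frame(&mut self) {
        self.in_frame = false;
    }

    /// Log and record a call if it happens outside a frame.
    fn note(&mut self, call: &'static str) {
        if !self.in_frame {
            eprintln!("{}() called outside begin_frame()/end_frame()", call);
            self.out_of_frame_calls.push(call);
        }
    }

    fn add_surface(&mut self) {
        self.note("add_surface");
    }

    fn invalidate_tile(&mut self) {
        self.note("invalidate_tile");
    }
}

/// Example call sequence: one invalidate_tile() before the frame opens,
/// one add_surface() inside the frame.
fn trace_example() -> Vec<&'static str> {
    let mut compositor = TracingCompositor::default();
    compositor.invalidate_tile(); // outside any frame: recorded
    compositor.begin_frame();
    compositor.add_surface(); // inside the frame: not recorded
    compositor.end_frame();
    compositor.out_of_frame_calls
}
```

In this sequence only the invalidate_tile() call is flagged, which mirrors the observation in this bug that add_surface() stayed inside the frame while another call did not.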

D120823 addressed the problem for me. It seems that the tile invalid flag is related to the problem.

Tracking for 91 because of the number of dupes

:lsalzman, can you comment on comment 23?

Flags: needinfo?(lsalzman)

(In reply to Sotaro Ikeda [:sotaro] from comment #25)

:lsalzman, can you comment on comment 23?

Just call reset_overlaps() instead of reset_invalid(). Shouldn't need to add a new function here.

Flags: needinfo?(lsalzman)
Attachment #9233059 - Attachment description: Bug 1721098 - Reset invalid in begin_frame() → Bug 1721098 - Call reset_overlaps() in begin_frame()
Assignee: nobody → sotaro.ikeda.g
Status: NEW → ASSIGNED

The patch prints a log message when a tile's invalid flag is set in end_frame(). I confirmed that this was common with software WebRender.

If invalidate_tile() is called outside of an enclosing begin_frame()/end_frame() pair, the tile's invalid flag can still be true when the next begin_frame()/end_frame() pair starts. That causes the problem.
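
The sequence described above can be sketched with a small toy model. Everything here is an assumption made for illustration: ToyCompositor, Tile, the reset_in_begin_frame switch and stale_tiles_at_frame_start() are hypothetical, and the toy reset_overlaps() is assumed to clear both the overlap count and the invalid flag. The real SwCompositor is considerably more involved.

```rust
/// Hypothetical per-tile state; the field names are assumptions for this sketch.
#[derive(Default)]
struct Tile {
    invalid: bool,
    overlaps: u32,
}

/// Toy compositor, NOT the real SwCompositor.
struct ToyCompositor {
    tiles: Vec<Tile>,
    /// When true, begin_frame() also resets per-tile state (models the fix).
    reset_in_begin_frame: bool,
}

impl ToyCompositor {
    fn new(tile_count: usize, reset_in_begin_frame: bool) -> Self {
        ToyCompositor {
            tiles: (0..tile_count).map(|_| Tile::default()).collect(),
            reset_in_begin_frame,
        }
    }

    /// Assumed behavior: clear overlap counts and invalid flags on every tile.
    fn reset_overlaps(&mut self) {
        for tile in &mut self.tiles {
            tile.invalid = false;
            tile.overlaps = 0;
        }
    }

    fn begin_frame(&mut self) {
        if self.reset_in_begin_frame {
            self.reset_overlaps();
        }
    }

    fn invalidate_tile(&mut self, index: usize) {
        self.tiles[index].invalid = true;
    }

    fn end_frame(&mut self) {
        self.reset_overlaps();
    }

    /// Number of tiles carrying leftover state.
    fn dirty_tiles(&self) -> usize {
        self.tiles
            .iter()
            .filter(|tile| tile.invalid || tile.overlaps != 0)
            .count()
    }
}

/// Runs one frame, invalidates a tile between frames, then opens the next
/// frame and reports how many tiles start that frame with stale state.
fn stale_tiles_at_frame_start(reset_in_begin_frame: bool) -> usize {
    let mut compositor = ToyCompositor::new(4, reset_in_begin_frame);
    compositor.begin_frame();
    compositor.end_frame();
    compositor.invalidate_tile(2); // arrives between frames
    compositor.begin_frame();
    compositor.dirty_tiles()
}
```

With the reset only in end_frame(), the flag set between frames survives into the next frame. Resetting at the start of the frame as well means each frame begins from a known state regardless of what happened in between.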

Pushed by sikeda.birchill@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/7e3613503253 Call reset_overlaps() in begin_frame() r=lsalzman
Status: ASSIGNED → RESOLVED
Closed: 3 years ago
Resolution: --- → FIXED
Target Milestone: --- → 92 Branch

Comment on attachment 9233059 [details]
Bug 1721098 - Call reset_overlaps() in begin_frame()

Beta/Release Uplift Approval Request

  • User impact if declined: Blocky rendering artifacts on some major sites in the world.
  • Is this code covered by automated tests?: No
  • Has the fix been verified in Nightly?: Yes
  • Needs manual test from QE?: No
  • If yes, steps to reproduce:
  • List of other uplifts needed: None
  • Risk to taking this patch: Low
  • Why is the change risky/not risky? (and alternatives if risky): This just adds some extra initialization code to ensure rendering happens in a consistent state at the beginning of a frame. This doesn't really add any new rendering behavior.
  • String changes made/needed:
Attachment #9233059 - Flags: approval-mozilla-beta?

Comment on attachment 9233059 [details]
Bug 1721098 - Call reset_overlaps() in begin_frame()

Approved for our last beta given the impact on a major site and multiple duplicates reported, thanks.

Attachment #9233059 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
QA Whiteboard: [qa-triaged]
Flags: qe-verify+

To properly verify this issue, I am supposed to reproduce it on an affected build before verifying it on a fixed one. However, I was not able to reproduce the issue again on the same system, in either Nightly (v92.0a1 from 2021-07-25) or Beta (v91.0b2), both from before the fix.

This being said, I assume that something may have already changed on the Twitch side before getting to properly verify this bug.
In any case, the original issue did not reproduce after many attempts on Nightly v92.0a1 from 2021-08-08 or Beta v91.0 RC.

Asif Youssuff, is there any chance you could confirm that the issue is no longer reproducing with one of your bug sources?

Status: RESOLVED → VERIFIED
Flags: qe-verify+ → needinfo?(yoasif)
OS: Unspecified → Windows 10
See Also: → 1722716
Attachment #9232323 - Attachment is obsolete: true
Flags: needinfo?(yoasif)