Closed Bug 1566099 Opened 5 months ago Closed 19 days ago

Artifacts appearing around signup text boxes on github.com Vega GPUs

Categories

(Core :: Graphics: WebRender, defect, P3)

70 Branch
x86_64
Windows 10
defect

Tracking

()

RESOLVED WORKSFORME
Tracking Status
firefox-esr60 --- unaffected
firefox-esr68 --- unaffected
firefox68 --- unaffected
firefox69 --- wontfix
firefox70 --- wontfix
firefox71 --- unaffected

People

(Reporter: dalley, Unassigned)

References

(Blocks 1 open bug, Regression)

Details

(Keywords: regression)

Attachments

(3 files)

User Agent: Mozilla/5.0 (X11; Fedora; Linux x86_64; rv:67.0) Gecko/20100101 Firefox/67.0

Steps to reproduce:

System Details:

Firefox Nightly 70, 2019-07-13
Windows 10
AMD Vega 56 w/ driver 19.7.1
gfx.webrender.all = True

I navigated to www.github.com and selected the "username" text box to sign up, and began typing.

I was trying to reproduce a different problem that had been posted on reddit [0] but found a different issue instead (and was not able to reproduce the original one).

[0] https://www.reddit.com/r/firefox/comments/cbwwop/webrender_testing_help_wanted/

Actual results:

The halo/highlight around the text box would get white artifacts that flickered into and out of existence. They seemed to occur more frequently when typing into the text box, but they also occasionally happened without any keyboard or mouse activity.

Video: https://media.giphy.com/media/Y1NANdLhYRfZBVFM5a/giphy.gif

They most frequently appeared on the left side of the text box, but did also appear on other parts of the text box.

The "password" and "email" text boxes had the same problem, not just the "username" one.

I attempted to reproduce on other text boxes on github.com by e.g. writing an issue, writing a comment, but did not see the same problem elsewhere.

Expected results:

No artifacts

Component: Untriaged → Graphics: WebRender
Product: Firefox → Core
See Also: → 1565910

Can you still reproduce this problem?

Flags: needinfo?(dalley)

As of yesterday, yes. I don't know if something changed in the last 24 hours but I can check again a little bit later.

Flags: needinfo?(dalley)

So far I've not been able to reproduce this bug with Win10 1903, Radeon RX480, driver 19.7.2.

Please try to find a regression range:
https://mozilla.github.io/mozregression/install.html
mozregression --good 2019-06-15 --bad 2019-07-15 --pref gfx.webrender.all:true -a https://github.com

Blocks: wr-69
Priority: -- → P3

I was able to reproduce this just now on the latest nightly, with the same driver version (19.7.2). It does seem less frequent than what I was experiencing earlier and what was captured in the video. It also seems like it's worse when the page is freshly loaded than when it has been sitting for a while. But those could both be coincidences, I don't know for sure.

If you aren't experiencing it, it might be a driver bug specific to Vega GPUs. But, I don't know anything about graphics drivers or webrender internals so I can't really make that judgement.

I will look into mozregression, but truth be told I don't really have time to set up a Firefox build environment on Windows and then do a full bisection, compiling repeatedly... I might get to it later this week.

(In reply to dalley from comment #4)

I will look into mozregression, but truth be told I don't really have time to set up a Firefox build environment on Windows and then do a full bisection, compiling repeatedly... I might get to it later this week.

Note, mozregression does not require setting up a build environment and building repeatedly. We have an archive of pre-built binaries, and it just downloads from there.

Bugbug thinks this bug is a regression, but please revert this change in case of error.

Keywords: regression

I could not reproduce your issue on AMD FX 8320 + AMD Radeon RX 550 with Windows 10 on Nightly v70.0a1 with WebRender enabled.

Dalley, we need your help to find the push that caused this issue to appear. This is what you need to do:

  1. Download mozregression GUI: https://github.com/mozilla/mozregression/releases/download/gui-0.9.39/mozregression-gui.exe
  2. You will use the mozregression app to "bisect" between builds that reproduce the issue and builds that do not, in search of the build/changeset that first introduced the issue:
    a. Open mozregression-gui.exe
    b. Click "File" -> "Run a new bisection"
    c. On "Basic configuration" screen, select Build Type: "opt" and click "Next" button.
    d. On the "Profile selection" screen click on "Add preference" and write the name as "gfx.webrender.all" and set it as true, then click "Next".
    e. On the Bisection wizard screen, you will need to select a build that reproduces the issue and one that does not:
    e1. In the "Last known good build:" section, select "date" on the right drop-down and set this date: 2019-06-15
    e2. In the "First known bad build:" section, select "release" on the right drop-down and set this date: 2019-07-15
    f. Click "Finish" to start the bisection process.
    g. Builds will open one by one; you will need to test each of them and see whether the issue reproduces. If it reproduces, select the "bad" button in the mozregression window; if not, select the "good" button.
    h. When bisection is done, you will have the information in the "Log View" section of the mozregression window; bisection may also fail due to not enough builds, but the logs can always be useful.
  3. Copy the logs in a text file and attach it to this bug.

If there is still information you need regarding the regression process, please request information from me.
Thank you for your contribution!
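
For readers unfamiliar with bisection, the manual good/bad procedure above can be sketched roughly as follows. This is a minimal illustration and not mozregression's actual implementation: the function name, the `is_bad` callback (standing in for the tester's manual verdict), and the regression date used in the demo are all hypothetical. It assumes one build per day and that the bug, once introduced, stays present in every later build.

```python
from datetime import date, timedelta
from typing import Callable, Tuple


def bisect_first_bad(good: date, bad: date,
                     is_bad: Callable[[date], bool]) -> Tuple[date, int]:
    """Return (earliest bad build date, number of builds tested).

    Assumes `good` is known good, `bad` is known bad, and the bug stays
    present in every build after the one that introduced it.
    """
    tested = 0
    while (bad - good).days > 1:
        # Pick the build halfway between the known-good and known-bad dates.
        mid = good + timedelta(days=(bad - good).days // 2)
        tested += 1
        if is_bad(mid):
            bad = mid    # the regression landed on or before `mid`
        else:
            good = mid   # the regression landed after `mid`
    return bad, tested


if __name__ == "__main__":
    # Hypothetical stand-in for the tester clicking "good" or "bad":
    # pretend the regression landed in the 2019-06-28 build.
    pretend_regression = date(2019, 6, 28)
    first_bad, tested = bisect_first_bad(
        date(2019, 6, 15), date(2019, 7, 15),
        lambda build: build >= pretend_regression,
    )
    print(first_bad, tested)
```

With the 30-day range suggested in this bug, a sketch like this needs only about five verdicts, which is why bisecting pre-built nightlies is far cheaper than it might sound.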

Flags: needinfo?(dalley)
Attached file mozregression_log.txt

Logs attached

Flags: needinfo?(dalley)

@Bodea, thanks for your clarification, that was really helpful.

Assignee: nobody → gwatson
Regressed by: 1558106
Has Regression Range: --- → yes
OS: Unspecified → Windows 10
Hardware: Unspecified → x86_64
Status: UNCONFIRMED → ASSIGNED
Ever confirmed: true

I've been unable to reproduce this locally, so far. Jeff, any chance someone in Toronto could try and repro on some of the hardware you have there?

Flags: needinfo?(jmuizelaar)

Not sure if this is at all relevant, but FWIW, I'm also using a 144hz Freesync monitor. That is another potential difference between my setup and yours.

I'll try locking it to 60hz and see if that makes a difference.

No difference.

Jeff Gilbert, you have a Vega56. Can you see if you can reproduce this problem? If you can't, you can also make sure that you're using the same driver as dalley and see if that shows the problem.

Flags: needinfo?(jmuizelaar) → needinfo?(jgilbert)

I will.

19.7.3 is also affected, in case you've already upgraded to the newest version. No need to downgrade to 19.7.2

I can repro on Vega56 19.7.1/26.20.13001.9005 (7-4-2019), 90Hz FreeSync @1440p.

Flags: needinfo?(jgilbert)

Jeff and dalley, what devicePixelRatio are you running at? You can find out by typing devicePixelRatio in the browser console.

Flags: needinfo?(jgilbert)

I'm unable to reproduce on a RX460 with 24.20.13001.1010.

Also can not reproduce with RX460 and 26.20.13001.9005. This suggests that it might be Vega specific?

bug 1446685 looked similar in the past.

jgilbert and gw are going to have a go at debugging this.

Can reproduce on Vega 56 with Windows 10, today's Nightly, Adrenalin 19.7.5 driver.

I'm at devicePixelRatio 1.0 on this machine.

Flags: needinfo?(jgilbert)

If it happens with picture-caching disabled, it's too fast for me to catch/see.

devicePixelRatio=1

And I agree, disabling picture caching does "fix" the issue, at least from what I can tell.

(RENDERDOC_HOOK_EGL=0 was needed)

While it happens frequently during normal execution, it seems harder to repro in RenderDoc. I have only one apparently successful capture based on the preview snapshot, but the artifact does not appear during capture analysis/replay.

We can reproduce this in Toronto now. We were able to get a wrench capture but it wasn't that helpful because the problem went away on frame rebuild. We're going to try to get an apitrace.

apitracing webrender with angle/d3d works but is very slow. We seem to be spending most of our time hashing the 1MB streaming vertex buffer. https://github.com/apitrace/apitrace/blob/0d6e945cacb6eb0002009bf69694fdf569c6b05e/wrappers/memtrace.cpp#L183
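
To illustrate why hashing a large buffer can dominate tracing time, here is a rough sketch of generic hash-based change detection. It is not apitrace's code: the 4096-byte block size, the use of CRC32, and both function names are assumptions made purely for illustration. The point is that every check re-hashes the entire 1 MiB buffer, even when only a single byte was written.

```python
import zlib
from typing import List

BUFFER_SIZE = 1 << 20  # 1 MiB, the buffer size mentioned above
BLOCK_SIZE = 4096      # hypothetical block granularity


def block_hashes(buf: bytes, block_size: int = BLOCK_SIZE) -> List[int]:
    """Checksum every fixed-size block of the buffer."""
    return [zlib.crc32(buf[i:i + block_size])
            for i in range(0, len(buf), block_size)]


def dirty_blocks(old: List[int], new: List[int]) -> List[int]:
    """Return the indices of blocks whose checksum changed."""
    return [i for i, (a, b) in enumerate(zip(old, new)) if a != b]


if __name__ == "__main__":
    buf = bytearray(BUFFER_SIZE)
    before = block_hashes(bytes(buf))
    buf[5000] = 1  # a single-byte write, landing in block 1
    after = block_hashes(bytes(buf))
    # All 256 blocks were hashed again just to find the one that changed.
    print(len(after), dirty_blocks(before, after))
```

If a check like this runs around every draw call, the cost scales with the buffer size rather than with the amount of data actually written.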

Assignee: gwatson → dmalyshau

Hi Dzmitry, we've got one Beta left before Fx69 goes to RC next week. Is this something we can realistically fix before we ship at this point?

Flags: needinfo?(dmalyshau)

It doesn't look that way, Ryan.

Flags: needinfo?(dmalyshau)
Blocks: wr-70
No longer blocks: wr-69

I was able to get an apitrace capture of this bug. Unfortunately, the problem does not happen deterministically when replaying the capture.

Summary: Artifacts appearing around signup text boxes on github.com → Artifacts appearing around signup text boxes on github.com Vega GPUs

I'm not personally able to reproduce this anymore on recent nightly. I'm not certain that it's gone entirely, but I haven't seen it.

(In reply to dalley from comment #34)

I'm not personally able to reproduce this anymore on recent nightly. I'm not certain that it's gone entirely, but I haven't seen it.

Did you reenable picture caching?

Flags: needinfo?(dalley)

Yes, I've had the tab off to the side to test every couple of days after updating and I was still seeing the artifacts the last couple of times I have tried, apart from today.

We've established that it's tricky to reproduce sometimes though... if it still looks fine in a day or two I can try to bisect again.

Flags: needinfo?(dalley)

I opened nightly again today (first time since my last comment) and still had no problems with that version. However after updating to today's nightly the problem came back. I used mozregression to do some more bisections.

The bug was fixed by one of the 4 changes in this stack (I don't really know the correct Mercurial/Phabricator lingo):

https://phabricator.services.mozilla.com/D44582

The bug was then reintroduced when those same changes were reverted 2 days later:

2019-09-11T23:16:19: DEBUG : Found commit message:
Backed out 4 changesets (bug 1578576) for causing build failure with microsoft visual studio 2019. a=backout
 
Backed out changeset e5b3436fc277 (bug 1578576)
Backed out changeset cd2799d2d190 (bug 1578576)
Backed out changeset 13282d7a47a5 (bug 1578576)
Backed out changeset 3064469c073d (bug 1578576)

I've spoken with AMD about this issue and it appears to be a regression in their drivers and they are looking into it.

Dalley, if you update to the latest drivers from AMD do you still see this?

Flags: needinfo?(dalley)

I'm going to be 4,500 miles away from my windows desktop for the next week, so I'll have to check afterwards.

Note that the commits I mentioned (which updated ANGLE) also definitely fixed the problem on my machine during the period before they got backed out, so if they ever got pushed back in then the issue is likely fixed (for me at least) anyways.

Flags: needinfo?(dalley)
Blocks: wr-71
No longer blocks: wr-70
No longer blocks: wr-71

@Jeff, using the same AMD driver I had from 6 weeks ago, I can no longer reproduce on latest nightly, so it looks like the ANGLE update that I mentioned earlier fixed it.

For good measure, I tried with the new 19.11.1 driver, which also works fine.

Unassigning from myself. It's great to know the issue is fixed. I suppose the next (and possibly final) step would be to find out exactly what driver revisions fix this.

Assignee: dmalyshau → nobody
Status: ASSIGNED → NEW

Marking as WFM per comment #41

Status: NEW → RESOLVED
Closed: 19 days ago
Resolution: --- → WORKSFORME