Closed Bug 1598081 Opened 5 years ago Closed 4 years ago

UI Corruption

Categories

(Core :: Graphics: WebRender, defect, P3)

71 Branch
defect

RESOLVED FIXED

People

(Reporter: milostodorovic, Unassigned, NeedInfo)

Attachments

(3 files)

Attached image SS1.png

User Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:71.0) Gecko/20100101 Firefox/71.0

Steps to reproduce:

I'm unable to consistently reproduce this, but on the 71 branch (71.0b11 at the moment) the issue persists until the mouse is moved over the affected rendering area, and then it clears. On earlier versions (it showed up around 71.0b5 or b6) it cleared on its own in less than 2 seconds.

Actual results:

It looks like some buffer is getting corrupted: portions of the Firefox UI (not only the webpages) turn black, fonts disappear, images glitch, etc. It happens on average once every 2 days or so.

Attached image SS2.png

issue persists until the mouse is moved

I should probably rephrase this: the issue persists until the area is changed in any way (mouse movement, scrolling, animation).

Please do the following:

  1. Enter about:support into the address bar.
  2. Click the Copy text to clipboard button.
  3. Paste the clipboard contents into a text editor like Notepad, then save the file.
  4. Click the Attach New File button above the description here to upload it.
Component: Untriaged → Graphics
Flags: needinfo?(milostodorovic)
Product: Firefox → Core
Attached file ffdump.txt
Flags: needinfo?(milostodorovic)
Component: Graphics → Graphics: WebRender

Attachment 9110337 contained the following error log. Something unexpected seems to have happened in WebRender.

(#0): GP+[GFX1-]: Failed to lock ExternalImage for extId:124554143576
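
When an about:support dump contains many of these messages, it can help to count them per external image ID. The following is a minimal sketch of a hypothetical helper (not part of Firefox); it assumes every failure line follows the format of the one quoted above, which may not hold for other builds.

```python
import re
from collections import Counter

# Pattern modelled on the single log line quoted in this bug;
# the exact format in other builds is an assumption.
LOCK_FAILURE = re.compile(r"Failed to lock ExternalImage for extId:(\d+)")


def count_lock_failures(log_text: str) -> Counter:
    """Map each extId to the number of lock failures logged for it."""
    return Counter(int(m.group(1)) for m in LOCK_FAILURE.finditer(log_text))


sample = "(#0): GP+[GFX1-]: Failed to lock ExternalImage for extId:124554143576"
failures = count_lock_failures(sample)
print(failures)
```

Running it over the full text of the attached dump would show whether the failures concentrate on a few IDs or are spread across many.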

Blocks: wr-71
Priority: -- → P3

Are you using multiple monitors, out of curiosity?

Flags: needinfo?(milostodorovic)

Yes, I've got a dual monitor setup (1920x1080 @ 60 Hz, 1440x900 @ 74 Hz), though the browser is usually maximised on my primary screen.

Flags: needinfo?(milostodorovic)

Can you take a peek at this, Gankra? I actually saw something similar on bwinton's machine that he said happened after plugging in another monitor.

Flags: needinfo?(a.beingessner)

Just got a usable capture of this happening on Google Calendar, looking at it now.

Flags: needinfo?(a.beingessner)

hmm, not sure this capture is actually useful. At least in the capture I have, the corruption is definitely at the frame level (as opposed to the scene). It looks like the corrupted data is getting checked into the picture cache, so whatever went wrong is "done" and we're just replaying the result over and over.

It's unclear if we just drew it bad and cached it, or if the cache itself got corrupted, or ..?

I also don't have a good reproduction methodology. It was fairly random :/

I can share the capture, but I don't think it has anything useful in there. (also the bugzilla attachment feature seems broken right now)

Perhaps this is just a variant of the new version of Bug 1543356? In that case a rendering hiccup would get clobbered by resizing invalidating the world, but if other things could trigger it, it might stick and look like this bug?

I removed an old HDD last night that had page files enabled, and I've had these corruption events multiple times per hour since, plus multiple crashes in the meantime (the removed drive was a storage drive and I don't have page files enabled on any other drive; AFAIK it shouldn't affect Firefox in any other way). Here are the Report IDs for crash reports from today:

bp-9bd9c36e-68d4-428f-8ce3-fdfee0191124
bp-fea818c6-878e-4b9c-9a2d-024cb0191124
bp-6d98b621-b49a-4328-89d8-b2d5d0191124
bp-a8275560-e3ae-411f-be08-198ed0191124
bp-8fcc5023-8ae8-46b2-9298-9a53c0191124
bp-66e9c8fb-e6fc-4c5d-b860-ba3810191124
bp-638a5f20-820a-4380-b2c5-ebb9e0191124
bp-5b5b038f-7d50-4cd9-9596-94d1f0191124

See Also: → 1600357
Blocks: wr-72
No longer blocks: wr-71

Can we try and get more captures for this?

Flags: needinfo?(a.beingessner)

(In reply to milostodorovic from comment #11)

I removed an old HDD last night that had page files enabled, and I've had these corruption events multiple times per hour since, plus multiple crashes in the meantime (the removed drive was a storage drive and I don't have page files enabled on any other drive; AFAIK it shouldn't affect Firefox in any other way). Here are the Report IDs for crash reports from today:

These all look like out-of-memory crashes. Do you think you could get an about:memory report from a similar situation and attach it to the bug?

I don't think I've seen this since (just got super lucky), but also I still believe captures don't have any useful information for this bug. Whatever went wrong is just having its effects redrawn, with all evidence of what actually happened being lost.

Flags: needinfo?(a.beingessner)

Glenn, NI-ing you here too because this seems similar to 1600357, we will continue trying to find out more info

Flags: needinfo?(gwatson)

It does seem like the failure to lock external images might be relevant here.

Sotaro, can we add any additional logging information when failing to lock an external image? Are there any error codes or similar available that we could report / investigate?

Flags: needinfo?(gwatson)
Flags: needinfo?(sotaro.ikeda.g)

(In reply to Jeff Muizelaar [:jrmuizel] from comment #13)

It seems those few reports were caused by kernel corruption (applications apparently couldn't access more than about 2 GB of RAM after I got rid of the old drive). The OOM issue spread to other applications after a few hours, so I had to reimage my drive; now I'm back to 1-2 corruption events per day. Interestingly, the kernel issue did exacerbate the problem, since it made the corruptions occur every half an hour or so on average. Could we use this to narrow it down in any way?

Should I still grab the memory report after the corruption?

(In reply to Glenn Watson [:gw] from comment #16)

Sotaro, can we add any additional logging information when failing to lock an external image? Are there any error codes or similar available that we could report / investigate?

Current Gecko does not have any other failure log. I looked into the code and found one possibility: with WebRender, MemoryPressure is not handled correctly. When a MemoryPressure event happens, WR releases all resources:
https://searchfox.org/mozilla-central/rev/690e903ef689a4eca335b96bd903580394864a1c/gfx/wr/webrender/src/render_backend.rs#1103

The MemoryPressure event is delivered to WR through the following sequence:

MemoryPressureObserver::Observe()
->gfxPlatform::OnMemoryPressure()
->CompositorManagerChild::SendNotifyMemoryPressure()
->CompositorManagerParent::RecvNotifyMemoryPressure()
->CompositorBridgeParent::NotifyMemoryPressure()
->WebRenderAPI::NotifyMemoryPressure()
->wr_api_notify_memory_pressure()
->RenderApi.notify_memory_pressure()
->RenderBackend::process_api_msg()
->// ApiMsg::MemoryPressure message handling
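
To make the hypothesis concrete, here is a toy model of what "releases all resources" could mean for a later lock attempt. Everything in it (the class, its methods, and the dictionary of images) is hypothetical and only illustrates the idea; it does not reflect WebRender's real data structures, and this bug does not establish that this is the actual cause.

```python
class ToyRenderBackend:
    """Hypothetical stand-in for a render backend; not WebRender's real API."""

    def __init__(self):
        # ext_id -> image data, standing in for backend-owned resources.
        self.external_images = {}

    def add_external_image(self, ext_id, data):
        self.external_images[ext_id] = data

    def process_api_msg(self, msg):
        # Mirrors the claim in the comment: memory pressure releases everything.
        if msg == "MemoryPressure":
            self.external_images.clear()

    def lock_external_image(self, ext_id):
        # Returns None when the image is gone, analogous to a failed lock.
        return self.external_images.get(ext_id)


backend = ToyRenderBackend()
backend.add_external_image(1, b"pixels")
locked_before = backend.lock_external_image(1)
backend.process_api_msg("MemoryPressure")
locked_after = backend.lock_external_image(1)
print(locked_before is not None, locked_after is None)
```

In this model the lock succeeds before the memory-pressure message and fails afterwards, which is the kind of sequence that would produce a "Failed to lock ExternalImage" message if a consumer still referenced the released image.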

Blocks: wr-ui-glitch
No longer blocks: wr-72

Can you confirm which specific version of Win10 you are using?

Flags: needinfo?(milostodorovic)

(In reply to Jessie [:jbonisteel] plz needinfo from comment #19)

Can you confirm which specific version of Win10 you are using?

Win 10 Enterprise LTSC 1809, build 17763.864

Flags: needinfo?(milostodorovic)

There is a new flag available in the next nightly build (and only available in nightly builds) called gfx.webrender.panic-on-gl-error that can be set in about:config. After changing this value, a restart is required before it takes effect.

When this flag is set, any time the GPU driver reports a GL error, we will detect this and panic (controlled crash) the entire GPU process. It shouldn't take the entire browser down, just the GPU process (I believe the GPU process is enabled on Windows and Linux, not sure about Mac).

If you see the bug occur while that is active, and then restart the browser, the logs from the GL error should be visible in about:support.

If you see the glitch occur while that option is active, we can infer a few things:

  • If there is no GPU process crash / output logs, then no GL error is being reported (likely signals a driver bug).
  • If there is a GPU process crash, the logs should give us a clue as to what is occurring (even if nothing is logged, it would still be a clue there is a GL error occurring).
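
The inference in the two bullets above can be written down as a small decision function. This is only a sketch of the reasoning in this comment; the function name and return strings are made up for illustration, and it assumes the glitch was observed while gfx.webrender.panic-on-gl-error was enabled.

```python
def interpret_glitch(gpu_process_crashed: bool, gl_error_logged: bool) -> str:
    """Summarise what a glitch seen with the pref enabled would suggest."""
    if not gpu_process_crashed:
        # No panic means the driver never reported a GL error.
        return "no GL error reported; likely a driver bug"
    if gl_error_logged:
        return "GL error reported; inspect the logs in about:support"
    # A crash without log output still indicates a GL error occurred.
    return "GL error occurred but nothing was logged; still a clue"


print(interpret_glitch(gpu_process_crashed=False, gl_error_logged=False))
```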

NI-ing the reporter so they see comment 21 and can give that a whirl (available to try in Firefox Nightly).

Flags: needinfo?(milostodorovic)

Also, milostodorovic can you try restoring the 3d settings under global settings in the "NVIDIA Control Panel" application?

(In reply to Jeff Muizelaar [:jrmuizel] from comment #23)

Also, milostodorovic can you try restoring the 3d settings under global settings in the "NVIDIA Control Panel" application?

I'm using defaults, I've restored them again for good measure though.

(In reply to Jessie [:jbonisteel] plz needinfo from comment #22)

NI-ing the reporter so they see comment 21 and can give that a whirl (available to try in Firefox Nightly).

Running nightly, I'll update once the issue crops up again.

Flags: needinfo?(milostodorovic)

milostodorovic - just one more request today :)

When you encounter the problem, could you try pressing Ctrl-Shift-3? This should generate a wr-capture folder in your AppData\Local Windows folder (for example C:\Users\you\AppData\Local\wr-capture). Then please zip and share the contents of that. This again will help us figure out what is going on.
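
If zipping by hand is inconvenient, a short script can do it. This is a minimal sketch using only the Python standard library; the helper name zip_capture is made up here, and the capture location is assumed to match the example path given in this comment.

```python
import shutil
from pathlib import Path


def zip_capture(capture_dir: Path, out_base: Path) -> Path:
    """Zip the contents of capture_dir into out_base.zip and return its path."""
    if not capture_dir.is_dir():
        raise FileNotFoundError(f"no capture folder at {capture_dir}")
    # make_archive appends ".zip" to out_base and returns the archive path.
    archive = shutil.make_archive(str(out_base), "zip", root_dir=str(capture_dir))
    return Path(archive)
```

For example, `zip_capture(Path.home() / "AppData" / "Local" / "wr-capture", Path.home() / "wr-capture")` would write wr-capture.zip to the user's home folder, assuming the capture folder exists at that location.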

Thanks for your help and patience as we try to sort through this tricky bug!

Flags: needinfo?(milostodorovic)

Will do!

I haven't seen the issue crop up yet though (I'd expect to have seen it at least once by now). I'll switch back to dev/Aurora over the weekend just to confirm it's still happening after resetting the 3D settings.

No problem! Thank you folks for all the hard work you do!

Flags: needinfo?(milostodorovic)

The issue does not seem to occur on the nightly branch at all; it's still happening on aurora though.

When you see it happen on aurora, have you been able to grab a capture as per the instructions in comment 25 https://bugzilla.mozilla.org/show_bug.cgi?id=1598081#c25 ?

Flags: needinfo?(milostodorovic)

I'm not getting any output on aurora (no directory is created); the command works on nightly though.

Flags: needinfo?(milostodorovic)

I am curious - in aurora, are you able to find a consistent way to reproduce the issue?

Flags: needinfo?(milostodorovic)

As seen in attachment 9110337, the error log was flooded by messages from wr_renderer_lock_external_image(). Bug 1607129 is going to reduce that output.

Hello Milostodorovic, I was just wondering if you have been using the new beta (73) at all and if you have seen the UI glitch there?

We suspect that we have fixed this bug in Firefox 75 (shipping this week). Suspected fix is https://bugzilla.mozilla.org/show_bug.cgi?id=1617083

If this issue starts occurring again, we will reopen this issue.

Status: UNCONFIRMED → RESOLVED
Closed: 4 years ago
Resolution: --- → FIXED
Flags: needinfo?(sotaro.ikeda.g)