Closed Bug 1768397 Opened 3 years ago Closed 2 years ago

GPU Process hang and crash when browsing Imgur.com in Nightly with Radeon

Categories

(Core :: Graphics, defect, P1)

x86_64
Windows 10
defect

Tracking

RESOLVED FIXED
102 Branch
Tracking Status
firefox-esr91 --- unaffected
firefox100 --- unaffected
firefox101 + disabled
firefox102 + fixed

People

(Reporter: lh.bennett, Assigned: sotaro)

References

(Blocks 2 open bugs, Regression)

Details

(Keywords: crash, hang, regression)

Attachments

(1 file)

User Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:102.0) Gecko/20100101 Firefox/102.0

Steps to reproduce:

This is probably filed somewhere, even though I can't find exactly what bug it is.

Over the last couple of weeks, browsing on Imgur.com can cause a GPU hang and crash.

Actual results:

Visit Imgur.com and browse, even click on different image/video pages. Occasionally the GPU Process will hard lock, CPU usage will climb to about 30% and will stay there. Controls will no longer be functional. I can get it to auto recover if I resize the window. Otherwise, it will stay that way for as long as it's active.

Even worse, sometimes it can take the driver instance with it, causing the OS to terminate the graphics driver runtime and recover.

Expected results:

The GPU process shouldn't hard lock and prevent nearly all interaction with the window.

OS: Unspecified → Windows 10
Hardware: Unspecified → x86_64

The Bugbug bot thinks this bug should belong to the 'Core::Graphics' component, and is moving the bug to that component. Please correct in case you think the bot is wrong.

Component: Untriaged → Graphics
Product: Firefox → Core

Are you able to capture a profile using the "Graphics" setting in the profiler? See https://profiler.firefox.com/. Thanks!

Flags: needinfo?(lh.bennett)

Also, are there any crash reports under about:crashes for the GPU process?

Also, just a thought, bug 1763280 landed recently which affected video performance and your driver is affected. If you flip media.wmf.no-copy-nv12-textures-force-enabled to true, does it make a difference?
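For anyone who wants to test the same pref without going through about:config, here is a minimal sketch, assuming the pref is applied through a `user.js` file in the Firefox profile directory. The helper name `user_pref_line` is hypothetical and only formats the line; it does not locate the profile or write the file.

```python
import json


def user_pref_line(name: str, value) -> str:
    """Render one user.js line for a boolean, integer, or string pref."""
    if not isinstance(value, (bool, int, str)):
        raise TypeError("prefs are booleans, integers, or strings")
    # json.dumps renders Python booleans as true/false and strings
    # as double-quoted literals, which matches user.js syntax.
    return f"user_pref({json.dumps(name)}, {json.dumps(value)});"


# Pref name taken from the comment above; paste the printed line into user.js.
line = user_pref_line("media.wmf.no-copy-nv12-textures-force-enabled", True)
print(line)
```

Firefox reads `user.js` at startup, so the browser has to be restarted before retesting.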

Breakpad does not catch the GPU Process hang. I suspect it's because the process recovers instead of fully terminating.

I will try those suggestions and report back.

I got lucky today, not only did Breakpad catch a crash this time, but it took out the driver, forcing a restart which killed the profiler. Tried again and got another profile.

https://share.firefox.dev/39SaSxL

https://crash-stats.mozilla.org/report/index/dea967c2-21b3-4340-b0cc-e349d0220509

Flags: needinfo?(lh.bennett)

Just to note, I cannot reproduce this crash with pref 'media.wmf.no-copy-nv12-textures-force-enabled' set to true.

(In reply to Leman Bennett [Omega] from comment #8)

Just to note, I cannot reproduce this crash with pref 'media.wmf.no-copy-nv12-textures-force-enabled' set to true.

Awesome, thanks.

Severity: -- → S3
Flags: needinfo?(sotaro.ikeda.g)
Priority: -- → P3
Regressed by: 1763280

(In reply to Leman Bennett [Omega] from comment #7)

I got lucky today, not only did Breakpad catch a crash this time, but it took out the driver, forcing a restart which killed the profiler. Tried again and got another profile.

https://share.firefox.dev/39SaSxL

https://crash-stats.mozilla.org/report/index/dea967c2-21b3-4340-b0cc-e349d0220509

Unfortunately that crash is in the content process, and the profiler recording only captured the tail end of the graphics threads. Probably not enough information. Would you mind retrying? Thanks!

Has Regression Range: --- → yes

Set release status flags based on info from the regressing bug 1763280

(In reply to Andrew Osmond [:aosmond] (he/him) from comment #10)

(In reply to Leman Bennett [Omega] from comment #7)

I got lucky today, not only did Breakpad catch a crash this time, but it took out the driver, forcing a restart which killed the profiler. Tried again and got another profile.

https://share.firefox.dev/39SaSxL

https://crash-stats.mozilla.org/report/index/dea967c2-21b3-4340-b0cc-e349d0220509

Unfortunately that crash is in the content process, and the profiler recording only captured the tail end of the graphics threads. Probably not enough information. Would you mind retrying? Thanks!

I see what's going on. Unfortunately, I need some advice to get a capture. The browser has to be responsive in order to use the profiler. When the GPU Process locks to an unresponsive 30% CPU, I cannot interact with the browser at all. Pressing CTRL+SHIFT+2 does nothing until I kill the GPU Process, which erases that part of the capture. I tried using WinDBG, but it did not go well. I also tried waiting it out, but the profiler exits with an OOM error after the GPU Process recovers.

Also, I kept a version of GPU-Z active just to see if the GPU is affected, and it stays completely nominal. UVD clocks will spike on occasion, but that's due to the videos loading. Without any activity, the clocks are flat.

Bug 1763280 does not seem to be related to the regression. Bug 1763280 enables zero-copy video frames only on Intel GPUs. The reporter's GPU is an AMD Radeon.

(In reply to Leman Bennett [Omega] from comment #8)

Just to note, I cannot reproduce this crash with pref 'media.wmf.no-copy-nv12-textures-force-enabled' set to true.

If that addressed the problem, Bug 1766282 and Bug 1767212 might address it as well. The pref is going to be renamed to 'media.wmf.zero-copy-nv12-textures-force-enabled'.

Flags: needinfo?(sotaro.ikeda.g)
Depends on: 1767212
Blocks: video-perf

(In reply to Sotaro Ikeda [:sotaro] from comment #13)

Bug 1763280 does not seem to be related to the regression. Bug 1763280 enables zero-copy video frames only on Intel GPUs. The reporter's GPU is an AMD Radeon.

I wonder if Bug 1758601 could be the actual regressing bug.

The bug has a release status flag that shows some version of Firefox is affected, thus it will be considered confirmed.

Status: UNCONFIRMED → NEW
Ever confirmed: true

Given that Fx101 goes to RC in a week, is there something we should be considering preffing off on Beta before then?

Flags: needinfo?(sotaro.ikeda.g)

Changing the priority to P1 as the bug is tracked by a release manager for the current beta.
See Triage for Bugzilla for more information.
If you disagree, please discuss with a release manager.

Priority: P3 → P1

(In reply to Ryan VanderMeulen [:RyanVM] from comment #17)

Given that Fx101 goes to RC in a week, is there something we should be considering preffing off on Beta before then?

Video overlay is enabled only on nightly on non-Intel GPU on current m-c. It does not go to beta.

Do we need to pref it off in this case as well?

Flags: needinfo?(sotaro.ikeda.g) → needinfo?(ryanvm)

(In reply to Sotaro Ikeda [:sotaro] from comment #19)

Video overlay is enabled only on nightly on non-Intel GPU on current m-c. It does not go to beta.

If it's disabled by default for Beta, we're good for now. Can we please make sure this bug is set blocking whatever bug is tracking letting non-Intel GPUs ride the trains?

Flags: needinfo?(ryanvm) → needinfo?(sotaro.ikeda.g)

(In reply to Ryan VanderMeulen [:RyanVM] from comment #20)

(In reply to Sotaro Ikeda [:sotaro] from comment #19)

Video overlay is enabled only on nightly on non-Intel GPU on current m-c. It does not go to beta.

If it's disabled by default for Beta, we're good for now. Can we please make sure this bug is set blocking whatever bug is tracking letting non-Intel GPUs ride the trains?

Created bug 1769643 for it.

Depends on: 1769643
No longer depends on: 1767212
Flags: needinfo?(sotaro.ikeda.g)

The severity field for this bug is set to S3. However, the bug is marked as tracked for firefox102 (nightly) and tracked for firefox101 (beta).
:bhood, could you consider increasing the severity of this tracked bug?

For more information, please visit auto_nag documentation.

Flags: needinfo?(bhood)
Blocks: 1769643
No longer depends on: 1769643
Keywords: crash, hang
Summary: GPU Process hang and crash when browsing Imgur.com in Nightly → GPU Process hang and crash when browsing Imgur.com in Nightly with Radeon
Flags: needinfo?(bhood)
Depends on: 1767212

:Omega, can you check if the problem is addressed on latest nightly? Bug 1767212 might address the problem.

Flags: needinfo?(lh.bennett)

(In reply to Sotaro Ikeda [:sotaro] from comment #23)

:Omega, can you check if the problem is addressed on latest nightly? Bug 1767212 might address the problem.

It's been a driver upgrade and a day trying to recreate the hang/crash. But so far, I have not been able to reproduce the bug.

Flags: needinfo?(lh.bennett)

Set release status flags based on info from the regressing bug 1763280

Leman, does this behavior persist, or would you consider it resolved?

Flags: needinfo?(lh.bennett)

(In reply to Bob Hood from comment #26)

Leman, does this behavior persist, or would you consider it resolved?

I would consider it resolved. I haven't seen the issue come back.

Flags: needinfo?(lh.bennett)
Status: NEW → RESOLVED
Closed: 2 years ago
Resolution: --- → FIXED
Assignee: nobody → sotaro.ikeda.g
Target Milestone: --- → 102 Branch