Closed Bug 1792115 (opened 2 years ago, closed 2 years ago)

Clicking Trigger Device Reset in about:support causes Firefox to lock up (Windows)

Categories

(Core :: Graphics, defect)

Firefox 106
All
Windows
defect

Tracking

VERIFIED FIXED
107 Branch
Tracking Status
relnote-firefox --- 106+
firefox-esr102 --- unaffected
firefox105 --- unaffected
firefox106 + verified
firefox107 + verified

People

(Reporter: ahale, Assigned: aosmond)

References

(Blocks 1 open bug, Regression)

Details

(Keywords: regression)

Attachments

(1 file)

Repro steps (Windows only):

  1. Go to about:support
  2. Click Trigger Device Reset
  3. Firefox gets stuck on a mutex in DeviceManagerDx::CreateCompositorDevices.
Summary: Clicking Trigger Device Reset in about:support causes Firefox to lock up → Clicking Trigger Device Reset in about:support causes Firefox to lock up (Windows)
Regressed by: 1789309
Flags: needinfo?(aosmond)

Set release status flags based on info from the regressing bug 1789309

Verified with mozregression...

5:58.56 INFO: Narrowed integration regression window from [99d74587, dada2510] (3 builds) to [b702eea8, dada2510] (2 builds) (~1 steps left)
5:58.56 INFO: No more integration revisions, bisection finished.
5:58.56 INFO: Last good revision: b702eea846c970a343da94b5461596e22e75426e
5:58.56 INFO: First bad revision: dada2510963e85ddb4e02d94257f5f6c4e6b577e
5:58.56 INFO: Pushlog:
https://hg.mozilla.org/integration/autoland/pushloghtml?fromchange=b702eea846c970a343da94b5461596e22e75426e&tochange=dada2510963e85ddb4e02d94257f5f6c4e6b577e
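The log above shows mozregression narrowing a window between a known-good and a known-bad build until only two adjacent builds remain. A minimal C++ sketch of that kind of bisection follows. It assumes builds are indexed 0..count-1, build 0 is good, the last build is bad, and there is exactly one good-to-bad transition. The function name BisectRegression and the isBad predicate are hypothetical; this is not mozregression's actual code, which also has to download and run each build.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <utility>

// Returns {last good index, first bad index}.
// Hypothetical preconditions: count >= 2, build 0 is good, build count - 1
// is bad, and builds switch from good to bad exactly once.
std::pair<std::size_t, std::size_t> BisectRegression(
    std::size_t count, const std::function<bool(std::size_t)>& isBad) {
  assert(count >= 2);
  std::size_t good = 0;
  std::size_t bad = count - 1;
  // Invariant: `good` indexes a good build and `bad` indexes a bad build.
  while (bad - good > 1) {
    std::size_t mid = good + (bad - good) / 2;  // test the middle build
    if (isBad(mid)) {
      bad = mid;   // the regression landed at or before mid
    } else {
      good = mid;  // the regression landed after mid
    }
  }
  // The two indices are now adjacent; the regressing push lies between them.
  return {good, bad};
}
```

For example, with three builds where only the last two are bad, BisectRegression(3, ...) tests the middle build once and returns {0, 1}, which mirrors the final "3 builds to 2 builds" step in the log.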

Tried reproducing the device loss using Ctrl+Shift+Win+B (the GPU reset shortcut on Windows) and got the following results:

  • Dell XPS 15 9510 laptop (Intel+NVIDIA): System hardlocked
  • Custom-built gaming desktop (Ryzen 7 5800X, Radeon RX 6900 XT): Windows froze for a second and then all open windows flickered once; Firefox Nightly recovered just fine.

So this may be limited to a mutex lock issue when clicking Trigger Device Reset in about:support.

Severity: -- → S2

Bob, this is an S2 regression in 106. Is it going to impact our user base on release? If so, can we have this bug investigated and assigned quickly? Thanks

Flags: needinfo?(bhood)

Andrew, you are the author of the regressing change (see comment 2). Can we get this corrected before Fx106 goes out the door on 18 Oct?

Flags: needinfo?(bhood)

Tracking as this is set to S2 and we lack context for the fix with respect to timing within our release schedule.

The bug is marked as tracked for firefox106 (beta). We have limited time to fix this; the soft freeze is in 9 days. However, the bug still isn't assigned.

:bhood, could you please find an assignee for this tracked bug? Given that it is a regression and we know the cause, we could also simply backout the regressor. If you disagree with the tracking decision, please talk with the release managers.

For more information, please visit auto_nag documentation.

Flags: needinfo?(bhood)
Assignee: nobody → aosmond
Flags: needinfo?(bhood)

A user reported a much more common trigger mechanism for this app freeze in https://bugzilla.mozilla.org/show_bug.cgi?id=1793964 and I'm concerned we may want to try to fix this for 106.

It is a shame we don't have TSan on Windows. Either we are acquiring two different locks in opposite orders on different code paths, or we are trying to relock the same lock twice. Either way, I guess it will take a code audit to figure out.
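For the first hypothesis (lock-order inversion), here is a minimal, self-contained C++17 sketch. The globals gDeviceLock, gCompositorLock, and gSharedCounter and the functions WorkA, WorkB, and RunBoth are hypothetical and unrelated to Gecko's code. If two threads took the same two mutexes in opposite orders with separate lock calls, each could end up holding one mutex while waiting forever for the other; std::scoped_lock avoids this by acquiring all the mutexes it is given with a deadlock-avoidance algorithm.

```cpp
#include <cassert>
#include <mutex>
#include <thread>

// Hypothetical locks guarding related state (not Gecko code).
std::mutex gDeviceLock;
std::mutex gCompositorLock;
int gSharedCounter = 0;  // only modified while both locks are held

// Hazard being illustrated: if WorkA locked gDeviceLock then gCompositorLock
// with two separate lock calls, and WorkB locked them in the opposite order,
// each thread could hold one mutex while waiting forever for the other.
// std::scoped_lock acquires all of its mutexes as if by std::lock, which
// avoids deadlock, so the argument order does not matter here.
void WorkA() {
  for (int i = 0; i < 10000; ++i) {
    std::scoped_lock lock(gDeviceLock, gCompositorLock);
    ++gSharedCounter;
  }
}

void WorkB() {
  for (int i = 0; i < 10000; ++i) {
    std::scoped_lock lock(gCompositorLock, gDeviceLock);  // opposite order
    ++gSharedCounter;
  }
}

// Runs both workers concurrently and returns the final counter value.
int RunBoth() {
  std::thread a(WorkA);
  std::thread b(WorkB);
  a.join();
  b.join();
  assert(gSharedCounter == 20000);  // no lost updates and no deadlock
  return gSharedCounter;
}
```

scoped_lock only helps when both locks are taken at the same point. When they are acquired in different functions, the usual remedy is a documented, consistently followed lock order, which is exactly what a code audit would have to check.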

Pushed by aosmond@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/7476a9c03c57 Avoid a double lock in DeviceManagerDx::MaybeResetAndReacquireDevices. r=gfx-reviewers,bradwerth
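The commit title describes the fix as avoiding a double lock in MaybeResetAndReacquireDevices. The sketch below illustrates that class of bug in general terms; it is not the actual patch. The hypothetical class borrows the method names CreateCompositorDevices and MaybeResetAndReacquireDevices, but DeviceManagerSketch, CreateCompositorDevicesLocked, SimulateDeviceLoss, and DeviceGeneration are invented for illustration. Locking a non-recursive std::mutex a second time on the same thread is undefined behavior and in practice usually hangs. One common way out is to split the work into a variant that expects the caller to already hold the lock.

```cpp
#include <cassert>
#include <mutex>

// Hypothetical, heavily simplified stand-in for a device manager.
// Not Gecko's DeviceManagerDx; names other than the two public methods
// are invented for this sketch.
class DeviceManagerSketch {
 public:
  // Public entry point: takes the lock itself.
  void CreateCompositorDevices() {
    std::lock_guard<std::mutex> lock(mLock);
    CreateCompositorDevicesLocked(lock);
  }

  // Reset path. A buggy version would hold mLock here and then call the
  // public CreateCompositorDevices(), locking mLock twice on one thread.
  // This version calls the *Locked variant instead.
  bool MaybeResetAndReacquireDevices() {
    std::lock_guard<std::mutex> lock(mLock);
    if (!mDeviceLost) {
      return false;  // nothing to reset
    }
    mDeviceLost = false;
    CreateCompositorDevicesLocked(lock);
    return true;
  }

  // Test hook that marks the (pretend) device as lost.
  void SimulateDeviceLoss() {
    std::lock_guard<std::mutex> lock(mLock);
    mDeviceLost = true;
  }

  // How many times devices have been (re)created.
  int DeviceGeneration() {
    std::lock_guard<std::mutex> lock(mLock);
    return mGeneration;
  }

 private:
  // The lock_guard reference parameter documents that mLock must already
  // be held by the caller.
  void CreateCompositorDevicesLocked(const std::lock_guard<std::mutex>&) {
    ++mGeneration;  // stands in for recreating the D3D devices
  }

  std::mutex mLock;
  bool mDeviceLost = false;
  int mGeneration = 0;
};
```

Passing the guard by reference is only a convention that makes the locking requirement visible at each call site; it does not prove at compile time that the right mutex is the one held.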

At this point in the cycle, it's unlikely we can get this fixed in 106, which ships in a few days: we have already built and QA'd our release candidate and have no driver for an RC2. However, I can include it in our planned 106 dot release that ships November 1.

Status: NEW → RESOLVED
Closed: 2 years ago
Resolution: --- → FIXED
Target Milestone: --- → 107 Branch
Flags: qe-verify+

I managed to reproduce this issue on a 2022-09-22 Nightly build on Windows 10 using the STR from the Description. Verified as fixed on Nightly 108.0a1 (build ID: 20221020215126) on Windows 10.

Status: RESOLVED → VERIFIED
Flags: qe-verify+
See Also: → 1798099

Comment on attachment 9298263
Bug 1792115 - Avoid a double lock in DeviceManagerDx::MaybeResetAndReacquireDevices.

Beta/Release Uplift Approval Request

  • User impact if declined: May crash or freeze the parent process if the GPU process crashes or hits a device reset
  • Is this code covered by automated tests?: Yes
  • Has the fix been verified in Nightly?: Yes
  • Needs manual test from QE?: No
  • If yes, steps to reproduce:
  • List of other uplifts needed: None
  • Risk to taking this patch: Low
  • Why is the change risky/not risky? (and alternatives if risky): The change is trivial/surgical, has been tested in Nightly/Beta for a while now, and will, with as much certainty as I am capable of, make things unarguably better.
  • String changes made/needed:
  • Is Android affected?: Unknown
Attachment #9298263 - Flags: approval-mozilla-release?

Given accumulating evidence from bug 1798099, I would also state that this is sufficient for a dot release driver.

(In reply to Andrew Osmond [:aosmond] (he/him) from comment #18)

Given accumulating evidence from bug 1798099, I would also state that this is sufficient for a dot release driver.

My justification for a dot release would be based on our telemetry:
https://firefoxgraphics.github.io/telemetry/#view=tdrs

While only 0.02% of user sessions experience device resets, those who do experience them hit them a lot (17.4 per session). Those users are much more likely to hit this and experience serious disruptions.
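As a rough illustration of how concentrated these resets are, the arithmetic below scales the two telemetry figures quoted above to one million sessions. It assumes 17.4 is the mean number of resets among sessions that see at least one reset; the dashboard numbers may have changed since this comment.

```latex
\begin{aligned}
10^{6} \times 0.02\% &= 10^{6} \times 0.0002 = 200 \text{ sessions with device resets} \\
200 \times 17.4 &= 3480 \text{ device resets}
\end{aligned}
```

Under that assumption, essentially all of the exposure falls on roughly 200 of every million sessions, each of which goes through the reset path many times.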

Comment on attachment 9298263
Bug 1792115 - Avoid a double lock in DeviceManagerDx::MaybeResetAndReacquireDevices.

Approved for 106.0.4.

Attachment #9298263 - Flags: approval-mozilla-release? → approval-mozilla-release+

Hello,
Would there be a way to verify this fix on release on Windows 10? There's no "Trigger Device Reset" button in about:support on release. Is there a pref we can change to make the button appear? I tried the steps from Comment 3, but the issue doesn't reproduce for me.
Thank you.

Flags: needinfo?(aosmond)
Flags: needinfo?(oana.ardelean)

Andrew, could QA trigger this using the device reset button in about:support as well?

Flags: needinfo?(aosmond)

They could if it was available, but it is only available on nightly and dev edition builds:
https://searchfox.org/mozilla-central/rev/49011d374b626d5f0e7dc751a8a57365878e65f1/toolkit/content/aboutSupport.js#582

Flags: needinfo?(aosmond)

As an alternative, I believe you can go to about:support, open the Web Developer Tools, go to the Console, type windowUtils.triggerDeviceReset(), and press Enter to trigger it manually. The code is still present in release builds; only the button is missing.

It worked with the STR in Comment 26. I could reproduce the hang on 106.0.3; it no longer reproduces on 106.0.4, which seems to be working just fine. Thank you so much for the information, Andrew, I really appreciate your help! Confirmed as fixed on Firefox 106.0.4 (build ID: 20221102214123) on Windows 10.

Flags: needinfo?(oana.ardelean)
Blocks: 1798099
Blocks: 1799663