Closed Bug 1879816 Opened 1 year ago Closed 1 year ago

[macOS] Firefox UI becomes unresponsive (freezes) after using the Refresh Firefox option

Categories

(Core :: Widget: Cocoa, defect)

Desktop
macOS
defect

Tracking

()

VERIFIED FIXED
125 Branch
Tracking Status
firefox-esr115 --- unaffected
firefox122 --- unaffected
firefox123 --- unaffected
firefox124 + verified
firefox125 --- verified

People

(Reporter: atrif, Assigned: bradwerth)

References

(Regression)

Details

(Keywords: regression)

Attachments

(1 file, 1 obsolete file)

Found in

  • 124.0a1 (2024-02-11)

Affected versions

  • 124.0a1 (2024-02-11)

Tested platforms

  • Affected platforms: macOS 12 (Intel), macOS 13.6.4 (Intel), macOS 14.2.1 (M1)
  • Unaffected platforms: Ubuntu 22, Windows 11x64

Preconditions

  • new profile

Steps to reproduce

  1. Open about:support and click on Refresh Firefox.
  2. Finish the process and try to use Firefox Nightly.

Expected result

  • Firefox still functions as expected.

Actual result

  • Firefox can no longer be used until is closed and reopened.

Regression range

  • I tried searching for a manual regression range but the issue is too intermittent sometimes and the results are not accurate. I got different results in different tries. Firefox 124.0a1 (20240130045011) is a good build.

Additional notes

  • Attached a screen recording: link
  • Bug 1811488 can be hit as well. Bug 1811488 can be easily fixed by focusing on desktop and then refocusing Firefox but this is not the case in this issue. This particular issue can only be fixed by closing and reopening Firefox.
  • The issue might be intermittent. Please repeat the steps on new profile or new nightly installation if needed.
  • Setting as S2 for now because I don’t know if this issue can be seen using other methods and I cannot reproduce this with 123.0b9 and 122.0.1. If more information is needed please let me know.

:atrif, could you try to find a regression range using for example mozregression?

See Also: → 1877726
QA Whiteboard: [qa-regression-triage]
See Also: 18114881812622

I can recreate this in the current 124 beta and 125 nightly but not on 123 release.
Here is a changelog for Firefox 124.0a1 (20240130214506), the first build after Firefox 124.0a1 (20240130045011) in which the issue is not reproducible. However, I do not see a possible culprit there

Unfortunately after several attempts with different builds, the issue stopped being reproducible on nightly for me.

:sphol based on your "see also" do you think this was an existing issue?

Flags: needinfo?(spohl.mozilla.bugs)

(In reply to Dianna Smith [:diannaS] from comment #2)

:sphol based on your "see also" do you think this was an existing issue?

I'm not sure. However, it seems similar enough to be worth being called out in the "see also" section.

Flags: needinfo?(spohl.mozilla.bugs)

:eeejay going out on a limb here since I do not have an accurate regression range, but do you think this is something that could be caused by recent mac changes? (for ex. can we rule out bug 1876059)?

Flags: needinfo?(eitan)

Alexandru, thanks for raising this bug. It looks like this is still stuck in identifying the actual regressing bug. Can you try again and focus on the likely regressing candidates above?

It appears we're all a bit stuck here.

Flags: needinfo?(atrif)

(In reply to Frederik Braun [:freddy] from comment #5)

Alexandru, thanks for raising this bug. It looks like this is still stuck in identifying the actual regressing bug. Can you try again and focus on the likely regressing candidates above?

It appears we're all a bit stuck here.

Hello! I tried again and it seems that I found some new steps for this that work on mozregression as well:

  1. Open about:support, click Troubleshoot Mode > Restart > Open
  2. Try to open a new tab.
  3. Focus on the desktop and try to open a new tab (In this step I am making sure we don't hit Bug 1811488)
    AR: Firefox does not respond to clicks

Pushlog: https://hg.mozilla.org/integration/autoland/pushloghtml?fromchange=945fe9361ecaa5839d5753fa0c23b1979c975221&tochange=9e9da481d3d884b86803b271cb3fde3e7bbbed2f
Possible regressor: bug 1865372

I did the regression range twice on macOS 13.6.3 ARM and got the same result. I hope this helps. I don't know if this is the regressor, please let me know if another regression range is needed.

Flags: needinfo?(atrif)

That new regressor looks very plausible. Brad or Markus, can you take a look? This just moved to beta with 124.

Flags: needinfo?(mstange.moz)
Flags: needinfo?(eitan)
Flags: needinfo?(bwerth)
Regressed by: 1865372

The bug is marked as tracked for firefox124 (beta). However, the bug still isn't assigned.

:gcp, could you please find an assignee for this tracked bug? Given that it is a regression and we know the cause, we could also simply backout the regressor. If you disagree with the tracking decision, please talk with the release managers.

For more information, please visit BugBot documentation.

Flags: needinfo?(gpascutto)
Flags: needinfo?(gpascutto)

Yes, the regressor is Bug 1865372. I can reproduce with a slightly simpler version of the Steps to Reproduce from comment 6:

  1. about:support -> Troubleshoot Mode -> Restart -> Open
  2. Try to click the new tab "+" button in the window titlebar

The problem is evidently that the newly-created window can't be made frontmost, or focusable, or something like that. I'll figure it out and the fix should be an easy uplift.

Assignee: nobody → bwerth
Flags: needinfo?(mstange.moz)
Flags: needinfo?(bwerth)

Couldn't find this back just now so updating the title a bit to make it easier to find. :-)

Summary: [macOS] Firefox UI becomes unresponsive after using the Refresh Firefox option → [macOS] Firefox UI becomes unresponsive (freezes) after using the Refresh Firefox option

This is weird. The effect of the patch in Bug 1865372 is to delete the native window "Open Nightly in Troubleshoot Mode?" before the new top-level window is created, rather than after. It's not clear to me how this makes a difference in the way the new top-level window behaves. I can't find any difference in the window style mask or collection behavior at creation time. And those styles don't seem to be modified by anything else post-creation, either.

So, possible fixes:

  1. Keep looking. Keep trying to figure out how the style and existence of the "Open Nightly in Troubleshoot Mode?" native window affects the creation of new native windows. It's a bad sign that there is an unexpected side effect.
  2. Back out Bug 1865372 and the follow-up Bug 1877749.
  3. Change the nsCocoaWindow::Destroy() behavior to instead just hide the native window -- don't destroy it -- and wait until GC to destroy the native window. This is a kind of a reversion of Bug 1865372, without actually backing out the whole code change, and ensuring that the issue addressed by that Bug remains closed.

Fix 2 is a reasonable choice, but reopens the issue in Bug 1865372. Fix 3 would work, but it doesn't feel great to have this unknown side effect and to rely on GC timing to ensure correct behavior. But of course that was the state of things before Bug 1865372 landed, so pursuing Fix 2 and backing out Bug 1865372 doesn't feel any better.

I'll build a patch that does Fix 3.

This allows mWindow to stay alive as long as there are still references
to the nsCocoaWindow that owns it. This was the original intent before
the landing of Bug 1865372, which instead destroyed mWindow as soon as
possible. The important behavior in Destroy() is that the native window
becomes not visible. There are at least two ways to hide the window:
orderOut, which merely hides the window, and close (used here) which has
some additional semantic meaning and posts NSWindowWillCloseNotification.
That notification is detected by our window state save/restore code, so
closing the window is a trigger to update the window state. It's
appropriate to update this state during Destroy().

The alternative of hiding the window with orderOut also seems to confuse
the window manager when closing a fullscreen window (which was the
issue addressed by Bug 1865372). So we use close. In order to make this
work, we have to ensure that all new native windows have the property
releaseWhenClosed set to NO, and that we explicitly release mWindow in
DestroyNativeWindow().

Pushed by bwerth@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/0fcec29405ad Make nsCocoaWindow::Destroy close the window and let destruction happen later. r=mstange
Attachment #9381884 - Attachment is obsolete: true

This caused multiple failures(the rest of them appeared later). To see a full list of this failures please take a look at this push - the link has a filter that shows all the failures fix by the backout from Comment 14.
All failures happened only on OS X 10.15

It is important that calls to this function have no early exits and
process all the side effects. The side effects of this function affect
the behavior of other/future windows. So whether or not this particular
native window has already been destroyed can't affect whether or not the
app is in a modal state.

See Also: → 1881597

Brad and I figured out what the actual bug was.

The window was unresponsive because this check in sendEvent: was dropping all events on the floor, because we were thinking that there was a modal window shown. But the modal window wasn't shown anymore; our state was confused. Our state got confused because after entering the modal state with SetModal(true), the follow-up call to SetModal(false) wasn't exiting the modal state. And that was because the SetModal(false) was now running after mWindow had been nulled out, and was hitting an early return.

We need to ensure that balanced calls to SetModal result in balanced changes to the modal state.

Attachment #9382127 - Attachment description: Bug 1879816: Ensure that nsCocoaWindow::SetModal calls always succeed. → Bug 1879816: Ensure that balanced calls to nsCocoaWindow::SetModal result in balanced modal state.
Pushed by bwerth@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/7d10849b0c31 Ensure that balanced calls to nsCocoaWindow::SetModal result in balanced modal state. r=mstange
Status: NEW → RESOLVED
Closed: 1 year ago
Resolution: --- → FIXED
Target Milestone: --- → 125 Branch

Comment on attachment 9382127 [details]
Bug 1879816: Ensure that balanced calls to nsCocoaWindow::SetModal result in balanced modal state.

Beta/Release Uplift Approval Request

  • User impact if declined: Firefox Refresh will generate an unclickable/unfocusable window.
  • Is this code covered by automated tests?: No
  • Has the fix been verified in Nightly?: No
  • Needs manual test from QE?: Yes
  • If yes, steps to reproduce: As Steps to Reproduce in Bug comment 0. Using macOS, about:support > Troubleshoot Mode -> Restart -> Open, then try to click on the toolbar to open a new tab.
  • List of other uplifts needed: None
  • Risk to taking this patch: Low
  • Why is the change risky/not risky? (and alternatives if risky): Minor code fix to a longstanding timing vulnerability which was not detected until triggered by other recent changes. Objectively more correct behavior. macOS only.
  • String changes made/needed:
  • Is Android affected?: No
Flags: needinfo?(bwerth)
Attachment #9382127 - Flags: approval-mozilla-beta?
Flags: qe-verify+
QA Whiteboard: [qa-regression-triage] → [qa-regression-triage] [qa-triaged]

Verified fixed with Firefox 125.0a1 (20240223095502) on macOS 12, 13 ARM, and 14 ARM. Firefox can be used as expected after following the steps from comment 20 and comment 0. Note that bug 1812622 is still present intermittently but can be resolved by focusing the desktop and focusing Firefox again.

Comment on attachment 9382127 [details]
Bug 1879816: Ensure that balanced calls to nsCocoaWindow::SetModal result in balanced modal state.

Approved for 124.0b4

Attachment #9382127 - Flags: approval-mozilla-beta? → approval-mozilla-beta+

Verified fixed with 124.0b4 (20240223213912) treeherder build from comment 23 on macOS 12 Intel, 13 ARM and 14 ARM. Firefox can be used as expected after following the steps from comment 20 and comment 0. Note that bug 1812622 is still present intermittently but can be resolved by focusing the desktop and focusing Firefox again.

Status: RESOLVED → VERIFIED
QA Whiteboard: [qa-regression-triage] [qa-triaged]
Flags: qe-verify+
See Also: → 1885010
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: