Closed Bug 1707598 Opened 3 years ago Closed 3 years ago

Intermittent shutdown timeouts in CI with native menus

Tracking

()

Status:

RESOLVED FIXED

Milestone:

90 Branch

Tracking Flags:

Tracking

Status

firefox89

---

fixed

firefox90

---

fixed

People

(Reporter: mstange, Assigned: mstange)

References

Details

(Whiteboard: [proton-context-menus][mac:mr1] [proton-uplift])

Attachments

(7 files)

Bug 1707598 - When opening a new menu just after closing an old menu, make sure we exit the nested event loop for the old NSMenu before we ask the new NSMenu to open. r=harry 3 years ago Markus Stange [:mstange] 48 bytes, text/x-phabricator-request		Details \| Review
Bug 1707598 - Release MOZMenuOpeningCoordinator on shutdown. r=harry 3 years ago Markus Stange [:mstange] 48 bytes, text/x-phabricator-request		Details \| Review
Bug 1707598 - Make DoCommandRunnable a regular runnable, not a cancelable runnable. r=harry 3 years ago Markus Stange [:mstange] 48 bytes, text/x-phabricator-request		Details \| Review
Bug 1707598 - Add -[MOZMenuOpeningCoordinator runAfterMenuClosed:]. r=harry 3 years ago Markus Stange [:mstange] 48 bytes, text/x-phabricator-request		Details \| Review
Bug 1707598 - Make sure that command events from activateItem only run once the nested event loop for the NSMenu has been exited. r=harry 3 years ago Markus Stange [:mstange] 48 bytes, text/x-phabricator-request		Details \| Review
Bug 1707598 - Rename ActivateItemAndClose and call cancelTrackingWithoutAnimation in NativeMenuMac::ActivateItem. r=harry 3 years ago Markus Stange [:mstange] 48 bytes, text/x-phabricator-request		Details \| Review
Bug 1707598 - Let MenuClosed know when it's called after ActivateItem. r=harry 3 years ago Markus Stange [:mstange] 48 bytes, text/x-phabricator-request		Details \| Review

Markus Stange [:mstange]

Assignee

Description

•

3 years ago

Now that native context menus are enabled on macOS in Nightly (bug 1700679), we are seeing an increase in jobs that are terminated by [@ mozilla::(anonymous namespace)::RunWatchdog(void*)] after a shutdown timeout, such as this one.
These occurrences have NativeMenuMac::OpenMenu on the stack. This indicates that the nested event loop for a native menu is never exited.
(Bug 1358898 is the general bug about shutdown timeouts in CI.)

I debugged this a little by pushing logging instrumentation to try, and it seems that this happens if we open another native menu before the previous native menu was able to exit its nested event loop.

Jens Stutte [:jstutte]

Comment 1

•

3 years ago

Could this be related to bug 1705365 ?

Markus Stange [:mstange]

Assignee

Comment 2

•

3 years ago

(In reply to Jens Stutte [:jstutte] from comment #1)

Could this be related to bug 1705365 ?

I don't think so - that bug was filed before native context menus were enabled, and it doesn't have NativeMenuMac::OpenMenu on the stack.

Julien Chaupitre

Updated

•

3 years ago

See Also: → https://jira.mozilla.com/browse/FIDEFE-1425

Markus Stange [:mstange]

Assignee

Comment 3

•

3 years ago

Attached file Bug 1707598 - When opening a new menu just after closing an old menu, make sure we exit the nested event loop for the old NSMenu before we ask the new NSMenu to open. r=harry — Details

On CI, where we open and close menu items in quick succession, we sometimes got
into a state where the new menu was opened while we were still in the old menu's
nested event loop. So we had the following sequence of events:

 - old menu +[NSMenu popUpContextMenu:withEvent:forView:]
   - nested event loop for old menu
     - old menu -[NSMenu cancelTrackingWithoutAnimation]
     - new menu +[NSMenu popUpContextMenu:withEvent:forView:]
       - nested event loop for new menu
         - new menu -[NSMenu cancelTrackingWithoutAnimation]
     - new menu's event loop is exited, but old menu's event loop remains on the
       stack
     - shutdown hang here

MOZMenuOpeningCoordinator makes sure that +[NSMenu popUpContextMenu:withEvent:forView:]
is always called in sequence, never in a nested fashion.

Pulsebot

Comment 4

•

3 years ago

Pushed by mstange@themasta.com:
https://hg.mozilla.org/integration/autoland/rev/9577e2feecd1
When opening a new menu just after closing an old menu, make sure we exit the nested event loop for the old NSMenu before we ask the new NSMenu to open. r=harry

Cosmin Sabou [:CosminS]

Comment 5

•

3 years ago

Backed out for causing bc failures on browser_library_bookmark_pages.js.

Push with failures: https://treeherder.mozilla.org/jobs?repo=autoland&resultStatus=testfailed%2Cbusted%2Cexception%2Cretry%2Cusercancel&revision=9577e2feecd1fe561f07cc847f14ec83c7c81435&selectedTaskRun=EEzcXTwBTjK9P1SCvYj2eg.0

Failure log: https://treeherder.mozilla.org/logviewer?job_id=337785220&repo=autoland

Backout link: https://hg.mozilla.org/integration/autoland/rev/b1bc688b1805f1755c61532c64d7ec3f7d9fec9e

Flags: needinfo?(mstange.moz)

Romain Testard [:RT]

Updated

•

3 years ago

Priority: -- → P1

Markus Stange [:mstange]

Assignee

Comment 6

•

3 years ago

I cannot reproduce the browser_library_bookmark_pages.js timeout locally. I even tried running the entire directory.
Try push with debugging instrumentation: https://treeherder.mozilla.org/jobs?repo=try&revision=975e0a40dfd4576d129d5abe53539c052e169663

Markus Stange [:mstange]

Assignee

Comment 7

•

3 years ago

•

Edited

Something very strange is happening in that test. Profile: https://share.firefox.dev/3sZhuxZ

The first menu opens. While it is open, i.e. while we're in the tracking event loop for the first menu, we call activateItem. activateItem posts a runnable for the menuitem command, and then calls -[NSMenu cancelTrackingWithoutAnimation]. Then the runnable runs before we we had a chance to exit the nested event loop for the menu. (This is the part that happens on try but not on my machine - on my machine, we exit the nested event loop before we run the runnable.) During the command event, we open a sheet window and spawn another nested event loop, under AppWindow::ShowModal. So now we have two nested event loops on the stack: the event loop for the first menu (which already has had cancelTrackingWithoutAnimation called on it) and inside of that the modal event loop for the sheet window. Then, at some point, we close the sheet window and exit the modal event loop. At this point, you'd expect that we'd leave the nested event loop for the first menu, due to the cancelTrackingWithoutAnimation call earlier! But we don't. The menu's event loop stays on the stack. I'm not sure why. We now try to open the second menu. However, MOZMenuOpeningCoordinator delays this opening because the first menu's event loop is still on the stack. And then we go idle, stuck in the nested event loop of the first menu, and the second menu never opens.
It seems that our request to close the first menu somehow got lost. Maybe cancelTrackingWithoutAnimation doesn't expect us to launch another nested event loop before returning back into the menu's event loop?

I'll try to make it so that the command event from activateItem only runs after we exit the menu's event loop. That should avoid this bug.

Flags: needinfo?(mstange.moz)

Markus Stange [:mstange]

•

3 years ago

https://treeherder.mozilla.org/jobs?repo=try&tier=1&revision=e7b9aae55b9d5d868249e44e75048cf318203e22

Markus Stange [:mstange]

Assignee

Comment 14

•

3 years ago

•

Edited

It looks like these patches now cause another test failure, in 4 out of 5 runs, in bc7 browser/components/extensions/test/browser/browser_ext_browserAction_contextMenu.js. I'll debug that one now. It doesn't reproduce locally for me.

Markus Stange [:mstange]

Assignee

Comment 15

•

3 years ago

Attached file Bug 1707598 - Rename ActivateItemAndClose and call cancelTrackingWithoutAnimation in NativeMenuMac::ActivateItem. r=harry — Details

This also makes sure we call ActivateItemAfterMenuClosing on the submenu that
immediately contains the activated menu item, and not on the root menu.
This usually doesn't make a difference, except in the case where something
calls FlushMenuClosedRunnable afterwards; in that case, before this patch, we
might accidentally fired the popuphidden event for the submenu before the
command event for the clicked item.

Depends on D113733

Markus Stange [:mstange]

Assignee

Comment 16

•

3 years ago

Attached file Bug 1707598 - Let MenuClosed know when it's called after ActivateItem. r=harry — Details

In the next patch, it will make different decisions for how to dispatch the
MenuClosedAsync event based on this flag.

Depends on D113987

Markus Stange [:mstange]

Assignee

Comment 17

•

3 years ago

•

Edited

(In reply to Markus Stange [:mstange] from comment #14)

It looks like these patches now cause another test failure, in 4 out of 5 runs, in bc7 browser/components/extensions/test/browser/browser_ext_browserAction_contextMenu.js. I'll debug that one now. It doesn't reproduce locally for me.

This was happening because of changed microtask ordering; the test was expecting telemetry events after a popuphidden event, and one of those telemetry events was inserted from a microtask after the command event. I've addressed this by making -[MOZMenuOpeningCoordinator _runMenu] dispatch the "after menu closed runnables" to the regular event loop instead of calling ->Run() on them directly; this way, the Gecko event loop will make sure there are appropriate microtask checkpoints between them.

There was another test failure, in devtools/client/framework/test/browser_menu_api.js - this one was caused by a missing / delayed popuphidden event for a submenu which was closed with openMenu(false); the popuphidden event was delayed until after the entire menu closed and exited its event loop, but obviously that's too late if we're only closing a submenu and leaving the rest of the menu open. I've addressed this bug with the newly-added patches which let MenuClosed() know whether the entire menu is closing, so that it can pick the appropriate way to dispatch the MenuClosedAsync runnable.

Markus Stange [:mstange]

Assignee

Comment 18

•

3 years ago

https://treeherder.mozilla.org/jobs?repo=try&tier=1&revision=1e23f8a630b06ae8e12cb1eb904498482a3b0e45

Pulsebot

Comment 19

•

3 years ago

Pushed by mstange@themasta.com:
https://hg.mozilla.org/integration/autoland/rev/8f867acbaa94
When opening a new menu just after closing an old menu, make sure we exit the nested event loop for the old NSMenu before we ask the new NSMenu to open. r=harry
https://hg.mozilla.org/integration/autoland/rev/50adb1fd03f3
Release MOZMenuOpeningCoordinator on shutdown. r=harry
https://hg.mozilla.org/integration/autoland/rev/be5032506244
Make DoCommandRunnable a regular runnable, not a cancelable runnable. r=harry
https://hg.mozilla.org/integration/autoland/rev/d0c2e0e4b6a5
Add -[MOZMenuOpeningCoordinator runAfterMenuClosed:]. r=harry
https://hg.mozilla.org/integration/autoland/rev/5df0f67ca1c6
Rename ActivateItemAndClose and call cancelTrackingWithoutAnimation in NativeMenuMac::ActivateItem. r=harry
https://hg.mozilla.org/integration/autoland/rev/058fd48682c5
Let MenuClosed know when it's called after ActivateItem. r=harry
https://hg.mozilla.org/integration/autoland/rev/2cd944025913
Make sure that command events from activateItem only run once the nested event loop for the NSMenu has been exited. r=harry

Cristian Brindusan [:cbrindusan]

Comment 20

•

3 years ago

bugherder

https://hg.mozilla.org/mozilla-central/rev/8f867acbaa94
https://hg.mozilla.org/mozilla-central/rev/50adb1fd03f3
https://hg.mozilla.org/mozilla-central/rev/be5032506244
https://hg.mozilla.org/mozilla-central/rev/d0c2e0e4b6a5
https://hg.mozilla.org/mozilla-central/rev/5df0f67ca1c6
https://hg.mozilla.org/mozilla-central/rev/058fd48682c5
https://hg.mozilla.org/mozilla-central/rev/2cd944025913

Status: ASSIGNED → RESOLVED

Closed: 3 years ago

status-firefox90: --- → fixed

Resolution: --- → FIXED

Target Milestone: --- → 90 Branch

Pascal Chevrel:pascalc (PTO until April 26)

Comment 21

•

3 years ago

bugherder uplift

https://hg.mozilla.org/releases/mozilla-beta/rev/81480fec4ca9
https://hg.mozilla.org/releases/mozilla-beta/rev/2b85fcf9113a
https://hg.mozilla.org/releases/mozilla-beta/rev/5e4b0e56d9ba
https://hg.mozilla.org/releases/mozilla-beta/rev/365aabdbb867
https://hg.mozilla.org/releases/mozilla-beta/rev/f146ec71d0a2
https://hg.mozilla.org/releases/mozilla-beta/rev/d0677098dc5e
https://hg.mozilla.org/releases/mozilla-beta/rev/643b73cca9fd

status-firefox89: --- → fixed

Pascal Chevrel:pascalc (PTO until April 26)

Updated

•

3 years ago

Whiteboard: [proton-context-menus][mac:mr1] → [proton-context-menus][mac:mr1] [proton-uplift]

You need to log in before you can comment on or make changes to this bug.