Closed Bug 1772908 Opened 3 years ago Closed 3 years ago

Fx shutdown-hangs if updater runs long

Categories

(Toolkit :: Application Update, defect)

Unspecified
Windows
defect

Tracking

()

RESOLVED FIXED
110 Branch
Tracking Status
firefox110 --- fixed

People

(Reporter: handyman, Assigned: rkraesig)

References

Details

(Whiteboard: [win:stability])

Attachments

(6 files, 2 obsolete files)

Bug 1505660 looks at shutdown hangs under the signature [@ shutdownhang | nsThread::Shutdown | nsThreadManager::ShutdownNonMainThreads ]. Most of them are not related to the updater but a few like this one are. The Crash Annotations tab shows XPCOMSpinEventLoopStack default: nsThread::Shutdown: Update Watcher. Thread 30 is "Update Watcher" and shows us waiting on nsUpdateProcessor::WaitForProcess.

Severity: -- → S3

SyncRunnable's helper functions take an nsIRunnable *; but the most
common way of building nsIRunnables, NS_NewRunnableFunction, returns
an already_AddRefed<nsIRunnable> instead. Add two new overloads of the
helper functions to eliminate the impedance mismatch.

(This does result in an uncomfortable amount of code duplication. While
we could eliminate that with appropriate use of SFINAE, it'll be simpler
if we wait for C++20 and its requires keyword.)

Additionally, add two explicitly-deleted overloads to catch and prevent
a previously-common antipattern that presumably resulted from this type
mismatch: accidentally wrapping the actual runnable in two layers of
SyncRunnable. Fix the former use-sites appropriately. (This was
probably harmless, but is also probably best avoided.)

No functional changes. This is in some sense a continuation of bug
1281626.

Assignee: nobody → rkraesig
Status: NEW → ASSIGNED

Implement a Poller object which will (on some specified thread)
repeatedly check for some condition to be satisfied, and invoke a
continuation once it is.

(This is simple enough, in concept. The vast majority of the complexity
comes from needing to cleanly handle destruction of the the poller
object and/or its target thread while polling is still taking place.)

No functional changes in this commit; Poller will actually be used in
the next one.

Depends on D157131

On Windows and on most flavors of Unix (including Linux, but explicitly
excluding macOS), when we wait for an updater-process to terminate, we
poll for its completion rather than reacting asynchronously to some
OS-provided event. The polling loop is supposed to be async-friendly
and interruptible, in that it enqueues its continuation onto its
thread's event queue, rather than directly and synchronously looping.

Unfortunately, the polling loop doesn't idle between poll attempts.
Instead, it calls sleep() (or the moral equivalent, on Windows)
directly within the posted task, and then enqueues the next continuation
at the default priority before concluding. Since shutdown-time teardown
is effectively an idle-priority task, it will never get to run unless
the updater finishes — so an active updater process will always induce a
shutdown hang.

The right fix for this is almost certainly to set up some listener for
whatever asynchronous mechanism each OS provides for this sort of thing
instead of polling.

The stupid fix — which is what this commit does — is simply to remove
the sleep() from the task itself, and instead use Poller (in effect,
a suitably-wrapped nsITimer) to perform polling.

On other OSes, there are unfortunately no functional changes:

  • On macOS, as of bug 1250901, we use NSTask to launch processes,
    which does not provide a pollable interface: either you block or you
    wait on an asynchronous callback, and we currently do the former.
  • On other OSes (if there are any, anymore), we rely on NSPR, which also
    doesn't provide a pollable interface.

Depends on D157132

Attachment #9294282 - Attachment is obsolete: true
Attachment #9294281 - Attachment is obsolete: true
Attachment #9294280 - Attachment description: Bug 1772908 - [1/3] Drive-by cleanup: simplify use of SyncRunnable r?#xpcom-reviewers → Bug 1772908 - [1/6] Drive-by cleanup: simplify use of SyncRunnable r?#xpcom-reviewers

Remove using namespace JS, which leaks into the other GTests in this
group due to unified builds. (Most uses of items in namespace JS are
already explicitly tagged, anyway.)

Additionally, fix a few minor clang-tidy complaints.

No functional changes.

Depends on D157131

The UpdateWatcher thread might not be shut down if ProcessUpdates
returned an error. Rework StartStagedUpdate so that the thread will be
shut down after any early return.

More cosmetically, rename the updater thread to "Updater Thread": it's
not limited to being a watcher-thread, but is also used for the initial
update checks.

Depends on D159559

Add a few relevant caveats to ObjectWatcher documentation, in the hope
of keeping someone else from following it down to a dead end.

Depends on D159560

Implement a version of ObjectWatcher which does not rely on the Chromium
MessageLoop.

Depends on D159561

While a separate thread is still used to check for the presence of
downloaded updates, there's no need to keep that thread around until the
update completes.

Depends on D159562

Pushed by rkraesig@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/7908c5532c62 [1/6] Drive-by cleanup: simplify use of SyncRunnable r=xpcom-reviewers,necko-reviewers,nika,valentin https://hg.mozilla.org/integration/autoland/rev/7247a59febda [2/6] Drive-by cleanup: clean up xpcom/tests/gtest/TestGCPostBarriers.cpp r=xpcom-reviewers,nika https://hg.mozilla.org/integration/autoland/rev/21d98896869f [3/6] Consistently stop UpdateWatcher thread r=application-update-reviewers,bytesized https://hg.mozilla.org/integration/autoland/rev/01a97b1691e9 [4/6] Abjure against casual use of base::ObjectWatcher r=ipc-reviewers,nika https://hg.mozilla.org/integration/autoland/rev/8e25782bd2b9 [5/6] Implement WinHandleWatcher r=xpcom-reviewers,nika https://hg.mozilla.org/integration/autoland/rev/3b84539c5aeb [6/6] Replace polling in UpdateDriver with use of HandleWatcher on Windows r=application-update-reviewers,bytesized
Flags: needinfo?(rkraesig)
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Target Milestone: 108 Branch → ---

Issue identified; currently working on a fix and further tests.

Flags: needinfo?(rkraesig)
Pushed by rkraesig@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/3b9e920f64f8 [5/6] Implement WinHandleWatcher r=xpcom-reviewers,nika https://hg.mozilla.org/integration/autoland/rev/7aaf55dea6ed [6/6] Replace polling in UpdateDriver with use of HandleWatcher on Windows r=application-update-reviewers,bytesized

Note that this is unlikely to ameliorate the actual issue as much as was originally expected.

Since this was originally identified, the BitsCommander thread has significantly overtaken the UpdateWatcher thread in number of hangs: while previously at roughly 1:1, they are now 20:1 in favor of BitsCommander, and this patchset only modifies UpdateWatcher.

Status: REOPENED → RESOLVED
Closed: 3 years ago3 years ago
Resolution: --- → FIXED
Target Milestone: --- → 110 Branch

(In reply to Ray Kraesig [:rkraesig] from comment #16)

Note that this is unlikely to ameliorate the actual issue as much as was originally expected.

Since this was originally identified, the BitsCommander thread has significantly overtaken the UpdateWatcher thread in number of hangs: while previously at roughly 1:1, they are now 20:1 in favor of BitsCommander, and this patchset only modifies UpdateWatcher.

I assume that is bug 1741675.

See Also: → 1741675
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: