Fx shutdown-hangs if updater runs long
Categories
(Toolkit :: Application Update, defect)
Tracking
()
Tracking | Status | |
---|---|---|
firefox110 | --- | fixed |
People
(Reporter: handyman, Assigned: rkraesig)
References
Details
(Whiteboard: [win:stability])
Attachments
(6 files, 2 obsolete files)
48 bytes,
text/x-phabricator-request
|
Details | Review | |
48 bytes,
text/x-phabricator-request
|
Details | Review | |
48 bytes,
text/x-phabricator-request
|
Details | Review | |
48 bytes,
text/x-phabricator-request
|
Details | Review | |
48 bytes,
text/x-phabricator-request
|
Details | Review | |
48 bytes,
text/x-phabricator-request
|
Details | Review |
Bug 1505660 looks at shutdown hangs under the signature [@ shutdownhang | nsThread::Shutdown | nsThreadManager::ShutdownNonMainThreads ]
. Most of them are not related to the updater but a few like this one are. The Crash Annotations tab shows XPCOMSpinEventLoopStack default: nsThread::Shutdown: Update Watcher
. Thread 30 is "Update Watcher" and shows us waiting on nsUpdateProcessor::WaitForProcess
.
Updated•3 years ago
|
Comment 1•3 years ago
|
||
FWIW, some newer crashes:
https://crash-stats.mozilla.org/report/index/02844c82-edc5-4636-8b8a-a3bd30220819
https://crash-stats.mozilla.org/report/index/42c0bf4a-f2b9-428e-8b6e-e9f7e0220821
https://crash-stats.mozilla.org/report/index/95877eb6-74d7-4221-942d-b624a0220820
https://crash-stats.mozilla.org/report/index/a7bd99a5-0d66-468e-8ba9-3111e0220821
Assignee | ||
Comment 2•3 years ago
|
||
SyncRunnable
's helper functions take an nsIRunnable *
; but the most
common way of building nsIRunnables, NS_NewRunnableFunction
, returns
an already_AddRefed<nsIRunnable>
instead. Add two new overloads of the
helper functions to eliminate the impedance mismatch.
(This does result in an uncomfortable amount of code duplication. While
we could eliminate that with appropriate use of SFINAE, it'll be simpler
if we wait for C++20 and its requires
keyword.)
Additionally, add two explicitly-deleted overloads to catch and prevent
a previously-common antipattern that presumably resulted from this type
mismatch: accidentally wrapping the actual runnable in two layers of
SyncRunnable
. Fix the former use-sites appropriately. (This was
probably harmless, but is also probably best avoided.)
No functional changes. This is in some sense a continuation of bug
1281626.
Updated•3 years ago
|
Assignee | ||
Comment 3•3 years ago
|
||
Implement a Poller object which will (on some specified thread)
repeatedly check for some condition to be satisfied, and invoke a
continuation once it is.
(This is simple enough, in concept. The vast majority of the complexity
comes from needing to cleanly handle destruction of the the poller
object and/or its target thread while polling is still taking place.)
No functional changes in this commit; Poller
will actually be used in
the next one.
Depends on D157131
Assignee | ||
Comment 4•3 years ago
|
||
On Windows and on most flavors of Unix (including Linux, but explicitly
excluding macOS), when we wait for an updater-process to terminate, we
poll for its completion rather than reacting asynchronously to some
OS-provided event. The polling loop is supposed to be async-friendly
and interruptible, in that it enqueues its continuation onto its
thread's event queue, rather than directly and synchronously looping.
Unfortunately, the polling loop doesn't idle between poll attempts.
Instead, it calls sleep()
(or the moral equivalent, on Windows)
directly within the posted task, and then enqueues the next continuation
at the default priority before concluding. Since shutdown-time teardown
is effectively an idle-priority task, it will never get to run unless
the updater finishes — so an active updater process will always induce a
shutdown hang.
The right fix for this is almost certainly to set up some listener for
whatever asynchronous mechanism each OS provides for this sort of thing
instead of polling.
The stupid fix — which is what this commit does — is simply to remove
the sleep()
from the task itself, and instead use Poller
(in effect,
a suitably-wrapped nsITimer
) to perform polling.
On other OSes, there are unfortunately no functional changes:
- On macOS, as of bug 1250901, we use
NSTask
to launch processes,
which does not provide a pollable interface: either you block or you
wait on an asynchronous callback, and we currently do the former. - On other OSes (if there are any, anymore), we rely on NSPR, which also
doesn't provide a pollable interface.
Depends on D157132
Updated•3 years ago
|
Updated•3 years ago
|
Updated•3 years ago
|
Assignee | ||
Comment 5•3 years ago
|
||
Remove using namespace JS
, which leaks into the other GTests in this
group due to unified builds. (Most uses of items in namespace JS
are
already explicitly tagged, anyway.)
Additionally, fix a few minor clang-tidy complaints.
No functional changes.
Depends on D157131
Assignee | ||
Comment 6•3 years ago
|
||
The UpdateWatcher thread might not be shut down if ProcessUpdates
returned an error. Rework StartStagedUpdate
so that the thread will be
shut down after any early return.
More cosmetically, rename the updater thread to "Updater Thread": it's
not limited to being a watcher-thread, but is also used for the initial
update checks.
Depends on D159559
Assignee | ||
Comment 7•3 years ago
|
||
Add a few relevant caveats to ObjectWatcher documentation, in the hope
of keeping someone else from following it down to a dead end.
Depends on D159560
Assignee | ||
Comment 8•3 years ago
|
||
Implement a version of ObjectWatcher which does not rely on the Chromium
MessageLoop.
Depends on D159561
Assignee | ||
Comment 9•3 years ago
|
||
While a separate thread is still used to check for the presence of
downloaded updates, there's no need to keep that thread around until the
update completes.
Depends on D159562
Comment 10•3 years ago
|
||
Comment 11•3 years ago
|
||
bugherder |
https://hg.mozilla.org/mozilla-central/rev/7908c5532c62
https://hg.mozilla.org/mozilla-central/rev/7247a59febda
https://hg.mozilla.org/mozilla-central/rev/21d98896869f
https://hg.mozilla.org/mozilla-central/rev/01a97b1691e9
https://hg.mozilla.org/mozilla-central/rev/8e25782bd2b9
https://hg.mozilla.org/mozilla-central/rev/3b84539c5aeb
Comment 12•3 years ago
|
||
Backed out 2 changesets for causing Bug 1799741.
Backout link: https://hg.mozilla.org/integration/autoland/rev/4d2801e27bb6611eb477d304102e8d1bc1e335a0
Updated•3 years ago
|
Comment 13•3 years ago
|
||
Backout merged to central: https://hg.mozilla.org/mozilla-central/rev/4d2801e27bb6
Assignee | ||
Comment 14•3 years ago
|
||
Issue identified; currently working on a fix and further tests.
Comment 15•3 years ago
|
||
Assignee | ||
Comment 16•3 years ago
|
||
Note that this is unlikely to ameliorate the actual issue as much as was originally expected.
Since this was originally identified, the BitsCommander
thread has significantly overtaken the UpdateWatcher
thread in number of hangs: while previously at roughly 1:1, they are now 20:1 in favor of BitsCommander
, and this patchset only modifies UpdateWatcher
.
Comment 17•3 years ago
|
||
bugherder |
https://hg.mozilla.org/mozilla-central/rev/3b9e920f64f8
https://hg.mozilla.org/mozilla-central/rev/7aaf55dea6ed
Comment 18•3 years ago
|
||
(In reply to Ray Kraesig [:rkraesig] from comment #16)
Note that this is unlikely to ameliorate the actual issue as much as was originally expected.
Since this was originally identified, the
BitsCommander
thread has significantly overtaken theUpdateWatcher
thread in number of hangs: while previously at roughly 1:1, they are now 20:1 in favor ofBitsCommander
, and this patchset only modifiesUpdateWatcher
.
I assume that is bug 1741675.
Description
•