Closed Bug 1772908 Opened 3 years ago Closed 3 years ago

Fx shutdown-hangs if updater runs long

Tracking

()

Status:

RESOLVED FIXED

Milestone:

110 Branch

Tracking Flags:

Tracking

Status

firefox110

---

fixed

People

(Reporter: handyman, Assigned: rkraesig)

References

Details

(Whiteboard: [win:stability])

Attachments

(6 files, 2 obsolete files)

Bug 1772908 - [1/6] Drive-by cleanup: simplify use of SyncRunnable r?#xpcom-reviewers 3 years ago Ray Kraesig [:rkraesig] 48 bytes, text/x-phabricator-request		Details \| Review
Bug 1772908 - [2/3] Implement `Poller` r?#xpcom-reviewers 3 years ago Ray Kraesig [:rkraesig] 48 bytes, text/x-phabricator-request		Details \| Review
Bug 1772908 - [3/3] Use `Poller` to poll for updater process termination r?#application-update-reviewers 3 years ago Ray Kraesig [:rkraesig] 48 bytes, text/x-phabricator-request		Details \| Review
Bug 1772908 - [2/6] Drive-by cleanup: clean up xpcom/tests/gtest/TestGCPostBarriers.cpp r?#xpcom-reviewers 3 years ago Ray Kraesig [:rkraesig] 48 bytes, text/x-phabricator-request		Details \| Review
Bug 1772908 - [3/6] Consistently stop UpdateWatcher thread r?#application-update-reviewers 3 years ago Ray Kraesig [:rkraesig] 48 bytes, text/x-phabricator-request		Details \| Review
Bug 1772908 - [4/6] Abjure against casual use of base::ObjectWatcher r?#ipc-reviewers 3 years ago Ray Kraesig [:rkraesig] 48 bytes, text/x-phabricator-request		Details \| Review
Bug 1772908 - [5/6] Implement WinHandleWatcher r?#xpcom-reviewers 3 years ago Ray Kraesig [:rkraesig] 48 bytes, text/x-phabricator-request		Details \| Review
Bug 1772908 - [6/6] Replace polling in UpdateDriver with use of HandleWatcher on Windows r?#application-update-reviewers 3 years ago Ray Kraesig [:rkraesig] 48 bytes, text/x-phabricator-request		Details \| Review

David Parks [:handyman]

Reporter

Description

•

3 years ago

Bug 1505660 looks at shutdown hangs under the signature [@ shutdownhang | nsThread::Shutdown | nsThreadManager::ShutdownNonMainThreads ]. Most of them are not related to the updater but a few like this one are. The Crash Annotations tab shows XPCOMSpinEventLoopStack default: nsThread::Shutdown: Update Watcher. Thread 30 is "Update Watcher" and shows us waiting on nsUpdateProcessor::WaitForProcess.

Robin Steuber (she/her) [:bytesized]

Updated

•

3 years ago

Severity: -- → S3

Jens Stutte [:jstutte]

Comment 1

•

3 years ago

FWIW, some newer crashes:

https://crash-stats.mozilla.org/report/index/02844c82-edc5-4636-8b8a-a3bd30220819
https://crash-stats.mozilla.org/report/index/42c0bf4a-f2b9-428e-8b6e-e9f7e0220821
https://crash-stats.mozilla.org/report/index/95877eb6-74d7-4221-942d-b624a0220820
https://crash-stats.mozilla.org/report/index/a7bd99a5-0d66-468e-8ba9-3111e0220821

Ray Kraesig [:rkraesig]

Assignee

Comment 2

•

3 years ago

Attached file Bug 1772908 - [1/6] Drive-by cleanup: simplify use of SyncRunnable r?#xpcom-reviewers — Details

SyncRunnable's helper functions take an nsIRunnable *; but the most
common way of building nsIRunnables, NS_NewRunnableFunction, returns
an already_AddRefed<nsIRunnable> instead. Add two new overloads of the
helper functions to eliminate the impedance mismatch.

(This does result in an uncomfortable amount of code duplication. While
we could eliminate that with appropriate use of SFINAE, it'll be simpler
if we wait for C++20 and its requires keyword.)

Additionally, add two explicitly-deleted overloads to catch and prevent
a previously-common antipattern that presumably resulted from this type
mismatch: accidentally wrapping the actual runnable in two layers of
SyncRunnable. Fix the former use-sites appropriately. (This was
probably harmless, but is also probably best avoided.)

No functional changes. This is in some sense a continuation of bug
1281626.

Phabricator Automation

Updated

•

3 years ago

Assignee: nobody → rkraesig

Status: NEW → ASSIGNED

Ray Kraesig [:rkraesig]

Assignee

Comment 3

•

3 years ago

Attached file Bug 1772908 - [2/3] Implement `Poller` r?#xpcom-reviewers (obsolete) — Details

Implement a Poller object which will (on some specified thread)
repeatedly check for some condition to be satisfied, and invoke a
continuation once it is.

(This is simple enough, in concept. The vast majority of the complexity
comes from needing to cleanly handle destruction of the the poller
object and/or its target thread while polling is still taking place.)

No functional changes in this commit; Poller will actually be used in
the next one.

Depends on D157131

Ray Kraesig [:rkraesig]

Assignee

Comment 4

•

3 years ago

Attached file Bug 1772908 - [3/3] Use `Poller` to poll for updater process termination r?#application-update-reviewers (obsolete) — Details

On Windows and on most flavors of Unix (including Linux, but explicitly
excluding macOS), when we wait for an updater-process to terminate, we
poll for its completion rather than reacting asynchronously to some
OS-provided event. The polling loop is supposed to be async-friendly
and interruptible, in that it enqueues its continuation onto its
thread's event queue, rather than directly and synchronously looping.

Unfortunately, the polling loop doesn't idle between poll attempts.
Instead, it calls sleep() (or the moral equivalent, on Windows)
directly within the posted task, and then enqueues the next continuation
at the default priority before concluding. Since shutdown-time teardown
is effectively an idle-priority task, it will never get to run unless
the updater finishes — so an active updater process will always induce a
shutdown hang.

The right fix for this is almost certainly to set up some listener for
whatever asynchronous mechanism each OS provides for this sort of thing
instead of polling.

The stupid fix — which is what this commit does — is simply to remove
the sleep() from the task itself, and instead use Poller (in effect,
a suitably-wrapped nsITimer) to perform polling.

On other OSes, there are unfortunately no functional changes:

On macOS, as of bug 1250901, we use NSTask to launch processes,
which does not provide a pollable interface: either you block or you
wait on an asynchronous callback, and we currently do the former.
On other OSes (if there are any, anymore), we rely on NSPR, which also
doesn't provide a pollable interface.

Depends on D157132

Phabricator Automation

Updated

•

3 years ago

Attachment #9294282 - Attachment is obsolete: true

Phabricator Automation

Updated

•

3 years ago

Attachment #9294281 - Attachment is obsolete: true

Phabricator Automation

Updated

•

3 years ago

Attachment #9294280 - Attachment description: Bug 1772908 - [1/3] Drive-by cleanup: simplify use of SyncRunnable r?#xpcom-reviewers → Bug 1772908 - [1/6] Drive-by cleanup: simplify use of SyncRunnable r?#xpcom-reviewers

Ray Kraesig [:rkraesig]

Assignee

Comment 5

•

3 years ago

Attached file Bug 1772908 - [2/6] Drive-by cleanup: clean up xpcom/tests/gtest/TestGCPostBarriers.cpp r?#xpcom-reviewers — Details

Remove using namespace JS, which leaks into the other GTests in this
group due to unified builds. (Most uses of items in namespace JS are
already explicitly tagged, anyway.)

Additionally, fix a few minor clang-tidy complaints.

No functional changes.

Depends on D157131

Ray Kraesig [:rkraesig]

Assignee

Comment 6

•

3 years ago

Attached file Bug 1772908 - [3/6] Consistently stop UpdateWatcher thread r?#application-update-reviewers — Details

The UpdateWatcher thread might not be shut down if ProcessUpdates
returned an error. Rework StartStagedUpdate so that the thread will be
shut down after any early return.

More cosmetically, rename the updater thread to "Updater Thread": it's
not limited to being a watcher-thread, but is also used for the initial
update checks.

Depends on D159559

Ray Kraesig [:rkraesig]

Assignee

Comment 7

•

3 years ago

Attached file Bug 1772908 - [4/6] Abjure against casual use of base::ObjectWatcher r?#ipc-reviewers — Details

Add a few relevant caveats to ObjectWatcher documentation, in the hope
of keeping someone else from following it down to a dead end.

Depends on D159560

Ray Kraesig [:rkraesig]

Assignee

Comment 8

•

3 years ago

Attached file Bug 1772908 - [5/6] Implement WinHandleWatcher r?#xpcom-reviewers — Details

Implement a version of ObjectWatcher which does not rely on the Chromium
MessageLoop.

Depends on D159561

Ray Kraesig [:rkraesig]

Assignee

Comment 9

•

3 years ago

Attached file Bug 1772908 - [6/6] Replace polling in UpdateDriver with use of HandleWatcher on Windows r?#application-update-reviewers — Details

While a separate thread is still used to check for the presence of
downloaded updates, there's no need to keep that thread around until the
update completes.

Depends on D159562

Pulsebot

Comment 10

•

3 years ago

Pushed by rkraesig@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/7908c5532c62 [1/6] Drive-by cleanup: simplify use of SyncRunnable r=xpcom-reviewers,necko-reviewers,nika,valentin https://hg.mozilla.org/integration/autoland/rev/7247a59febda [2/6] Drive-by cleanup: clean up xpcom/tests/gtest/TestGCPostBarriers.cpp r=xpcom-reviewers,nika https://hg.mozilla.org/integration/autoland/rev/21d98896869f [3/6] Consistently stop UpdateWatcher thread r=application-update-reviewers,bytesized https://hg.mozilla.org/integration/autoland/rev/01a97b1691e9 [4/6] Abjure against casual use of base::ObjectWatcher r=ipc-reviewers,nika https://hg.mozilla.org/integration/autoland/rev/8e25782bd2b9 [5/6] Implement WinHandleWatcher r=xpcom-reviewers,nika https://hg.mozilla.org/integration/autoland/rev/3b84539c5aeb [6/6] Replace polling in UpdateDriver with use of HandleWatcher on Windows r=application-update-reviewers,bytesized

Sandor Molnar[:smolnar]

Comment 11

•

3 years ago

bugherder

https://hg.mozilla.org/mozilla-central/rev/7908c5532c62
https://hg.mozilla.org/mozilla-central/rev/7247a59febda
https://hg.mozilla.org/mozilla-central/rev/21d98896869f
https://hg.mozilla.org/mozilla-central/rev/01a97b1691e9
https://hg.mozilla.org/mozilla-central/rev/8e25782bd2b9
https://hg.mozilla.org/mozilla-central/rev/3b84539c5aeb

Status: ASSIGNED → RESOLVED

Closed: 3 years ago

status-firefox108: --- → fixed

Resolution: --- → FIXED

Target Milestone: --- → 108 Branch

Sebastian Hengst [:aryx] (needinfo me if it's about an intermittent or backout)

Updated

•

3 years ago

Regressions: 1799741

Atila Butkovits

Comment 12

•

3 years ago

Backed out 2 changesets for causing Bug 1799741.

Backout link: https://hg.mozilla.org/integration/autoland/rev/4d2801e27bb6611eb477d304102e8d1bc1e335a0

Flags: needinfo?(rkraesig)

Atila Butkovits

Updated

•

3 years ago

Status: RESOLVED → REOPENED

status-firefox108: fixed → ---

Resolution: FIXED → ---

Target Milestone: 108 Branch → ---

Cristian Tuns

Comment 13

•

3 years ago

Backout merged to central: https://hg.mozilla.org/mozilla-central/rev/4d2801e27bb6

Ray Kraesig [:rkraesig]

Assignee

Comment 14

•

3 years ago

Issue identified; currently working on a fix and further tests.

Flags: needinfo?(rkraesig)

Pulsebot

Comment 15

•

3 years ago

Pushed by rkraesig@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/3b9e920f64f8 [5/6] Implement WinHandleWatcher r=xpcom-reviewers,nika https://hg.mozilla.org/integration/autoland/rev/7aaf55dea6ed [6/6] Replace polling in UpdateDriver with use of HandleWatcher on Windows r=application-update-reviewers,bytesized

Ray Kraesig [:rkraesig]

Assignee

Comment 16

•

3 years ago

Note that this is unlikely to ameliorate the actual issue as much as was originally expected.

Since this was originally identified, the BitsCommander thread has significantly overtaken the UpdateWatcher thread in number of hangs: while previously at roughly 1:1, they are now 20:1 in favor of BitsCommander, and this patchset only modifies UpdateWatcher.

Norisz Fay [:noriszfay]

Comment 17

•

3 years ago

bugherder

https://hg.mozilla.org/mozilla-central/rev/3b9e920f64f8
https://hg.mozilla.org/mozilla-central/rev/7aaf55dea6ed

Status: REOPENED → RESOLVED

Closed: 3 years ago → 3 years ago

status-firefox110: --- → fixed

Resolution: --- → FIXED

Target Milestone: --- → 110 Branch

Jens Stutte [:jstutte]

Comment 18

•

3 years ago

(In reply to Ray Kraesig [:rkraesig] from comment #16)

Note that this is unlikely to ameliorate the actual issue as much as was originally expected.

Since this was originally identified, the BitsCommander thread has significantly overtaken the UpdateWatcher thread in number of hangs: while previously at roughly 1:1, they are now 20:1 in favor of BitsCommander, and this patchset only modifies UpdateWatcher.

I assume that is bug 1741675.