Closed Bug 1695580 Opened 3 years ago Closed 3 years ago

DelayedRunnables will leak if target thread shuts down before delay has expired

Categories

(Core :: XPCOM, defect)

defect

Tracking

()

RESOLVED FIXED
89 Branch
Tracking Status
firefox89 --- wontfix
firefox90 --- wontfix
firefox91 --- fixed

People

(Reporter: pehrsons, Assigned: pehrsons)

References

Details

Crash Data

Attachments

(8 files)

Consider delayedRunnable = DelayedRunnable(target, runnable, delay). Dispatching it works roughly like this:

Alas, if target has been shut down by the time the timer fires, the TimerThread releases the timer. In the case of DelayedRunnable it leaks. DelayedRunnable has a strong-ref to the timer, and the timer has a strong-ref to the DelayedRunnable through its callback member.

This seems deliberate; if the callback was released by the timer at this point, it would be released on a different thread than target, which could lead to a number of issues.

I think it would be reasonable to cancel any pending DelayedRunnable timers as part of target shutdown, probably as the last thing that happens on target.

Assignee: nobody → apehrson
Status: NEW → ASSIGNED

Because DelayedRunnables are fire-and-forget, there is no way for a targeted
EventTarget to clean them up on shutdown. Thus if a timer fires after
EventTarget shutdown it will fail to dispatch the timer event, and avoid
releasing the timer callback because it's not on the targeted thread. This
causes a leak as there is a ref-cycle between nsTimerImpl::mCallback and
DelayedRunnable::mTimer.

This patch adds nsIDelayedRunnableObserver for a target to observe which
DelayedRunnables are relying on their timer to run them. This allows the target
to schedule a shutdown task to cancel those timers and release the runnables on
the target thread.

Supported DelayedRunnable targets with this patch are TaskQueues,
eventqueue-based nsThreads and XPCOMThreadWrappers that wrap a supported
nsThread.

An assertion makes sure at runtime that future new uses of DelayedRunnable
target nsIDelayedRunnableObserver-supported event targets.

This adds support for nsIDelayedRunnableObserver to nsStreamTransportService.

This is a bit special because nsStreamTransportService uses an nsThreadPool.
Because of race conditions we cannot dispatch a final cleanup task to cancel any
pending DelayedRunnables.

Because of the inherent raciness of threads in the thread pool we assume that
any pending DelayedRunnables can handle being released on any thread. Thus we
dispatch the cleanup task to the background event target once the thread pool
has been shut down and processed all its events. This ensures no races can occur
between the cleanup task and OnDelayedRunnableScheduled.

When mTargetThread is WebSocketImpl it must be released on main since it
implements nsISupportsWeakReference, and clearing weak references is not
threadsafe.

It's unclear whether this bug fixes the leaks that prompted this leak threshold,
or whether they were already fixed. It is however clear that mochitests now run
without leaking.

The patches on this bug make bug 1684441 increase in frequency. Presumable ASAN
shutdown is in many cases already close to the timeout, and this bug is making
it trip over the edge.

A 4 minute timeout made a broad linux x64 ASAN try run go from five occurrences
of bug 1684441 to two, whereas a 5 minute timeout made them go to zero.

Attachment #9211586 - Attachment description: Bug 1695580 - In xpcom, cancel pending DelayedRunnable timers on shutdown. r?#xpcom-reviewers → Bug 1695580 - In xpcom, cancel pending DelayedRunnable timers on shutdown. r?KrisWright!
Attachment #9211589 - Attachment description: Bug 1695580 - Remove mochitest rdd leak threshold. r?mjf → Bug 1695580 - Remove mochitest rdd leak threshold. r?mjf,r?mccr8
Pushed by pehrsons@gmail.com:
https://hg.mozilla.org/integration/autoland/rev/ae16f09be41b
Break out DelayedRunnable into its own files. r=xpcom-reviewers,nika
https://hg.mozilla.org/integration/autoland/rev/de9a7dd7fc79
Add gtests. r=xpcom-reviewers,nika
https://hg.mozilla.org/integration/autoland/rev/2103cd9e58b7
Simplify MozPromise code in nsThreadManager::Shutdown. r=xpcom-reviewers,nika
https://hg.mozilla.org/integration/autoland/rev/d04f7a7ec375
In xpcom, cancel pending DelayedRunnable timers on shutdown. r=KrisWright
https://hg.mozilla.org/integration/autoland/rev/6fdd154aa151
In necko, cancel pending DelayedRunnable timers on shutdown. r=necko-reviewers,dragana
https://hg.mozilla.org/integration/autoland/rev/e38e8b90f119
Release WebSocketChannel::mTargetThread on main. r=necko-reviewers,dragana
https://hg.mozilla.org/integration/autoland/rev/982a46056fcb
Remove mochitest rdd leak threshold. r=mjf,mccr8
https://hg.mozilla.org/integration/autoland/rev/ec0b0fcc8d88
Increase async shutdown timeout for ASAN to 5 minutes. r=xpcom-reviewers,mccr8

When you think you've covered all your bases.. Thanks for backing out, I'm taking a look.

Flags: needinfo?(apehrson)
Pushed by pehrsons@gmail.com:
https://hg.mozilla.org/integration/autoland/rev/19d803d5ab12
Break out DelayedRunnable into its own files. r=xpcom-reviewers,nika
https://hg.mozilla.org/integration/autoland/rev/e263a02e58ea
Add gtests. r=xpcom-reviewers,nika
https://hg.mozilla.org/integration/autoland/rev/2a88ea548a86
Simplify MozPromise code in nsThreadManager::Shutdown. r=xpcom-reviewers,nika
https://hg.mozilla.org/integration/autoland/rev/3e501ab00f18
In xpcom, cancel pending DelayedRunnable timers on shutdown. r=KrisWright
https://hg.mozilla.org/integration/autoland/rev/29f195804825
In necko, cancel pending DelayedRunnable timers on shutdown. r=necko-reviewers,dragana
https://hg.mozilla.org/integration/autoland/rev/741be88ca028
Release WebSocketChannel::mTargetThread on main. r=necko-reviewers,dragana
https://hg.mozilla.org/integration/autoland/rev/07e38871c80e
Remove mochitest rdd leak threshold. r=mjf,mccr8
https://hg.mozilla.org/integration/autoland/rev/503cd99c7318
Increase async shutdown timeout for ASAN to 5 minutes. r=xpcom-reviewers,mccr8
Crash Signature: [@ nsAutoOwningThread::AssertCurrentThreadOwnsMe(char const*) const]
Regressions: 1704391
Regressions: 1704887
Regressions: 1703978

Kris, is this something we should consider backing out, at least from beta? The patches in the bug this is blocking haven't landed yet, and there are an awful lot of regressions from these patches. It also sounds like fixing the regressions is not easy, and might introduce additional risk. If you want to keep working on the regressions, I could prepare some backout patches and see how they do on try.

Flags: needinfo?(kwright)

(In reply to Andrew McCreight [:mccr8] from comment #15)

Kris, is this something we should consider backing out, at least from beta? The patches in the bug this is blocking haven't landed yet, and there are an awful lot of regressions from these patches. It also sounds like fixing the regressions is not easy, and might introduce additional risk. If you want to keep working on the regressions, I could prepare some backout patches and see how they do on try.

The fix for the top crasher from my understanding is nontrivial so I think we should back this out from beta for the time being.

Flags: needinfo?(kwright)

Over to pascal for the beta backout.

Flags: needinfo?(pascalc)
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: