DelayedRunnables will leak if target thread shuts down before delay has expired
Categories
(Core :: XPCOM, defect)
Tracking
()
People
(Reporter: pehrsons, Assigned: pehrsons)
References
Details
Crash Data
Attachments
(8 files)
48 bytes,
text/x-phabricator-request
|
Details | Review | |
48 bytes,
text/x-phabricator-request
|
Details | Review | |
48 bytes,
text/x-phabricator-request
|
Details | Review | |
48 bytes,
text/x-phabricator-request
|
Details | Review | |
48 bytes,
text/x-phabricator-request
|
Details | Review | |
48 bytes,
text/x-phabricator-request
|
Details | Review | |
48 bytes,
text/x-phabricator-request
|
Details | Review | |
48 bytes,
text/x-phabricator-request
|
Details | Review |
Consider delayedRunnable = DelayedRunnable(target, runnable, delay)
. Dispatching it works roughly like this:
- dispatch
delayedRunnable
totarget
, and starts a timertimer
to rundelayedRunnable
ontarget
afterdelay
- DelayedRunnable::Run checks whether delay has passed, if not waits for the timer
timer
- when
timer
fires it dispatchesdelayedRunnable
totarget
from the global TimerThread
Alas, if target
has been shut down by the time the timer fires, the TimerThread releases the timer. In the case of DelayedRunnable it leaks. DelayedRunnable has a strong-ref to the timer, and the timer has a strong-ref to the DelayedRunnable through its callback member.
This seems deliberate; if the callback was released by the timer at this point, it would be released on a different thread than target
, which could lead to a number of issues.
I think it would be reasonable to cancel any pending DelayedRunnable timers as part of target
shutdown, probably as the last thing that happens on target
.
Assignee | ||
Updated•3 years ago
|
Assignee | ||
Comment 1•3 years ago
|
||
Assignee | ||
Comment 2•3 years ago
|
||
Assignee | ||
Comment 3•3 years ago
|
||
Assignee | ||
Comment 4•3 years ago
|
||
Because DelayedRunnables are fire-and-forget, there is no way for a targeted
EventTarget to clean them up on shutdown. Thus if a timer fires after
EventTarget shutdown it will fail to dispatch the timer event, and avoid
releasing the timer callback because it's not on the targeted thread. This
causes a leak as there is a ref-cycle between nsTimerImpl::mCallback and
DelayedRunnable::mTimer.
This patch adds nsIDelayedRunnableObserver for a target to observe which
DelayedRunnables are relying on their timer to run them. This allows the target
to schedule a shutdown task to cancel those timers and release the runnables on
the target thread.
Supported DelayedRunnable targets with this patch are TaskQueues,
eventqueue-based nsThreads and XPCOMThreadWrappers that wrap a supported
nsThread.
An assertion makes sure at runtime that future new uses of DelayedRunnable
target nsIDelayedRunnableObserver-supported event targets.
Assignee | ||
Comment 5•3 years ago
|
||
This adds support for nsIDelayedRunnableObserver to nsStreamTransportService.
This is a bit special because nsStreamTransportService uses an nsThreadPool.
Because of race conditions we cannot dispatch a final cleanup task to cancel any
pending DelayedRunnables.
Because of the inherent raciness of threads in the thread pool we assume that
any pending DelayedRunnables can handle being released on any thread. Thus we
dispatch the cleanup task to the background event target once the thread pool
has been shut down and processed all its events. This ensures no races can occur
between the cleanup task and OnDelayedRunnableScheduled.
Assignee | ||
Comment 6•3 years ago
|
||
When mTargetThread is WebSocketImpl it must be released on main since it
implements nsISupportsWeakReference, and clearing weak references is not
threadsafe.
Assignee | ||
Comment 7•3 years ago
|
||
It's unclear whether this bug fixes the leaks that prompted this leak threshold,
or whether they were already fixed. It is however clear that mochitests now run
without leaking.
Assignee | ||
Comment 8•3 years ago
|
||
The patches on this bug make bug 1684441 increase in frequency. Presumable ASAN
shutdown is in many cases already close to the timeout, and this bug is making
it trip over the edge.
A 4 minute timeout made a broad linux x64 ASAN try run go from five occurrences
of bug 1684441 to two, whereas a 5 minute timeout made them go to zero.
Updated•3 years ago
|
Updated•3 years ago
|
Pushed by pehrsons@gmail.com: https://hg.mozilla.org/integration/autoland/rev/ae16f09be41b Break out DelayedRunnable into its own files. r=xpcom-reviewers,nika https://hg.mozilla.org/integration/autoland/rev/de9a7dd7fc79 Add gtests. r=xpcom-reviewers,nika https://hg.mozilla.org/integration/autoland/rev/2103cd9e58b7 Simplify MozPromise code in nsThreadManager::Shutdown. r=xpcom-reviewers,nika https://hg.mozilla.org/integration/autoland/rev/d04f7a7ec375 In xpcom, cancel pending DelayedRunnable timers on shutdown. r=KrisWright https://hg.mozilla.org/integration/autoland/rev/6fdd154aa151 In necko, cancel pending DelayedRunnable timers on shutdown. r=necko-reviewers,dragana https://hg.mozilla.org/integration/autoland/rev/e38e8b90f119 Release WebSocketChannel::mTargetThread on main. r=necko-reviewers,dragana https://hg.mozilla.org/integration/autoland/rev/982a46056fcb Remove mochitest rdd leak threshold. r=mjf,mccr8 https://hg.mozilla.org/integration/autoland/rev/ec0b0fcc8d88 Increase async shutdown timeout for ASAN to 5 minutes. r=xpcom-reviewers,mccr8
Comment 10•3 years ago
|
||
Backed out 8 changesets (Bug 1695580) for causing build bustages on DataMutex.h.
https://hg.mozilla.org/integration/autoland/rev/2d5c17a4324bf88c2e7cf5030dbd7d9be4f34788
Push with failures:
https://treeherder.mozilla.org/jobs?repo=autoland&revision=ec0b0fcc8d88eb688bd1d43d03d35a29875799d1&selectedTaskRun=bAY9ZTMFQd2sF75aBXOuCA.0
Failure log:
https://treeherder.mozilla.org/logviewer?job_id=335566908&repo=autoland&lineNumber=40320
Assignee | ||
Comment 11•3 years ago
|
||
When you think you've covered all your bases.. Thanks for backing out, I'm taking a look.
Comment 12•3 years ago
|
||
Pushed by pehrsons@gmail.com: https://hg.mozilla.org/integration/autoland/rev/19d803d5ab12 Break out DelayedRunnable into its own files. r=xpcom-reviewers,nika https://hg.mozilla.org/integration/autoland/rev/e263a02e58ea Add gtests. r=xpcom-reviewers,nika https://hg.mozilla.org/integration/autoland/rev/2a88ea548a86 Simplify MozPromise code in nsThreadManager::Shutdown. r=xpcom-reviewers,nika https://hg.mozilla.org/integration/autoland/rev/3e501ab00f18 In xpcom, cancel pending DelayedRunnable timers on shutdown. r=KrisWright https://hg.mozilla.org/integration/autoland/rev/29f195804825 In necko, cancel pending DelayedRunnable timers on shutdown. r=necko-reviewers,dragana https://hg.mozilla.org/integration/autoland/rev/741be88ca028 Release WebSocketChannel::mTargetThread on main. r=necko-reviewers,dragana https://hg.mozilla.org/integration/autoland/rev/07e38871c80e Remove mochitest rdd leak threshold. r=mjf,mccr8 https://hg.mozilla.org/integration/autoland/rev/503cd99c7318 Increase async shutdown timeout for ASAN to 5 minutes. r=xpcom-reviewers,mccr8
Comment 13•3 years ago
|
||
bugherder |
https://hg.mozilla.org/mozilla-central/rev/19d803d5ab12
https://hg.mozilla.org/mozilla-central/rev/e263a02e58ea
https://hg.mozilla.org/mozilla-central/rev/2a88ea548a86
https://hg.mozilla.org/mozilla-central/rev/3e501ab00f18
https://hg.mozilla.org/mozilla-central/rev/29f195804825
https://hg.mozilla.org/mozilla-central/rev/741be88ca028
https://hg.mozilla.org/mozilla-central/rev/07e38871c80e
https://hg.mozilla.org/mozilla-central/rev/503cd99c7318
Updated•3 years ago
|
Comment 15•3 years ago
|
||
Kris, is this something we should consider backing out, at least from beta? The patches in the bug this is blocking haven't landed yet, and there are an awful lot of regressions from these patches. It also sounds like fixing the regressions is not easy, and might introduce additional risk. If you want to keep working on the regressions, I could prepare some backout patches and see how they do on try.
Comment 16•3 years ago
|
||
(In reply to Andrew McCreight [:mccr8] from comment #15)
Kris, is this something we should consider backing out, at least from beta? The patches in the bug this is blocking haven't landed yet, and there are an awful lot of regressions from these patches. It also sounds like fixing the regressions is not easy, and might introduce additional risk. If you want to keep working on the regressions, I could prepare some backout patches and see how they do on try.
The fix for the top crasher from my understanding is nontrivial so I think we should back this out from beta for the time being.
Comment 18•3 years ago
|
||
Updated•3 years ago
|
Comment 19•3 years ago
|
||
Backed out of beta again for 90:
https://hg.mozilla.org/releases/mozilla-beta/rev/3dabe43c4982a4759af9195a9eaf02d1062304eb
Comment hidden (Intermittent Failures Robot) |
Description
•