Closed Bug 1660555 Opened 4 years ago Closed 3 years ago

AbortSignal needs to keep strong references to its followers

Categories

(Core :: DOM: Core & HTML, enhancement, P1)

enhancement

Tracking

()

RESOLVED FIXED
98 Branch
Tracking Status
firefox98 --- fixed

People

(Reporter: Waldo, Assigned: saschanaz)

References

Details

Attachments

(16 files, 4 obsolete files)

47 bytes, text/x-phabricator-request
Details | Review
47 bytes, text/x-phabricator-request
Details | Review
47 bytes, text/x-phabricator-request
Details | Review
47 bytes, text/x-phabricator-request
Details | Review
47 bytes, text/x-phabricator-request
Details | Review
47 bytes, text/x-phabricator-request
Details | Review
47 bytes, text/x-phabricator-request
Details | Review
47 bytes, text/x-phabricator-request
Details | Review
47 bytes, text/x-phabricator-request
Details | Review
47 bytes, text/x-phabricator-request
Details | Review
47 bytes, text/x-phabricator-request
Details | Review
47 bytes, text/x-phabricator-request
Details | Review
47 bytes, text/x-phabricator-request
Details | Review
47 bytes, text/x-phabricator-request
Details | Review
48 bytes, text/x-phabricator-request
Details | Review
48 bytes, text/x-phabricator-request
Details | Review

In bug 1502355 I'm implementing ReadableStream.prototype.pipeTo support, to copy data from a readable stream to a writable stream. The algorithm includes support for an optional AbortSignal that may be used to abort piping. If the signal is externally aborted, source and destination are canceled and aborted -- even if they have been dropped on the floor by the entire rest of the system, even if the entire piping operation is otherwise unreachable, unobservable, and inaccessible. But if AbortSignal currently doesn't keep anything else alive -- and particularly, it doesn't keep its followers alive, including a follower that corresponds to abortAlgorithm from the ReadableStreamPipeTo operation -- those semantics are not achievable.

AbortSignal needs to keep strong references to followers to achieve these semantics. I've only just run across/discovered this, but I'll look into writing a patch for this in short order. (Probably not immediately today, but early next week.)

I might have a patch for this.

It seems to pass ./mach wpt testing/web-platform/tests/fetch/ locally -- or at least it prints "PASS" and finishes with a success code, tho console spew indicated a bunch of crashes in LinkedList that do not seem related to my patching, and I'm not sure why they wouldn't have made the test run fail.

However, the patch is atop a few other things still in flight, so it's not quite ready to be posted. And some of the cycle-collection details in it are...of rather dubious correctness, given how much I had to make guesses as to how to use the various collection macros and cargo-cult from other users. And without bug 1502355 available, it's possible this bug's concern isn't even testable, so I may not end up wholly satisfied with whatever I post here, when I do finally post it fully...

I'm pretty sure I have clean patches for this now (not just a single large patch changing everything at once, except not actually changing everything). I went through the existing AbortFollower subclasses and carefully adjusted them all to be cycle-collected (and thus single-threaded), and the final result passes a compile smoketest. Haven't done a try run yet to say for sure, tho -- maybe in a day or two.

FWIW these patches are only lightly tested, because bug 1618019 means I can't run wpt tests locally, at least not using ./mach wpt.

Bug 1618019 is no longer an issue. Try-run indicates there's a problem likely in the FetchDriver rev:

https://treeherder.mozilla.org/#/jobs?repo=try&revision=d3615c9ba9aa68fd9981a81faba1129ba04407bc&selectedTaskRun=bhVZRdEOTl6GFjmzX0bnog.0

so these patches aren't 100% ready to go, but they're at least mostly ready. Will work on figuring that out in parallel with any reviewing that might be happening on these, in advance of figuring that out.

Attachment #9179413 - Attachment is obsolete: true

The FetchDriver changes created an un-cycle-collected cycle that was not manually broken via a Shutdown() function (as happened in the BodyConsumer patch), so was no good on its own. I folded it into the make-the-signal-follower-cycle-CC'd patch, and I removed the actually-unnecessary link from FetchDriver to the follower needed to handle out-of-band aborting. With that, these patches seem to be landable:

https://treeherder.mozilla.org/#/jobs?repo=try&revision=c4536dc412634ef25e8499dc37e0bd61266c5403

(Note that the subsequent patches atop this add more ReadableStream.prototype.pipeTo implementation, but as that implementation is still hidden behind a preference there's no effect of the changes in shipping builds. Also note that in the patches in bug 1502355 that build atop this, the patches that invoke poking into Gecko are similarly inaccessible in shipping builds, without that pref set. I have not tested any of those changes to see if they work. I'm just hoping to get a review to see if the XPConnect-side changes seem to make sense. Then when it's time to actually flip the preference I can find out whether the code actually works, and if it doesn't, hopefully only very small changes would be needed to make it work.)

Attachment #9179412 - Attachment is obsolete: true
Attachment #9179408 - Attachment is obsolete: true

One more thing popped up, in a try-run that went beyond just linux64 debug. This hunk is only half-good:

diff --git a/dom/fetch/Fetch.h b/dom/fetch/Fetch.h
--- a/dom/fetch/Fetch.h
+++ b/dom/fetch/Fetch.h
@@ -122,8 +122,9 @@ nsresult ExtractByteStreamFromBody(const
 template <class Derived>
 class FetchBody : public BodyStreamHolder, public AbortFollower {
  public:
-  using BodyStreamHolder::AddRef;
-  using BodyStreamHolder::Release;
+  using BodyStreamHolder::QueryInterface;
+
+  NS_INLINE_DECL_REFCOUNTING_INHERITED(FetchBody, BodyStreamHolder)
 
   bool GetBodyUsed(ErrorResult& aRv) const;
 

In debug builds, or in opt builds with refcount logging, it's fine -- it declares addref/release functions in the class, and so classes that derive from FetchBody<Derived> can do stuff like NS_IMPL_ADDREF_INHERITED(EmptyBody, FetchBody<EmptyBody>) and FetchBody<EmptyBody>::AddRef will be unambiguous.

But in builds without refcount logging, we have

/**
 * A macro to declare and implement addref/release for a class that does not
 * need to QI to any interfaces other than the ones its parent class QIs to.
 * This can be a no-op when we're not building with refcount logging, because in
 * that case there's no real reason to have a separate addref/release on this
 * class.
 */
#if defined(NS_BUILD_REFCNT_LOGGING)
#  define NS_INLINE_DECL_REFCOUNTING_INHERITED(Class, Super)  \
    NS_IMETHOD_(MozExternalRefCountType) AddRef() override {  \
      NS_IMPL_ADDREF_INHERITED_GUTS(Class, Super);            \
    }                                                         \
    NS_IMETHOD_(MozExternalRefCountType) Release() override { \
      NS_IMPL_RELEASE_INHERITED_GUTS(Class, Super);           \
    }
#else  // NS_BUILD_REFCNT_LOGGING
#  define NS_INLINE_DECL_REFCOUNTING_INHERITED(Class, Super)
#endif  // NS_BUILD_REFCNT_LOGGINGx

and so in non-refcount-logging builds, there will be no declared AddRef/Release in FetchBody<Derived>, and so FetchBody<EmptyBody>::AddRef will be ambiguous.

I expect this can probably be solved by changing the alternative macro-definition to be using Super::AddRef; using Super::Release;. Probably. But I'm gonna need to test it first, and it's obviously going to have to be a separate patch from all these (that either must land prior to these patches, or will have to be inserted into the sequence before attachment 9179411 [details]).

https://treeherder.mozilla.org/#/jobs?repo=try&revision=1674f0f218b0e390dc629e7482a7579062dec1df for a patch-stack that will allow opt builds to succeed -- but without the final patch to turn on cycle collection of all this stuff.

A previous try push indicates at least one lingering issue, that is the reason I excluded the final patch. That particular job has a bunch of stuff leaking, including AbortSignal, FetchBody, FetchDriver, and FetchSignalFollower instances. Several such failures are apparent:

  • OS X 10.14 debug M 2
  • Windows 7 debug M 4
  • Windows 10 x64 debug M 4
  • Windows 10 x64 WebRender debug M 4

Hopefully I can pocket a bunch of these patches' wins, then (with much less patch to examine) I can figure out what's wrong with however many patches can't quite be landed.

Depends on: 1672556
Attachment #9179409 - Attachment description: Bug 1660555 - Split AbortFollower::Abort into AbortFollower::RunAlgorithm and AbortSignalImpl::SignalAbort functions for readability. r=smaug! → Bug 1660555 - Split AbortFollower::Abort into AbortFollower::RunAbortAlgorithm and AbortSignalImpl::SignalAbort functions for readability. r=smaug!
Keywords: leave-open
Pushed by jwalden@mit.edu: https://hg.mozilla.org/integration/autoland/rev/bd6a0213afbd Split AbortFollower::Abort into AbortFollower::RunAbortAlgorithm and AbortSignalImpl::SignalAbort functions for readability. r=smaug

I'm hitting some leaks with the final patch in the stack:

https://treeherder.mozilla.org/#/jobs?repo=try&revision=6db3f2738cda4cf27c950b57ae5ea5d00806ece1

It looks introducing FetchSignalFollower somehow, probably, creates a cycle. Unfortunately none of the Linux jobs reproduce it. And I haven't found a testcase just from inspection yet. Everything just up to that final patch is fine and doesn't introduce problems, so in principle those patches could be landed. But despite my ability to land them (and even tryservering to check whether I could), I am still somewhat reticent to do so in case the partial work somehow unexpectedly untowardly changes lifetimes.

I'm going to do some reordering/rearranging of this patch stack so I can land parts that are alpha-renames and similarly trivial things, that don't alter ownership and so on.

I also have a new patch that does alter some AbortSignal internals that I had hoped might help with this, but it appears not to. Still, the changes seem conceptually good and worth having, and they may prevent some memory leaks or at least holding onto stuff too long. So I'll probably post it here once the trivial things have landed.

Pushed by jwalden@mit.edu: https://hg.mozilla.org/integration/autoland/rev/463a8e88c16a Add a |Signal()| accessor to |AbortFollower| rather than directly accessing an inherited, protected member. r=smaug https://hg.mozilla.org/integration/autoland/rev/29e0a26dc84a Make |AbortFollower| actually inherit from |nsISupports|. r=smaug https://hg.mozilla.org/integration/autoland/rev/062f02943785 Move traverse/unlink operations on |AbortFollower| and |AbortSignalImpl| into static member functions. r=smaug https://hg.mozilla.org/integration/autoland/rev/6072c31f132a Add correct-thread assertions to all |AbortSignalProxy| member functions. r=smaug https://hg.mozilla.org/integration/autoland/rev/6de673e861f5 Improve some comments and add a helper function for accessing the main-thread event target. r=smaug https://hg.mozilla.org/integration/autoland/rev/03dbc65807bb Define two member functions in Fetch.cpp out-of-line. r=smaug
Attachment #9182209 - Attachment description: Bug 1660555 - Add a mostly-empty |WorkerSignalFollower| class and member of |AbortSignalProxy|. r=smaug! → Bug 1660555 - Add a mostly-empty |WorkerSignalFollower| class. r=smaug!
Pushed by jwalden@mit.edu: https://hg.mozilla.org/integration/autoland/rev/f6517a331463 Add a mostly-empty |WorkerSignalFollower| class. r=smaug https://hg.mozilla.org/integration/autoland/rev/ad84811e4a92 Move the runnable into |WorkerSignalFollower|. r=smaug https://hg.mozilla.org/integration/autoland/rev/3b6e3a66c1de Make |WorkerSignalFollower| actually an |AbortFollower| (that for the moment does nothing when invoked). r=smaug
Depends on: 1674495
Pushed by jwalden@mit.edu: https://hg.mozilla.org/integration/autoland/rev/bbd35b96ab6f Make |WorkerSignalFollower| do the following. r=smaug https://hg.mozilla.org/integration/autoland/rev/e5fc911da9c9 |AbortSignalProxy| doesn't need to be |nsISupports|, just threadsafe-ly refcounted. r=smaug

I'm going to land the patches that precede the final switch-over, and do not induce any leaks, to get them off my system. Still haven't worked through just precisely how that final patch creates a cycle yet, or at least not well enough to explain it well and give a reduced testcase.

Backed out 2 changesets (Bug 1660555) for causing crashes in [@ mozilla::dom::AbortFollower::Unfollow] Bug 1692362 a=backout

Backout: https://hg.mozilla.org/mozilla-central/rev/0312654f27b415aa62bb90307371aebc0cd5fd77

Flags: needinfo?(jwalden)

(In reply to Jeff Walden [:Waldo] from comment #34)

I'm going to land the patches that precede the final switch-over, and do not induce any leaks, to get them off my system. Still haven't worked through just precisely how that final patch creates a cycle yet, or at least not well enough to explain it well and give a reduced testcase.

@Waldo: Hi. I know last message was only written one month ago, but I'm curious about whether you have any status update to provide here? Thank you!

Do you know how we are going to proceed with these bugs?

Flags: needinfo?(bugs)
Pushed by evilpies@gmail.com: https://hg.mozilla.org/integration/autoland/rev/86e154074372 Make |WorkerSignalFollower| do the following. r=smaug https://hg.mozilla.org/integration/autoland/rev/4e42fc2c7ff3 |AbortSignalProxy| doesn't need to be |nsISupports|, just threadsafe-ly refcounted. r=smaug https://hg.mozilla.org/integration/autoland/rev/973fbed4566f Null mSignalProxy after calling Shutdown in WorkerFetchResolver. r=smaug

The leak with the two remaining patches can be reproduced with a debug build and running ./mach test dom/cache/test/mochitest/.

QA Whiteboard: qa-not-actionable
Attachment #9211232 - Attachment is obsolete: true

Cleaning up my needinfo requests, we will probably just wait for Matthew's streams work.

Flags: needinfo?(bugs)
Blocks: 1734241, 1734243

This entire thing doesn't have a test as far as I can see, but I guess the streams WPT have some failing tests? Could you point me to them if exist?

Flags: needinfo?(mgaudet)
Flags: needinfo?(jwalden)
Flags: needinfo?(evilpies)

Re-treading this a bit (and Tom, feel free to chme in):

I think once we have pipeTo available (so really, this may not so much block pipeTo, as be blocked by) then the kind of test case we're looking to pass looks something like this:

var wasCancelled = false;

var source = {
    pull(controller) { controller.enqueue("hi"); },
    cancel() { wasCancelled = true; }
}

var readable = new ReadableStream(source);
var writable = new WritableStream();

const controller = new AbortController();
readable.pipeTo(writable, { signal: controller.signal });

// Render the readable and writable streams inaccessible
readable = null;
writable = null;

// Do sufficient GC/CC cycles such that the ReadableStream and WritableStream are cleaned up
// Which they should be right now, as no one has any references
for (i = 0; i < 10; i++) { CC(); gc(); }

controller.abort();

// At this point, we should have executed the cancelled callback, and as such we should see
// its effect.
assert_eq(true, wasCancelled);
Flags: needinfo?(mgaudet)

Thanks! I guess there is no WPT function to trigger GC, so I can't trivially check what other browsers do 🥲

Just to double check: Anne, I don't see DOM explicitly says about this but I guess https://dom.spec.whatwg.org/#abortsignal-add is expected to have strong reference and never affected by GC?

Flags: needinfo?(annevk)

I just found that both Blink and WebKit has function to trigger GC in WPT: https://github.com/web-platform-tests/wpt/blob/b4bc500143c070d8c6e5398d0775446bedea8134/js/builtins/weakrefs/resources/maybe-garbage-collect.js#L14-L30

It's sad that we are the only one who don't have it.

Hi James, per https://github.com/web-platform-tests/wpt/issues/7899 I think you were doing something regarding to the GC thing, has there been some progress?

Flags: needinfo?(james)

Well I wrote a spec: https://testutils.spec.whatwg.org/ I was hoping to write an implementation in Gecko, but it got overtaken by other priorities. As I understand it it shouldn't be that hard for people already familiar with the DOM code (wanting it to also be exposed in workers might make it somewhat harder, but I guess we could accept a Window-only solution for now).

Flags: needinfo?(james)

Any abort algorithms added are not expected to be collected, no. And I think comment 0 talks about https://dom.spec.whatwg.org/#abortsignal-follow where it is expected that parentSignal keeps followingSignal alive due to followingSignal being referenced from an algorithm added to parentSignal. I hope that helps. This is not an area I'm super strong in. (At some point Web IDL should probably define general GC expectations for platform objects.)

Flags: needinfo?(annevk)
Flags: needinfo?(evilpies)
Assignee: jwalden → krosylight
Attachment #9259874 - Attachment description: WIP: Bug 1660555 - Make AbortFollower::mFollowingSignal a WeakPtr → Bug 1660555 - Make AbortFollower::mFollowingSignal a WeakPtr r=smaug
Attachment #9259875 - Attachment description: WIP: Bug 1660555 - Let AbortSignal grab strong references to its followers → Bug 1660555 - Let AbortSignal grab strong references to its followers r=smaug
Pushed by krosylight@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/07b9e1dc7c4e Make AbortFollower::mFollowingSignal a WeakPtr r=smaug https://hg.mozilla.org/integration/autoland/rev/fd60a8c6d1e6 Let AbortSignal grab strong references to its followers r=smaug
Created web-platform-tests PR https://github.com/web-platform-tests/wpt/pull/32492 for changes under testing/web-platform/tests
Status: ASSIGNED → RESOLVED
Closed: 3 years ago
Resolution: --- → FIXED
Target Milestone: --- → 98 Branch
Upstream PR merged by moz-wptsync-bot

Kagami, did you check if any of the unlanded patches are still potentially useful?

Oh wait, the situation changed after that patch. Now I wonder it can be applied to AbortFollower itself.

You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: