Open Bug 1543537 Opened 7 months ago Updated 6 months ago

indicator.js ends up running code in torn-down (cycle-collected, unlinked) windows, triggering cache->PreservingWrapper() assertions

Categories

(Core :: XPConnect, defect, P2)

defect

Tracking

()

Tracking Status
firefox68 --- affected

People

(Reporter: bzbarsky, Unassigned)

References

Details

Attachments

(2 obsolete files)

Is still haven't decided which component this should be in, or even which product...

bdahl has been running into fatal assertions on try with MOZ_BROWSER_XHTML=1, due to the following sequence of event:

  1. browser/components/downloads/content/indicator.js runs in some window, calls defineLazyPreferenceGetter for the "browser.download.autohideButton" preference.
  2. This adds a pref listener that implements nsIWeakReference in JS.
  3. The window in question is closed.
  4. The cycle collector runs, sees nothing has any more references to that window (the weak ref is not a ref because it's not keeping the JS object involved alive!) and unlinks the window, the document, etc. The document is still accessible from the window because it's [StoreInSlot] and we don't clear out the slot, for various reasons.
  5. Before GC gets to run and collect the window, the preference changes and the pref observe is called.
  6. The pref observer calls the _placeholder getter defined in indicator.js and that tries to access properties on the document and hits the assertion when it discovers and expando object and no preserved wrapper (unlinking unpreserved).

MOZ_BROWSER_XHTML=1 matters only because XULDocument is not a DOM proxy (no named getter) and hence doesn't hit the proxy expando code at all.

If we cleared out the slot on the window, this code would stop asserting but would throw from the _placeholder getter instead, which may not be great either.

Ideally we would prevent calling from the pref service into this stuff somehow. Failing that, maybe indicator.js can listen for window teardown and remove its pref observer or no-op the callback function?

Late-night braindump which may or may not help:

I'm wondering if we should just move lazy prefs onto a separate jsm-style (or webidl) thing and stop supporting defining lazy pref getters on windows (or objects tied to windows). That and downloads code needs to stop importing the world when nothing is going on.

In the comments on https://phabricator.services.mozilla.com/D15449 I suggested perhaps in addition to AppConstants.jsm we want something like AppState.jsm where we track things like lazy prefs and "did thing X happen" (e.g. have there been any downloads this session) to avoid having to import additional modules that we don't (yet) need if X hasn't happened / some pref hasn't been flipped. I've been distracted with other things but if this sounds like a sane plan to people I can reprio it.

I've seen this conundrum in a few other places, e.g. pockets code that doesn't need to do much except in the comparatively rare case where it's disabled via policy or the about:config pref, which currently has to load Pocket's own jsms just to work that out. Would be cheaper if that type of state was in a central place.

See Also: → 1516525

(In reply to :Gijs (he/him) from comment #1)

I'm wondering if we should just move lazy prefs onto a separate jsm-style (or webidl) thing and stop supporting defining lazy pref getters on windows (or objects tied to windows).

I have plans to replace the current lazy preference helper with a native implementation, but that unfortunately doesn't help with this case, since it uses a change callback (which I'd never intended the lazy preference getter code to support to begin with...). It'll still have to use something approximating the current JS code if it wants the change callback.

I think this is a more particular manifestation of the general problem that weak references to cycle collected objects can cause dead objects to come back alive if we touch them at the wrong time.

It seems like the sanest thing here to do would be to keep track of weak references that participate in garbage cycles, and unlink them when unlinking everything else in those cycles. I feel like there should be a reasonably elegant way to do that, but my knowledge of the cycle collector doesn't run that deep.

When we unlink an XPCWrappedJS we already call ClearWeakReferences() on it, looks like. So we're presumably not unlinking that in this case. Presumably because it's not actually participating in a garbage cycle, just pointing to it? My understanding of how the CC interacts with the GC here has atrophied, from it's already-poor state... In particular, I don't understand how the cycle collector ends up seeing JS-to-native edges and what it does with unlinking after that.

Olli, is there an explanation of that somewhere, as a sidenote?

Flags: needinfo?(bugs)

Right, in this case by participating in a garbage cycle, I mean pointing into it. Essentially, I'm thinking about treating the weak reference to the JS object as a non-black root, and when there are no paths from black roots to the object it points to, and there are paths from that object to another object that we want to unlink as part of a garbage cycle, we'd unlink the weak reference too.

As for how the CC interacts with the GC, it mostly doesn't. It knows how to trace JS edges using their trace hooks, and it knows how to determine if an object one of those edges points to is a cycle collectable native object by the normal means. It only breaks cycles by unlinking the native objects that are part of them, though, and then leaves the GC to cleanup the then unreachable JS objects.

Using snow white objects or unlinked objects isn't that big problem as such, but does hint about issues elsewhere.

We do traverse XPCWrappedJSs during CC, and do trace them as non-black roots during GC.
And unlinking clears weak references.

But if I interpret the issue right, it is that we return early from
https://searchfox.org/mozilla-central/rev/69ace9da347adcc4a33c6fa3d8e074759b91068c/js/xpconnect/src/XPCWrappedJS.cpp#112-116 but then things like
https://searchfox.org/mozilla-central/rev/69ace9da347adcc4a33c6fa3d8e074759b91068c/js/xpconnect/src/XPCWrappedJS.cpp#118-121 keep wrappedJS object alive (it doesn't get unlinked). And unlinking is needed to clear weak references
https://searchfox.org/mozilla-central/rev/69ace9da347adcc4a33c6fa3d8e074759b91068c/js/xpconnect/src/XPCWrappedJS.cpp#487,506

Is there some easy way to reproduce this locally?

(Why proxy code doesn't recreate the (preserved) wrapper?)

Flags: needinfo?(bugs)
Blocks: 1533881

(In reply to Olli Pettay [:smaug] from comment #6)

Is there some easy way to reproduce this locally?

I've only been able to reproduce on try (browser-chrome) and I think the same for bz.

ok, which particular test or on which platforms?

Component: General → XPConnect

The test varies and it happens on all platforms. It pretty consistently happens on bc13 browser/base/content/test/general/browser_documentnavigation.js on linux 64 debug. You'll need to set MOZ_BROWSER_XHTML=1.

Example try run with a number of failures:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=022d846cd862ebdfc30fd9033e2290173a893cbf

(In reply to Olli Pettay [:smaug] from comment #6)

Using snow white objects or unlinked objects isn't that big problem as such, but does hint about issues elsewhere.

That may be true in principal, but in practice touching an object after it's been unlinked (or its 0 refcount hook has fired) causes problems. Sometimes quite subtle problems...

I'm not sure if this is related or not, but when I run this code in the browser console:

{
  const PREF = "foo.pref";
  Services.prefs.clearUserPref(PREF);
  let obj = {
    QueryInterface: ChromeUtils.generateQI(["nsISupportsWeakReference"]),
    variant: Cc["@mozilla.org/variant;1"].createInstance(Ci.nsIWritableVariant),
    observe() {
      console.log("Hello.", this.variant, this.variant && this.variant.QueryInterface(Ci.nsIVariant));
    },
  };
  obj.variant.setAsISupports(obj);
  Services.prefs.addObserver(PREF, obj, true);
  obj = null;
  Cu.forceCC();
  Services.prefs.setBoolPref(PREF, true);
}

the observer object never gets cleaned up (and the variant never gets unlinked), no matter how what combinations of forceGC() and forceCC() I do. If I leave out the cyclic reference through the variant (which should support cycle collection), the observer goes away after GC.

That suggests that we're either not traversing or not unlinking this reference cycle correctly, but I don't know whether or not it has anything to do with the weak reference.

(In reply to Boris Zbarsky [:bzbarsky, bz on IRC] from comment #4)

Presumably because it's not actually participating in a garbage cycle, just pointing to it?

On further thought, this actually must be participating in a garbage cycle. This bug happens when the CC unlinks the window. The preference observer is being kept alive by the window global's expando object, it has a path back to the window via (at least) its closure over the document object, and it's holding alive its XPCWrappedJS, which has a cyclic reference back to the JS object, so as far as the CC is concerned it has to be part of the same garbage cycle, and should be unlinked.

There's certainly an argument for just calling ClearWeakReferences() on all nsISupports cycle collected objects that QI to nsISupportsWeakReference. Though I see now that it isn't actually a method on that interface so we couldn't do it that easily.

and it knows how to determine if an object one of those edges points to is a cycle collectable native object by the normal means.

Right, this is the part that I could not figure out where this is happening, if anywhere...

But if I interpret the issue right, it is that we return early from

Ah, because the weakptr thing is not really holding a ref to us, so the only ref is our self-ref while the JS object is alive, and we take that early return?

keep wrappedJS object alive (it doesn't get unlinked)

Meaning that us not running that code keeps it alive during the CC, in the sense that it has a ref that the CC doesn't know about so the CC considers it live?

Yes, this seems right.

(Why proxy code doesn't recreate the (preserved) wrapper?)

Recreate in what sense? One failure mode here is this sequence of events:

  1. Things get unlinked, wrapper (reflector) that has expandos gets unpreserved.
  2. The object is brought back to life.
  3. Someone reads an expando property from it.
  4. The someone hands the object over to C++, nothing is holding on to the reflector anymore.
  5. The reflector is GCed.
  6. The "someone" grabs the object from C++ again (with new reflector), the expando is gone.

For non-proxy objects (for which we don't even have an assert right now), where would we preserve the wrapper in all this? I don't see a place to do it.

For proxies, we could re-preserve the wrapper during step 3, if we discover we have an expando object but are not preserved. But even that would fail if step 3 happens from an IC (see GetPropIRGenerator::tryAttachDOMProxyExpando) because that basically reimplements the relevant codepath bit in jitcode except obviously without the assert in question and without the ability to preserve wrappers.

and it's holding alive its XPCWrappedJS, which has a cyclic reference back to the JS object

The JS object does not have an edge to the XPCWrappedJS from the CC's point of view. The XPCWrappedJS has a self-edge, but it (and the edge from the XPCWrappedJS to the JS object and in fact all of the XPCWrappedJS outgoing edges) are treated specially when the XPCWrappedJS refcount is 1; that's the first link in comment 6. So in fact if the XPCWrappedJS refcount is 1 it does not end up in a cycle because it has no outgoing edges as far as the CC is concerned. Then if the JS object is GCed, we go ahead and kill the XPCWrappedJS in JSObject2WrappedJSMap::UpdateWeakPointersAfterGC.

See Also: → 1545441

FWIW I tried applying Andrew's patch in Bug 1535403 to browser.xhtml and still see the same assertions: https://treeherder.mozilla.org/#/jobs?repo=try&revision=1cbc93e0798574b5d4e26f43e3053036833bc3af&selectedJob=241244828. Not sure if it was expected to fix it but figured I'd try.

Yeah, I would not expect that patch to help here...

What would help is if we could figure out a way to represent the conceptual ref from the JSObject to its XPCWrappedJS that's dealt with by the XPCWrappedJS self-ref to the cycle collector. Then the CC would know that the XPCWrappedJS is in a cycle and would actually unlink it while unlinking that cycle.

This is why I was asking how the CC currently detects/represents JS-to-C++ edges.

Yeah, I split out the weak ref stuff into a different patch, which I can upload tomorrow. I'm not sure if the weak ref part of those patches actually was helping the problem I was trying to fix, so I left it out for now. And they don't seem to fix all of the weak ref to unlinked inner window issues I was seeing.

(In reply to Boris Zbarsky [:bzbarsky, bz on IRC] from comment #15)

This is why I was asking how the CC currently detects/represents JS-to-C++ edges.

JS objects are traversed by the CC with CycleCollectedJSRuntime::TraverseGCThing().

The way things work now is that we never unlink JS to C++ edges in the CC. Instead, we wait for the GC to run and finalize the JS object, which then releases the C++ object.

Here's my WIP patch that drops a few instances of weak references to inner and outer windows. I don't know if it will help this.

It wouldn't, afaict. The weak ref in this bug is to some random object that then entrains the window indirectly.

I was poking at my suggestion in comment 15 earlier today, and discovered a few things:

  1. The same JSObject can actually correspond to multiple nsXPCWrappedJS instances (at most one "root" nsXPCWrappedJS and an arbitrary number of non-"root" ones) and hence conceptually has many outgoing edges to C++ things. These edges can coexist with the edges CycleCollectedJSRuntime::NoteGCThingXPCOMChildren already knows about, since any JSObject can end up as an XPCWrappedJS, afaict.

  2. At the moment we don't have a JSObject-to-nsXPCWrappedJS map for anything other than "root" nsXPCWrappedJS values. So to implement comment 15 we'd need to create such a map, and due to point #1 it would need to be one-to-many.

  3. All the existing knowledge about nsXPCWrappedJS is on XPCJSRuntime, of course, so getting there from NoteGCThingXPCOMChildren would involve a virtual function call or something. But on the other hand, that knowledge is not helpful anyway due to point #2. So maybe we can just add a new map of some sort on CycleCollectedJSRuntime.

Temporary workaround for preserving wrapper assertions in torn-down
window.

Attachment #9059899 - Attachment is obsolete: true

The priority flag is not set for this bug.
:neha, could you have a look please?

For more information, please visit auto_nag documentation.

Flags: needinfo?(nkochar)
Assignee: nobody → bdahl
Status: NEW → ASSIGNED
Flags: needinfo?(nkochar)
Priority: -- → P2
See Also: → 1548026
Assignee: bdahl → nobody
Status: ASSIGNED → NEW

Comment on attachment 9061538 [details]
Bug 1543537 - Remove download button's lazy pref getter on unload. r=bzbarsky

Revision D29294 was moved to bug 1548026. Setting attachment 9061538 [details] to obsolete.

Attachment #9061538 - Attachment is obsolete: true

As per discussion in #content this doesn't need to block release of browser.xhtml, since any behavior this condition is leading to is already present with XUL documents (we just don't assert for it). So as long as we are green post Bug 1548026 this can be removed from the blocker list.

No longer blocks: 1533881
See Also: → 1533881
You need to log in before you can comment on or make changes to this bug.